Written Evidence Submitted by the Alan Turing Institute
The Alan Turing Institute is the UK’s national institute for data science and artificial intelligence (AI), and as such our submission focuses on under-representation in these fields. Following the points highlighted in the call for evidence, this submission puts forward the following key points.
The nature, causes and implications of under-representation in data science and AI:
There is a lack of diversity data specifically covering these fields, both in academia and industry. This makes it difficult to fully understand the nature of under-representation.
It is likely that under-representation corresponds to that in STEM fields with the closest association, such as computer science and software engineering.
In industry, our researchers paint a picture of structural gender inequality.
Unrepresentative teams increase the potential for harms caused by data unrepresentativeness and algorithmic bias.
What we have done to address under-representation in our community:
We describe key features of our organisational EDI strategy, such as using Equality Impact Assessments across our events, calls and research projects.
In our experience, offering a range of engagement options for doctoral students and providing financial support has improved representation of certain groups.
Action is needed to ensure diversity in the undergraduate community is translated into the postgraduate and early career research communities.
Consistent reporting in the research sector of data on protected characteristics would be beneficial.
Positive action provisions contained within the Equality Act 2010 are potentially underutilised but effective.
The Alan Turing Institute is the UK’s national institute for data science and AI. Our mission is to make great leaps in data science and AI research in order to change the world for the better.
The Institute is a partnership formed of 13 UK universities - Birmingham, Bristol, Cambridge, Edinburgh, Exeter, Leeds, Manchester, Newcastle, Oxford, Queen Mary, Southampton, University College London and Warwick - and the Engineering and Physical Sciences Research Council (EPSRC). It is home to more than 500 researchers, a growing team of in-house research software engineers and data scientists, and a professional services team.
We convene activity across the data science and AI community through partnerships with universities and research institutes (across and beyond our 13 university partners) as well as with industry, third sector and public organisations. Research excellence is the foundation of our work and our researchers collaborate across disciplines to generate impact, both through theoretical development and application to real-world problems.
We agree that high-quality research and innovation in STEM requires diversity. This is no less true in our data science and AI community, which is composed of individuals from across STEM disciplines. We therefore want to aid the Select Committee’s work in addressing under-representation in STEM, as doing so is crucial to achieving our mission.
Further, we recognise that diversity plays an additional role specific to data science and AI research. Diverse research teams can better challenge existing biases in data, check bias where it risks seeping into algorithms and thus produce outputs that are less likely to perpetuate societal biases and inequalities. Improving representation will help us to develop data science and AI technologies that are of benefit to all as we address some of the biggest challenges facing science, society and the economy.
Assessing under-representation in the data science and AI academic community is difficult as there are no diversity statistics specifically covering these fields. This is in large part because the field overlaps to varying degrees with a diverse range of STEM disciplines and, to a lesser extent, with disciplines not typically defined as STEM, from philosophy to law. This is true of the Turing’s research community, which is characterised by its interdisciplinarity.
However, there are certain STEM subjects with which data science and AI have a particularly close association, for example computer science, software engineering and statistics. Under-representation in these subjects is especially likely to be reflected in the UK’s data science and AI academic community.
This is a potential cause for concern. For instance, women are slightly under-represented among those studying science subjects (according to HESA), comprising 45% of postgraduate (research) students. However, for those studying statistics this falls to 39%, and for computer science it is just 27%. This pattern is reflected in the Institute’s own experience: women make up a minority of applicants to our doctoral Enrichment Scheme (35.5%) and are particularly under-represented among applicants to our flagship Fellowship Scheme (20.9%).
Given the above, the Institute shares the Royal Society’s view that a lack of diversity data on data science and AI in academia is a barrier to addressing under-representation. It is important that universities and research institutions publish data on their researchers in these fields, even where those individuals work across disciplines and are therefore captured in other diversity data.
The Institute will be publishing diversity data on our own research community in our first EDI Annual Report later this year and will do so on an annual basis thereafter. While the Institute is in the early stages of analysing its own diversity data it would be interested in supporting the wider community with work in this area in the future.
Similarly, we think the data on those working in industry in data science and AI roles is currently insufficient. This is due, in part, to the lack of consistent definitions of these roles and to the relative newness of the profession. It is also largely due to the reluctance of large tech companies to share such data. A paper from the AI Now Institute at New York University explains that this is particularly problematic given that the AI field has shifted from a primarily academic setting to being increasingly located in large corporate settings (West, Whittaker and Crawford, 2019). This clouds our view of diversity data and decision-making for large swathes of the AI workforce.
Further, when it comes to high-level overviews of the UK tech workforce, there is no specific segmentation by data science and AI fields. For instance, the APPG on Diversity and Inclusion in UK STEM’s 2020 report looked at diversity in the UK tech workforce but did not segment the data by AI and data science fields. We consider that there is an urgent need to explore this segment of the sector.
Through the work of the Institute’s Women in Data Science and AI Project, we are able to offer a picture of gender representation in industry. The project is led by Turing Fellow Professor Judy Wajcman and Research Fellow Dr Erin Young and uses research to inform concrete policy measures aimed at increasing gender equality in data science and AI.
In 2021, the project published the report ‘Where are the Women? Mapping the gender job gap in AI’ (Young, Wajcman and Sprejer, 2021), which presented a new, curated dataset to map women’s participation in the data science and AI workforce, across industries, in the UK and other countries. This was analysed with innovative data science methodology to provide policy recommendations.
Pertinent findings on women’s participation in data science and AI jobs are summarised below:
Given the aforementioned overlap, the causes of under-representation in the UK data science and AI community reflect those in STEM generally and, particularly, those STEM fields with the closest association. For example, the pipeline for those studying and working in computer science can be influenced by unequal access to computers and the availability of computer science classes in schools, with consequences for socioeconomic and ethnic diversity. This was highlighted in a US study produced by Gallup and is likely an issue in the UK, with the Royal Society finding that 54% of English schools do not offer Computer Science GCSE. Gender is also relevant: existing societal biases mean that more women in STEM study biological and biomedical sciences, while more men study mathematics, engineering and computer sciences, and these patterns carry over to data science and AI.
In an industry context, the tech sector’s potentially discriminatory environment for those from under-represented groups is also a cause for concern. The State of European Tech survey noted that 59% of Black/African/Caribbean tech sector workers surveyed had experienced discrimination in the preceding 12 months. This, and issues around gendered hostility in tech workplaces (a contributory factor to the higher attrition rates among women raised previously), indicate that the talent pipeline is not the only problem affecting under-representation in data science and AI in the tech sector. On top of this, funding is directed in an unrepresentative way, with all-male teams capturing 90.8% of all capital raised in 2020.
It is worth noting that there are additional factors particular to data science and AI which contribute to underrepresentation. Again, diversity statistics specifically covering data science and AI students, researchers and workers are a barrier to understanding these factors. On the Institute’s part, and following on from the ‘Where are the Women?’ report, the Women in Data Science and AI project will build upon its research in order to fully explore the factors driving the AI gender gap. The Project and Institute will be happy to share its findings and consequent policy recommendations with the Select Committee.
Underrepresentation in data science and AI in both academia and industry has unique wider consequences. Teams of developers that lack diversity increase the likelihood that bias will leak into systems at various stages in their development and produce biased outcomes, which in turn perpetuate or exacerbate bias – creating a feedback loop (Wajcman, Young and Fitzmaurice, 2020). Given the prevalence of AI decision-making across society, these effects can occur in fields from employment and finance to health, law enforcement and more.
A paper produced for UN Women by Turing researchers Wajcman, Young and Fitzmaurice explains the two key points at which this bias can leak in. The first is in the data used to train the systems. The data selected may under-represent certain groups or encode historical biases against minorities, owing to prior decisions on what data should be collected and how to curate it. For example, few publicly available databases of skin cancer images contain details of ethnicity or skin type (Wen, Khan, Ji Xu et al., 2022). Those that do include such details under-represent people with dark skin and contain no images of certain ethnic groups. This suggests that algorithms trained on these datasets will perform less accurately for the relevant groups.
Second, bias can shape the modelling and analytical processes which are applied to the data and result in algorithmic bias. This is because these processes are influenced by the assumptions, values and priorities of the developers. Where developer teams represent only certain groups, the preferences of those groups are more likely to go unchallenged and thus affect how the systems interpret data. A recent study found racial bias in algorithms used by the US health care system (Obermeyer et al., 2019). The algorithms used health care costs as a proxy for health needs but failed to account for the fact that less money is spent on average on Black patients than on White patients with the same level of need.
As suggested, the potential impacts of these biased systems are as diverse as the applications of AI decision-making in our society. Turing researchers have written about these in contexts from facial recognition to combatting covid-19. This is not only a problem of large magnitude but also an urgent one: the spread and severity of these harms is only likely to increase as the role of data science and AI in our society continues to grow.
The Institute has taken steps since its founding to embed EDI across our activities and improve representation in our community. In September 2021, this culminated in the launch of our first EDI Strategy and Action Plan to codify and bring coherence to our EDI aspirations and plans for the next three years. We have summarised below the areas where we have begun to see impact. However, we expect it will be some time before we have a full understanding of the outcomes of the various measures and interventions being developed through our strategy.
The Institute launched its first EDI strategy and accompanying action plan in September 2021. This strategy is a statement of what we want to achieve from an EDI perspective, which importantly includes rectifying underrepresentation in our community, and the action plan is a roadmap of the specific actions we will take. Both are structured around the three roles in which the Institute will act on its EDI commitments: as an employer, as a research institute and as a national body.
We have summarised below planned activities in relation to the latter two roles and which address underrepresentation specifically in our data science and AI researchers.
As a research institute, we will:
As a national body, we will:
The Institute is home to four staff network groups, which provide staff with the opportunity to contribute to the developing EDI agenda. Importantly, they act as a ‘critical friend’ to the Institute, identifying and suggesting ways to address any barriers to the progress of under-represented groups. We are currently restructuring the leadership of these groups to ensure representation from both our business team and research community, giving a rounded perspective on the issues different sections of our community face. The groups are:
To recognise the work of our Network Group chairs, they are offered an annual honorarium for their contribution, as well as recognition of their duties in performance reviews and a reduction in workload.
One area where we have made changes with the aim of increasing diversity is the range of engagement options and support offered at the Turing.
The data science and AI doctoral community in the UK is not representative of the undergraduate population, nor of the UK population. This hinders the appointment of diverse and representative researchers. To meet our key aim of training future leaders, we established our Enrichment and Doctoral Student Schemes in 2016. The Enrichment Scheme is now in its sixth year.
Data from the 2017–2019 Schemes indicated that 0% of Enrichment students had declared a disability. Consequently, we introduced an Enrichment Access Fund (a financial award to support students with disabilities, caring responsibilities and other significant access requirements) and the option to complete a part-time placement.
Following these changes to the scheme, the situation has improved, with up to 18% of students declaring a disability and successful applications to the Access Fund in the 2020 and 2021 cohorts. We still have opportunities to better support our applicants, and providing an environment where candidates feel able to share this information is one route to ensuring we can support these individuals. We therefore consider these features of the Scheme (alongside the assessment changes listed below) successful in increasing the representation of doctoral researchers with a declared disability, though more time and monitoring will help in assessing the extent of this. While we had not collected data prior to 2019 on students’ caring responsibilities, since offering flexible placements and Access Awards we have welcomed parents to every subsequent Enrichment cohort.
In 2020, we ran our first Daphne Jackson Fellowship call, offering a three-year 0.5 FTE Fellowship. Working in partnership with the Daphne Jackson Trust, this fellowship is specifically geared at supporting those who have been out of academia due to caring or health reasons to return. We plan to offer three Fellowships over the next three years with the second round now in progress.
According to the Trust, an inflexible environment in STEM research is a considerable barrier to returners. The Fellowship is therefore an important means to enable career returners, a group with a particularly high proportion of women, to re-join STEM. Retaining diverse talent is an important element of any effort to improve overall diversity as it is more likely to lead to more diverse researchers in senior positions and leadership, which in turn can support more individuals to enter STEM roles.
Since the start of the Enrichment Scheme, no awards have been made to Black students and, each year, this has been identified as an urgent area for improvement. The Institute has most recently used a targeted marketing approach in an attempt to tackle this. In 2020, the Institute marketed the scheme at five London universities (London Metropolitan University, University of East London, University of West London, City and Kingston University). In 2021, we partnered with BBStem, a non-profit organisation campaigning for balance and representation of Black individuals in STEM, which posted an advert for our scheme on its website. However, these efforts showed minimal success in increasing the number of Black applicants and none in increasing the number of Black awardees.
For our 2022 Enrichment cohort, recruitment will open in early January. For this cohort we are offering a workshop to support students in writing strong applications and are piloting an application mentoring scheme, in which mentors from the Turing will be paired with prospective applicants for a short-term relationship to support them in preparing their applications. For this pilot we will prioritise applicants from ethnic minorities currently under-represented on the Enrichment Scheme and students from universities with which the Turing is not partnered. While we recognise that the limited pipeline controls much of our recruitment, working to establish role models within our community has been a powerful tool for driving diversification in academic background, and an approach we hope will be successful in this case too.
When assessing applications, the Institute strives to emulate best practice and create a transparent assessment process. Review panels at the Institute are supported by convenors who have received additional training in unconscious bias, creating inclusive panel environments and how to ensure equality in assessment processes.
We strive for diversity on decision panels, both in terms of individual characteristics and the disciplines and domains represented, and will shortly begin reporting on this.
In our Enrichment assessment process, we have adopted principles from the Disability Confident Scheme whereby eligible students with a disability are automatically progressed to the panel stage of the application process. As detailed above, this change alongside increased flexibility and financial support saw a significant increase in the number of students with a disability offered a place.
Actions to continue to diversify the pipeline of talent into STEM in academia and industry should be prioritised. While there is evidence that the undergraduate community has changed significantly over the last decade in diversity terms, some of these changes have not been translated into the postgraduate and early career researcher communities in the UK. We welcome the recent reviews by EPSRC and ESRC of doctoral training and the recent NERC guidelines on best practice principles in doctoral recruitment, and echo the need to ensure doctoral training remains an attractive and accessible option.
Targeted support through financial packages, exposure to career options and mentoring programmes during the later stages of undergraduate and Masters degrees may encourage a diversification of PhD candidates. A similar programme of support would be welcomed at the early career level (usually defined as within 5-8 years of completion of a PhD) when many graduates will experience insecure employment and high competition for fellowships.
One potentially underutilised tool available to funding bodies, industry and academia is the set of positive action provisions contained within the Equality Act 2010. While these are increasingly being used for targeted schemes, there remains a hesitancy to fully utilise the provisions to address inequality, owing to fear of misinterpreting the guidelines and inadvertently creating discriminatory practices. As shown by the Turing’s experience of using the Act’s specific provisions for disability, ensuring candidates are progressed to shortlist can have a significant impact. Further guidance on, or review of, how these provisions can be successfully employed would be welcome.
Thirdly, we suggest the research sector would benefit from consistent reporting of data on protected characteristics. While the funding councils and some other bodies do report this data, a sector-wide approach, similar to that for Higher Education as managed by the Higher Education Statistics Agency (HESA), would allow far greater benchmarking, analysis and accountability. This is in addition to the suggestions made earlier in this paper for the publishing of diversity data specifically covering the data science and AI fields in academia and industry.
Finally, we suggest continued investment in pilot programmes aimed at tackling under-representation and inequalities, such as the recent Research England funding for increasing ethnic minority participation in postgraduate research and the 2018 Inclusion Matters funding, alongside encouragement of partnership working across the sector. Different groups and individuals face different barriers; a range of interventions, rather than a one-size-fits-all approach, will be needed to see further change in representation. Where funding is available, and organisations that make progress are recognised and rewarded, we will create a fertile environment for the development and promotion of pilot schemes.