Dr Eilidh Noyes (University of Huddersfield) and Dr Reuben Moreton (Reli Ltd) — Written evidence (NTL0026)
Biography
This submission is made by Dr Reuben Moreton and Dr Eilidh Noyes on behalf of Reli and the University of Huddersfield respectively. Dr Moreton and Dr Noyes both hold PhDs on the topic of face recognition.
Dr Moreton is an independent researcher and consultant in forensic face recognition. With over 10 years' experience working in face recognition as an expert witness for the Metropolitan Police, Dr Moreton has authored best practice guidance and standards in the use of face recognition, in addition to peer-reviewed research. He regularly provides training and advice in forensic face recognition.
Dr Noyes is a Senior Lecturer in cognitive psychology at the University of Huddersfield with research expertise in human and machine face recognition. Dr Noyes has experience working with forensic users of face recognition in the UK and internationally and regularly publishes research in peer-reviewed journals.
Summary
This response focuses on the use of facial recognition technology in law enforcement, with emphasis on ensuring the accuracy and reliability of both the algorithm and the human operator.
We propose two guiding principles to address accuracy and reliability in the use of face recognition technology by law enforcement:
Principle 1: All face recognition technology used for law enforcement must be subject to rigorous in-house testing, using images that are representative of operational use and of known ground truth. Images must also be demographically representative for a given use-case (e.g. an algorithm used with a database of child faces must be tested with images of children). Such in-house testing will assist in establishing operationally realistic error rates, ensure the technology is appropriate for the intended use, and help reduce bias.
Principle 2: Face recognition technology must be operated by law enforcement personnel with the necessary skills and expertise to do so. The roles of face recognition technology operators should be formally professionalised. Human operators should also undergo continuous professional development and evaluation to demonstrate their ability and expertise, including testing using operationally realistic images of known ground truth. The required skills and expertise may vary by use case and volume of cases; therefore, more basic users should have the option to escalate complex cases to more qualified experts for review.
Overview
We provide a brief overview of how face recognition technology is used and how it works — necessary to understand appropriate use cases. This is followed by our response to Questions 1-3 from the call for evidence.
Face recognition technology – how is it used?
Face recognition technology is used to compare two or more faces with the aim of verifying or establishing a person's identity. The comparison is made by a trained computer algorithm, which produces a similarity rating as to whether the images depict the same person's face. Typically, a human then reviews the output from the system to decide whether the images depict the same person. In 'lights out' systems the output may be automatically considered a match if the similarity rating passes a certain threshold (e.g. when used at the border in an e-gate system). Although facial recognition technology is not new, recent step changes in accuracy and performance have led to a drastic increase in use by both the public and private sectors.
Law enforcement use of face recognition technology is commonly in an identification (1:N) capacity, meaning that an image of an unknown person is compared against a database of known images to establish the unknown person's identity. The aim is to identify the unknown person of interest from known databases. Identification searches can be carried out in two modes:
Retrospective searching, in which a still image of an unknown person (for example from CCTV) is searched against a database of known images after an event.
Live searching, in which faces captured from a live camera feed are compared in real time against a watchlist of known individuals.
In both retrospective and live searching a list of potential candidates is produced for human review. This can be either a fixed-length candidate list (e.g. the top 50 candidates with the highest similarity scores) or threshold based, where only candidates above a stated similarity rating are returned for review. The operator then reviews the candidate list and any viable matches are progressed as investigative leads[1]. At present in the UK there are no national requirements for training or qualification regarding the testing and operation of face recognition technology, or the reviewing of results.
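For illustration only, the two candidate-list modes described above can be sketched in a few lines of Python; the names and data structures are hypothetical and do not correspond to any particular system.

from typing import List, Tuple

Candidate = Tuple[str, float]  # (database image identifier, similarity score)

def fixed_length_candidates(scored: List[Candidate], n: int = 50) -> List[Candidate]:
    # Return the n database candidates with the highest similarity scores.
    return sorted(scored, key=lambda c: c[1], reverse=True)[:n]

def threshold_candidates(scored: List[Candidate], threshold: float) -> List[Candidate]:
    # Return only candidates whose similarity score meets the stated threshold.
    return sorted([c for c in scored if c[1] >= threshold],
                  key=lambda c: c[1], reverse=True)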
We discuss UK law enforcement's use of face recognition technology in our response to Q1.
Face recognition technology – how does it work?
A basic understanding of how face recognition technology works is important in understanding appropriate use cases (Q2), accuracy, and bias (Q3). There are multiple steps that an algorithm must perform to compare two faces. The first step is face detection – an algorithm must locate the face in the image. If a face is detected, the algorithm computes a feature representation for the face (a numerical code that contains feature scores for the face). A face representation is created for each face that the algorithm detects. Note that the ‘features’ used by the algorithm are unlikely to be the high-level concepts such as the eyes, nose and mouth that human observers tend to focus on [1]. They are also unlikely to focus on the physical distance between features, as measures from images vary, due to factors such as camera angle and camera-to-subject distance [2], [3]. The face representations are used to compute a measure of similarity between images. Images which have a high similarity score are likely to be the same identity, and images with a lower similarity score are more likely to be different identities.
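As a purely illustrative sketch (assuming the face representations are numerical vectors and that similarity is scored with cosine similarity; real systems differ in how representations are computed and compared), the comparison step amounts to the following:

import numpy as np

def similarity_score(representation_a: np.ndarray, representation_b: np.ndarray) -> float:
    # Cosine similarity between two face feature vectors: higher values indicate
    # the two images are more likely to depict the same person.
    a = representation_a / np.linalg.norm(representation_a)
    b = representation_b / np.linalg.norm(representation_b)
    return float(np.dot(a, b))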
Most face recognition systems have an operator-defined threshold which establishes the cut-off similarity score for potential matches and non-matches. Whether a similarity score passes the criterion threshold will be highly dependent upon the accuracy of the algorithm and the quality of the images being compared. A threshold is typically provided by the suppliers of commercial algorithms; however, in-house testing by the end user is recommended to ensure that this operational threshold is suitable for the applied use case [4]. We return to the issue of criterion selection in our answer to Q3.
With the exception of ‘lights out’ systems, which are not commonly used in law enforcement, a human operator must review the potential matches from the algorithm. Thus, the final decision of whether two faces are of the same identity is largely dependent upon the expertise of the human operator (see our response to Q3).
Response to specific questions
1. Do you know of technologies being used in the application of the law? Where? By whom? For what purpose?
Face recognition technology is predominantly used by law enforcement in the UK for identification (1:N) purposes, via both retrospective and live searching, with retrospective searching being the most common.
The Metropolitan Police were early adopters of face recognition technology for retrospective searching, with the capability to search against their custody image database first implemented in 2008. Access was restricted to small teams of specialist users[2]. In 2013 a retrospective searching capability was introduced to the Police National Database (PND), allowing any PND user to search a facial image against custody images that had been uploaded to PND from various police forces [4]. Another national implementation of face recognition technology for UK law enforcement is the introduction of the face matching capability to the Child Abuse Image Database (CAID)[3], to assist in the identification of victims of child sexual abuse.
Police forces have also trialled live face recognition technology, notably South Wales Police, initially at the 2017 Champions League final[4], and the Metropolitan Police, which conducted 10 trials beginning in August 2016[5]. Trials of live face recognition technology have garnered much media scrutiny and faced legal challenges in the courts[6].
An important distinction to note here is that the implementations discussed above are deployed on police-owned IT infrastructure, either on premises in police data centres or nationally with secure remote access, which offers some level of control over who can access and use such systems. More recently, with advances in cloud computing capabilities, commercial companies have offered face recognition technology to law enforcement as Software as a Service (SaaS). In this scenario the user uploads an unknown image to face recognition technology hosted on the commercial company's servers and carries out an identification search.
The face recognition technology provided by the US-based company Clearview AI is a particularly controversial example, in which law enforcement users can search against a database of images numbering in the millions, scraped from the internet by Clearview AI. The company is now facing a number of complaints from privacy campaigners claiming that it is in breach of GDPR[7]. Individual law enforcement users can sign up to Clearview AI for a free 30-day trial, potentially providing access to facial recognition technology to any police personnel in the UK. Whilst the service has seen widescale use by US law enforcement, usage in the UK does not appear to be as extensive[8].
A clear trend, particularly for retrospective searching, is that automated face recognition technology is becoming accessible to a wider range of users within law enforcement, and whilst such technology can provide vital investigative leads, the consequences of an error can be profound. There is substantial variation in face-matching accuracy between different algorithms and different humans [5]. Therefore, the risk of errors from face recognition technology is heightened when the technology has not been subject to sufficient testing and evaluation, and when the human users do not possess adequate skills and expertise to operate it.
The applications discussed in this section demonstrate that the use of face recognition technology in UK law enforcement is complex and changing, and likely to become more so as the technology continues to evolve. As face recognition technology develops and becomes more accessible it is essential that there is oversight and safeguards in place for how the technology is procured, used and interpreted by law enforcement personnel.
2. What should new technologies used for the application of the law aim to achieve? In what instances is it acceptable for them to be used? Do these technologies work for their intended purposes, and are these purposes sufficiently understood?
Face recognition technologies should assist law enforcement in its overarching aims of preventing crime, identifying offenders and safeguarding victims. To do so, this technology and the people who use it must be able to make accurate identifications from the types of face images encountered in criminal investigations. The limitations of the technology must also be understood to mitigate risks of error, particularly where those errors can systematically affect one part of society more than another. Recent research suggests that the public is supportive of algorithm use when legitimately used by police for security purposes (we refer to evidence submitted to this committee by Dr Kay Ritchie). It is, therefore, important to maintain public trust and confidence by using facial recognition technology in a safe and responsible manner.
We strongly advocate that all face recognition technology used in law enforcement is subject to rigorous in-house testing prior to use, and operated by specialist users with the requisite skills and expertise, to maximise accuracy and mitigate the risk of errors.
3. Do new technologies used in the application of the law produce reliable outputs, and consistently so? How far do those who interact with these technologies (such as police officers, members of the judiciary, lawyers, and members of the public) understand how they work and how they should be used?
Algorithm Accuracy
The accuracy of face recognition technology depends on the algorithm that is used, the properties of the images analysed, the environment in which the technology is deployed and the capabilities of the human operating the technology.
Algorithms can be broadly classified as 'early algorithms' or 'Deep Convolutional Neural Networks' (DCNNs). DCNNs were first introduced in around 2014 and engage in a vastly different feature extraction process to that used by early algorithms. DCNNs are trained on large quantities of labelled training data[9] (typically many millions of images). The DCNN selects which features to extract from input images based on which features resulted in the most accurate identification of the training data. In contrast, early algorithms extract hand-coded features (features that were selected by the programmer).
Face recognition algorithms (both early designs and DCNNs) typically match or exceed the accuracy of humans on face identification tasks when images are front facing, there is no occlusion of the face, and images are of high quality [6]. DCNN performance is improving at a rapid rate, so much so that reports of state-of-the-art algorithm accuracy are quickly outdated. Many of the latest DCNNs perform with perfect or near-perfect accuracy for images that vary in pose, expression, and illumination, provided that the image is of good quality. Some DCNNs can easily identify faces that are occluded by surgical face masks, or are in deliberate disguise, which can be extremely challenging for the average human observer [7]. However, it is important to note that this refers to state-of-the-art face recognition technology tested in controlled settings, and cannot be construed as a measure of accuracy for all face recognition technology in all environments. In real-world settings, images may be of suboptimal quality or environmental conditions may prevent the full accuracy of face recognition technology from being realised.
Algorithm accuracy reduces significantly if images are of low quality (e.g. highly pixelated or blurred), or in some cases, if the face is occluded. Poor image quality or occlusion can result in failure to detect a face in an image. Even if a face is found, the algorithm may not be able to extract a meaningful feature representation of the face if the image is of low quality.
A recent report that speaks to the practical constraints of facial recognition from poor quality images revealed that algorithms deployed in field settings by police failed to detect a face in 55-60% of images [8]. Poor image quality can be caused by the height or distance from which an image is captured, camera resolution or lighting issues, or movement-related blur. Low quality can be an issue both for the image being searched and for database images. To help protect against image quality related errors, extensive testing of the technology using operationally representative images should be conducted, with overall oversight of the face recognition technology by a suitably qualified human expert.
Consistency of algorithm performance and demographic bias
An algorithm's performance is consistent in that it will provide the same result for the same images every time. Algorithms are not affected by factors such as fatigue or boredom, which can affect the consistency of a human operator. Consistency of performance for the same images across different algorithms is more complex. The performance of one DCNN can differ from that of another DCNN because of differences in algorithm architecture, design, and qualities that relate to the training data. The US government runs competitions to provide standardised tests of algorithm face-matching performance. A key take-home message from these tests is that large differences in accuracy rates are evident between the highest and lowest performing algorithms.

In 2019 the National Institute of Standards and Technology (NIST) Face Recognition Vendor Test investigated algorithm performance for face images that varied in race, gender, and age. A general performance advantage was observed for white male faces, with the lowest accuracy rates for black female faces. There were stark differences between algorithms in overall accuracy, and in the nature and extent of the biases observed. Applied users of face recognition algorithms need to perform rigorous in-house testing to understand how the performance of their algorithm is influenced by demographic factors, and to set appropriate criteria to minimise algorithm bias [9]. Such testing is essential to ensure that face recognition technology is not systematically biased against particular groups in society.
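A minimal sketch of such in-house demographic testing, assuming ground-truth-labelled comparisons tagged with a demographic group (the data structure and field names below are hypothetical), would compute error rates separately for each group:

from collections import defaultdict
from typing import Dict, List

def error_rates_by_group(comparisons: List[dict], threshold: float) -> Dict[str, dict]:
    # Each comparison: {"group": str, "score": float, "same_person": bool}
    counts = defaultdict(lambda: {"false_match": 0, "impostor": 0,
                                  "false_non_match": 0, "genuine": 0})
    for c in comparisons:
        g = counts[c["group"]]
        if c["same_person"]:
            g["genuine"] += 1
            if c["score"] < threshold:       # same person wrongly rejected
                g["false_non_match"] += 1
        else:
            g["impostor"] += 1
            if c["score"] >= threshold:      # different people wrongly accepted
                g["false_match"] += 1
    return {group: {"false_match_rate": g["false_match"] / max(g["impostor"], 1),
                    "false_non_match_rate": g["false_non_match"] / max(g["genuine"], 1)}
            for group, g in counts.items()}

Marked differences in these rates between groups, at the threshold intended for operational use, would indicate a bias that needs to be addressed before deployment.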
Demographic biases have also been found in human face-matching ability, the most well researched being the own-race effect, where individuals are worse at recognising and matching faces of a different ethnicity to their own [10]. Human operators of face recognition technology should also be subject to training and testing using operationally representative images, to ensure they possess the necessary skills and expertise to carry out their role accurately and reliably.
Variation in human operator performance
Humans are surprisingly poor at comparing the faces of people they do not know, with large individual differences in ability. This is in stark contrast to the relative ease with which we recognise people who are familiar to us (see [11] for a thorough explanation of the differences between familiar and unfamiliar face recognition).
Even when comparing faces in images taken on the same day, people are mistaken, on average, one-fifth of the time [12]. Human performance at comparing unfamiliar faces declines further when images are taken under suboptimal conditions, e.g. images from CCTV systems [13]. Poor face-matching performance has also been observed in some professional groups, including passport officers, for whom years of experience showed no correlation with performance [14], and operators of facial recognition technology [15]. There are, however, some people who demonstrate consistently superior face-matching ability, namely 'super-recognisers', who have a naturally high face recognition ability [16], and trained forensic examiners [17].
Performance of a human operator can have a significant impact on the accuracy and reliability of results from a face recognition system [15]. The interaction between the algorithm and the human operator and its impact on accuracy is often overlooked when evaluating the performance of face recognition technology [18].
Understanding how algorithms are and should be used
There is currently no legal requirement for the operator of a facial recognition algorithm to possess any specific knowledge, skills, ability or expertise relevant to the role, nor any requirement that an algorithm undergoes in-house testing to verify criterion thresholds for identifications. This is highly concerning. The lack of legislation on these issues could lead to inaccuracies in the interpretation of algorithm output, and biases in face recognition systems.
Face recognition algorithms can be highly accurate in certain situations. Performance will depend on the algorithm that is used, on image quality and the demographics of the photographed individual, and on the ability and expertise of the human operator. Algorithm developers supply accuracy rates and operational thresholds to applied users; however, the testing procedures and images used in these developers' tests may not be representative of the images encountered in operational practice. This means that the criterion threshold and accuracy rates that are supplied with the algorithm cannot be relied upon unless verified by in-house testing. In-house testing of algorithm performance is essential to ensure appropriate usage of these systems in the application of the law. In-house tests should include images that are representative of the types of images encountered in practice, and for which the ground truth is known (i.e. it is known with certainty whether the images truly depict the same person, rather than relying on a human decision that the images are of the same identity).
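For illustration, and assuming a set of in-house test comparisons of known ground truth (the data structure here is hypothetical), verifying or adjusting a supplier's threshold might take the form of the following sketch, which estimates error rates across a range of candidate thresholds:

from typing import List, Tuple

def sweep_thresholds(pairs: List[Tuple[float, bool]],
                     candidate_thresholds: List[float]) -> List[dict]:
    # pairs: (similarity_score, same_person) from operationally representative
    # images whose ground truth is known with certainty.
    genuine = [s for s, same in pairs if same]
    impostor = [s for s, same in pairs if not same]
    results = []
    for t in candidate_thresholds:
        results.append({
            "threshold": t,
            # proportion of same-person comparisons wrongly rejected
            "false_non_match_rate": sum(s < t for s in genuine) / max(len(genuine), 1),
            # proportion of different-person comparisons wrongly accepted
            "false_match_rate": sum(s >= t for s in impostor) / max(len(impostor), 1),
        })
    return results

The supplied operational threshold can then be compared against these in-house error rates, and adjusted if it does not deliver acceptable accuracy for the applied use case.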
A recent workshop with UK police officers on face recognition in February 2020, run by the authors, highlighted that attendees had a limited understanding of both face recognition technology and human face recognition. Notably, many of the attendees had access to and used face recognition technology. Given the emergent nature of face recognition as a policing tool, there is a pressing need to improve understanding within both law enforcement and the wider criminal justice system. A first step to doing so would be to professionalise the operation of face recognition technology, providing training and testing of law enforcement users. This would ensure that operators are sufficiently skilled and understand the nuanced complexities of face recognition.
3 September 2021
References
[1] A. J. O’Toole, C. D. Castillo, C. J. Parde, M. Q. Hill, and R. Chellappa, “Face Space Representations in Deep Convolutional Neural Networks,” Trends Cogn. Sci., vol. 22, no. 9, pp. 794–809, 2018.
[2] R. Moreton and J. Morley, “Investigation into the use of photoanthropometry in facial image comparison,” Forensic Sci. Int., vol. 212, no. 1–3, pp. 231–237, 2011.
[3] E. Noyes and R. Jenkins, “Camera-to-subject distance affects face configuration and perceived identity,” Cognition, vol. 165, pp. 97–104, 2017.
[4] G. Whitaker, “PND Facial Search Accuracy Evaluation,” 2015.
[5] Academy of Social Sciences in Australia Inc., “Evaluating face identification expertise: turning theory into best practice,” 2020.
[6] A. J. O’Toole, X. An, J. Dunlop, V. Natu, and P. J. Phillips, “Comparing face recognition algorithms to humans on challenging tasks,” ACM Trans. Appl. Percept., vol. 9, no. 4, pp. 1–15, 2012.
[7] E. Noyes, J. P. Davis, N. Petrov, K. L. H. Gray, and K. L. Ritchie, “The effect of face masks and sunglasses on identity and expression recognition with super-recognizers and typical observers,” R. Soc. Open Sci., vol. 8, no. 3, 2021.
[8] B. Davies, M. Innes, and A. Dawson, “An Evaluation of South Wales Police’s Use of Automated Facial Recognition,” 2018.
[9] P. Grother, M. Ngan, and K. Hanaoka, “Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects,” 2019.
[10] A. M. Megreya, D. White, and A. M. Burton, “The other-race effect does not rely on memory: Evidence from a matching task,” Q. J. Exp. Psychol., vol. 64, no. 8, pp. 1473–1483, 2011.
[11] A. W. Young and A. M. Burton, “Are We Face Experts?,” Trends Cogn. Sci., vol. 22, no. 2, pp. 100–110, 2018.
[12] A. M. Burton, D. White, and A. McNeill, “The Glasgow Face Matching Test.,” Behav. Res. Methods, vol. 42, no. 1, pp. 286–291, 2010.
[13] V. Bruce, Z. Henderson, C. Newman, and A. M. Burton, “Matching identities of familiar and unfamiliar faces caught on CCTV images,” J. Exp. Psychol. Appl., vol. 7, no. 3, pp. 207–218, 2001.
[14] D. White, R. I. Kemp, R. Jenkins, M. Matheson, and A. M. Burton, “Passport officers’ errors in face matching,” PLoS One, vol. 9, no. 8, 2014.
[15] D. White, J. D. Dunn, A. C. Schmid, and R. I. Kemp, “Error rates in users of automatic face recognition software,” PLoS One, vol. 10, no. 10, 2015.
[16] E. Noyes, P. J. Phillips, and A. J. O’Toole, “What is a Super-Recogniser,” in Face Processing: Systems, Disorders and Cultural Difficulties, M. Bindemann and A. M. Megreya, Eds. Nova Science Publishers, 2017, pp. 173–202.
[17] R. Moreton, “Forensic face matching: Procedures and application,” in Forensic Face Matching, M. Bindemann, Ed. Oxford University Press, 2021.
[18] A. Towler, R. I. Kemp, and D. White, “Unfamiliar Face Matching Systems in Applied Settings,” in Face Processing: Systems, Disorders and Cultural Difficulties, M. Bindemann and A. M. Megreya, Eds. Nova Science Publishers, 2017, pp. 21–40.
[1] https://www.met.police.uk/foi-ai/metropolitan-police/disclosure-2019/july-2019/software-capability-conduct-facial-matching/
[2] https://www.met.police.uk/foi-ai/metropolitan-police/disclosure-2019/july-2019/software-capability-conduct-facial-matching/
[3] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/759328/CAID_Brochure_May_2018_for_gov_uk.pdf
[4] https://afr.south-wales.police.uk/wp-content/uploads/2021/04/All-Deployments.pdf
[5] https://www.met.police.uk/foi-ai/metropolitan-police/disclosure-2019/september/information-afr-technology/
[6] https://www.theguardian.com/technology/2020/aug/11/south-wales-police-lose-landmark-facial-recognition-case
[7] https://www.bloomberg.com/news/articles/2021-05-27/clearview-ai-hit-by-wave-of-european-privacy-complaints
[8] https://www.buzzfeed.com/emilyashton/clearview-users-police-uk
[9] Training data must not be confused with database images. Database images are not used to train the algorithm, but are instead images provided by the end user that other input images can be compared against (e.g. a watchlist of known offenders).