DAIC0008
Written evidence submitted by Dr Jia Wang.
Author’s bio:
Dr Jia Wang is an associate professor of law at Durham University (DU). Her research interests lie in law, AI and emerging technologies. Wang has published in leading international journals and presented at the AHRC In-Game International workshop and the Modern Law Review Workshop on the legal protection of AI-generated content. She was a Research Fellow at the Harvard Berkman Klein Center and Singapore Management University, where she worked on regulating disruptive technologies. Her latest publications consider ethics, artificial personalities, generative systems and Metaverse developments. She conducts multidisciplinary research in collaboration with colleagues in computer science, design and informatics. Her work has appeared in the Queen Mary Journal of Intellectual Property, Intellectual Property Law and Practice, the European Intellectual Property Review, the Hong Kong Law Journal, the European Review of Private Law and the Asian Pacific Law Review, as well as a monograph with Springer. In 2022-2023, she was awarded a DU Seedcorn Grant for a project on the protection of AI-generated content for video games. She is a member of the Responsible AI (RAI) UK Partner Network. She applied for the UKRI RAI International Partnerships scheme as Project Lead and is awaiting a decision, expected by the end of January 2024. Wang’s email is jia.wang2@durham.ac.uk.
Submission:
I. Data and AI training
- For Pillar 2 of the AUKUS Partnership (Advanced Capabilities), a robust AUKUS framework could be designed to allow innovation in artificial intelligence (AI). So far, AUKUS Pillar 2 has not devoted much consideration to effective data strategies for the advanced technologies it seeks to deploy. In 2022, the United Kingdom published its national Defence Artificial Intelligence Strategy, and the United States published its Responsible Artificial Intelligence Strategy and Implementation Pathway. However, their efforts to operationalise AI in their respective forces have been largely separate. Yet it is critical to begin this work as early as possible to develop consistent standards, from technical safety protocols to acquisition requirements to best practices and procedures. AUKUS Pillar 2 represents a timely opportunity to synchronise these efforts and reduce the possibility of lock-in effects, which can hamper data sharing across platforms. The UK Government can further strive to accelerate algorithm development, expand the quality and quantity of training data, enhance advanced data analytics and develop a mature understanding of the legal, regulatory and ethical issues.
- AUKUS members position themselves to be global leaders not only in AI capabilities but also in employing responsible AI—for instance, the United Kingdom hosted the first global summit on regulating AI in 2023. It must be noted that AI systems pose unique difficulties that require additional attention to the quantity and quality of data used for AI training. Getting the training of AI models wrong could cause technical bottlenecks that slow development or deployment—or even cause outright system failure. When AI systems are linked to other technologies, their combined performance becomes even harder to monitor and track, so it is particularly vital to clarify early on which data is usable for AI systems. This ensures interoperability between systems in operational terms: standardising protocols and procedures, sharing information, and developing operational best practices.
II. The problem between data minimisation and optimisation of AI models
- The use of machine learning (ML) holds substantial potential to improve many functions and needs of the public sector. In practice, however, integrating ML/AI components into public sector applications is severely limited because of mismatches between components of ML-enabled systems. For example, if an ML model is trained on data that differs from the data in the operational environment, the performance of the ML component will be dramatically reduced. De-identification standards aiming to protect privacy can inadvertently impair the performance and reliability of these models as well. Specifically, the de-identification process can make it difficult to generalise findings across different groups and populations, introduce biases and skewness into datasets, and create computational complexities. Opt-out mechanisms allowing the withdrawal of personal data can likewise lead to bias, as specific groups are more prone to withdraw their data. These issues pose urgent threats to using big data to train AI models but have received little attention.
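A minimal, purely illustrative sketch of the de-identification point above. All data and thresholds here are hypothetical: the label depends on an exact quasi-identifier (age), and generalising that attribute into coarse bands — a common de-identification step — removes the precision a simple threshold model relies on, measurably reducing accuracy.

```python
# Hypothetical dataset: the label depends on a precise quasi-identifier.
# Here, label = 1 when age >= 45 (an invented rule for illustration only).
data = [(age, int(age >= 45)) for age in range(18, 80)]

def accuracy(predict, rows):
    """Fraction of rows on which the predictor matches the true label."""
    return sum(predict(age) == label for age, label in rows) / len(rows)

# A model with access to raw ages can learn the exact threshold.
acc_raw = accuracy(lambda age: int(age >= 45), data)

# De-identified data: ages generalised into 20-year bands, so the model
# only ever sees the band midpoint (a k-anonymity-style coarsening).
def band_midpoint(age):
    return (age // 20) * 20 + 10

acc_deid = accuracy(lambda age: int(band_midpoint(age) >= 45), data)

print(f"raw accuracy: {acc_raw:.2f}")            # perfect on raw data
print(f"de-identified accuracy: {acc_deid:.2f}")  # degraded by coarsening
```

The ages near the decision boundary (40-44) are absorbed into a band whose midpoint falls on the wrong side of the threshold, so the de-identified model misclassifies them — a toy instance of the generalisation and skewness problems described above.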
- The notion of information is central to data protection law and to algorithms/ML. This centrality gives the impression that algorithms are just another data processing operation to be regulated. However, the data-as-information approach in data protection law focuses only on input data and does not extend to output data. Hence, the current regulatory framework for data protection has hindered various AI models from achieving optimal efficiency. The chilling effects will be exacerbated if the data-as-information approach is not revisited and recalibrated.
- The regulatory framework for data used in AI training needs a revamp. AUKUS members have adopted either a risk-based or a rights-based approach to data protection law, with some convergence between the two. The rights-based approach provides a uniform level of protection to all, but this protection might simply be inadequate. The risk-based approach might provide protection tailor-made to each specific processing operation, but this leads to an individualised and uneven style of data protection. A more harmonised regulatory approach will help improve AI integration.
III. Recommendations: Achieving Effective Data Regulation for AI Training
- First, it is necessary for the UK Government to have a working group whose primary focus is to consider data regulation for the effective training, development and deployment of AI applications. The objectives of the working group could be: (1) to clarify the meaning of data and information with a focus on the categorisation of data, that is, a distinction between personal and non-personal data; to reconceptualise control, identifiability, and reachability between data controllers, entities and individuals; (2) to further the understanding of the influence of removing certain regulated attributes from the dataset on the performance of relevant algorithms; (3) to further the understanding of the regulatory approaches and explore alternatives for more effective data regulation. It is advisable to invite a wide range of stakeholders, including legal and regulatory experts, data scientists, ML scientists and operational personnel for diversified input.
- The current data protection regulation is embedded into a logic of knowledge communication, whereas ML’s definitions stem from a logic of knowledge creation. Data protection regulations insufficiently acknowledge the multitude of data processing operations crucial for knowledge production. This gap arises due to a conceptualisation of information that excludes the capacity for learning from data, emphasising only the communication aspect of information. The debate between risk-based and rights-based approaches to data regulation can be reconceptualised as a matter of variations around the concept of proportionality. A proportionality test enhances regulatory efficiency and effectiveness by directing efforts proportionately to risks, thereby optimising resource utilisation. This has the potential to alleviate the responsibilities of data controllers, enabling them to strategically target regulatory resources, scale them appropriately, and select the most suitable approach. A proportionality test that recognises different interpretations and applications of proportionality in rights is recommended. To operationalise the proportionality test, a shift towards evaluating decisions based on a cost-benefit approach (e.g., privacy vs. national security) rather than solely on rights is suggested.
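The cost-benefit operationalisation of the proportionality test suggested above can be sketched as a toy scoring exercise. The operations, scores and margin below are entirely hypothetical — this is not a proposed regulatory formula, only an illustration of how a proportionality judgment could be expressed as benefit weighed against cost rather than as a binary rights question.

```python
def proportionate(benefit: float, cost: float, margin: float = 1.0) -> bool:
    """A processing operation is treated as proportionate when its
    assessed benefit (e.g. to national security) outweighs its assessed
    cost (e.g. to privacy) by at least the chosen margin."""
    return benefit / cost >= margin

# Hypothetical operations with invented benefit/cost scores.
operations = {
    "aggregate threat statistics": (8.0, 2.0),          # high benefit, low privacy cost
    "bulk retention of raw personal data": (3.0, 9.0),  # low benefit, high privacy cost
}

for name, (benefit, cost) in operations.items():
    verdict = "proportionate" if proportionate(benefit, cost) else "disproportionate"
    print(f"{name}: {verdict}")
```

Scaling the margin parameter is one way to capture the point that proportionality admits different interpretations: a stricter regulator simply demands a larger benefit-to-cost ratio before processing is permitted.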
- It is recommended that the UK Government ensure, from the outset, that AUKUS partners' data regulations are compatible and that data sharing is smooth, by harmonising domestic AI regulation approaches as they are being developed and leveraging relevant military expertise in AI and systems integration.
- In the long term, it is recommended that AUKUS be perceived not solely as a security framework but as a stronghold of knowledge and exploration, in the spirit of ‘Big Science’.
- Ultimately, the earlier these efforts to synchronise data regulation are implemented, the more effectively they will improve AI performance through enhanced interoperability, thereby providing a significant competitive advantage to AUKUS partners.
17th January 2024