Metawritten evidence (LLM0093)

 

House of Lords Communications and Digital Select Committee inquiry: Large language models

 

 

  1. Introduction

 

0.1.          Meta welcomes the opportunity to provide evidence on the Digital Committee’s Call for Evidence on Large Language Models. We welcome the collaborative discourse around AI governance in the UK, and the UK’s leadership in this space globally.

 

0.2.          For example, the ICO’s clear guidance on its expectations around AI and data protection strike the right balance between legal clarity and flexibility to account for technological development, demonstrating that in many instances existing regulation can, and should, apply to AI as it would for any other technology.

 

0.3.          We believe these elements are all positive indicators of an approach to AI that acknowledges the opportunities and benefits, while balancing the need to manage potential risks. We are pleased to offer our perspective to the Committee to help pursue this objective.

 

 

CAPABILITIES AND TRENDS

 

  1. How will large language models develop over the next three years?

 

1.1.          Large language models (LLMs) are a relatively recent, dynamic and continuously evolving field within AI. We believe the best way to advance their development is to take an open innovation approach, to democratise access and accelerate progress.

 

1.2.          With thousands of open source contributors working to make models, which are based on open innovation, we can more quickly find and mitigate potential risks in systems and build safer, more effective products. The more AI-related risks that are identified by a broad range of stakeholders - researchers, academics, policymakers, developers, other companies - the more solutions the AI community, including tech companies and governments, will be able to find for implementing guardrails to make the technology safer.

 

1.3.          With appropriate guardrails, with access to LLMs, it is more likely that toxicity and bias can be identified and appropriately addressed and mitigated. In closed source software, only a limited number of developers can see or influence the source code. Additionally, an open innovation approach has wider economic benefits. For example, entrepreneurs and SMEs are able to build new products that benefit the economy. Researchers can improve open models and find new uses for the technologies.

 

1.4.          Looking ahead, we expect progress in LLMs to continue. However, we are nowhere near the capabilities of sci-fi style super intelligence that often trigger the most dystopian warnings. It is true that large language models are not yet well-defined or well-understood.  While it is important to be cognisant of possible risks, it is also critical that we do not lose sight of the enormous potential of this technology to deliver truly transformative positive change in the world. Our view is that the benefits of open sourcing, which makes that positive transformative change a real possibility, far outweighs the risks.  For this reason, policy and regulation must be proportionate in order to properly realise these benefits for society in a responsible way.

 

a) Given the inherent uncertainty of forecasts in this area, what can be done to improve understanding of and confidence in future trajectories?

 

1.5.          Like all foundational technologies – from radio transmitters to internet operating systems – there will be a multitude of uses for large language models, some predictable and some not. And like every technology, these models may be used for both good and bad ends by good and bad people. While we can’t eliminate the risks, we can mitigate them.

 

1.6.          Because the technology is still under development, and because it is not possible to foresee exactly how it will evolve, it is important that this is done transparently and in the open.  

 

1.7.          We believe the entire AI community — academic researchers, civil society, policymakers, and industry — must work together to develop clear guidelines around responsible AI.

 

1.8.          From our perspective, we are eager to share Meta’s experience and the steps we have taken to mitigate risks while building our latest large language model, Llama 2, responsibly. We identified and mitigated risk at every stage of the process. Our approach included:

 

        Filtering the training data that went into the models;

 

        Once the models were trained, fine-tuning them to increase safety;

 

        Human evaluation of the models’ behaviour, which helped us further train the models to respond to particular situations;

 

        Developing automated methods for assessing the performance of each model (as well as competing models) against a set of risk criteria;

 

        Adversarial red teaming by both internal and external experts.

 

1.9.          We also released a series of documents focused on transparency and responsibility, including:

 

        a research paper that explains how the model was built and how we worked to identify and mitigate risk

 

        a Responsible Use Guide that helps guide developers on how to use the models responsibly – including through use of tools like those Microsoft is providing through Azure

 

1.10.          Other steps we’re taking post-release include:

 

        Creating an academic community for open source LLMs that aims to deepen our understanding of the responsible development and sharing of LLMs.

 

        Funding a challenge for developers to use Llama to address environmental, education, and other global challenges.

 

        Incident reporting incentives (e.g. bug bounty) so people can identify and report potential issues with the models.

 

1.11.          We believe similar steps will be crucial to ensure the responsible development of AI so that it can truly benefit humanity as a whole.

 

  1. What are the greatest opportunities and risks over the next three years?

 

2.1.          Large Language Models are an exciting development in AI technology, with the potential to deliver enormous benefits to society. Generative AI, which has the ability to create new content using existing text, audio, images, or videos, has benefits for creativity and expression, as well as many scientific applications. We encourage the coordination and cooperation of regulators to ensure innovation like LLMs is supported within established regulatory frameworks and regimes.

 

2.2.          AI will be a transformative technology:

 

2.2.1.   Developments in AI are pushing the frontiers of scientific discovery from medicine to climate technology. AI will be central to addressing global challenges like energy use and climate change, and has the potential to revolutionise sectors like healthcare, education, and agriculture that can deliver enormous societal benefits. Some examples are:

 

        Advancing health and medicine. The models that power generative AI can be fine-tuned for purposes that advance public health (e.g. drug discovery; novel protein synthesis; chatbot companionship for the lonely and/or elderly);

 

        Combating climate change. The models behind generative AI may be useful in the fight for decarbonisation.

 

2.2.2.   The productivity gains from AI will be substantial. AI can automate rote tasks for everything from scheduling meetings, to transcription, to basic accounting. AI tools are being applied to increasingly specialised tasks, such as writing code, summarising legal texts, drafting contracts and producing marketing materials. As AI is applied to repetitive tasks, workers will concentrate more on complex, creative, and strategic tasks, and may also reduce the amount of time spent working, with more time for leisure.

 

        But automation is just one aspect: these models will spark a new age of invention and ingenuity. Specifically, the models that underpin generative AI are trained on huge quantities of information, representing a significant proportion of collective human knowledge. These models, which can be fine tuned to all manner of different tasks, will allow us to deploy vast collective knowledge to a multitude of tasks. These models are a powerful tool that people can use to draw upon that knowledge, and apply it in new ways, to invent, create and solve problems.

 

        Additionally, using generative AI for software development allows engineers to free up time from simpler engineering tasks to focus on more complex coding and higher-order artifacts.

 

2.2.3.   AI can improve safety, reducing the need for people to complete dangerous tasks, from identifying and removing landmines to monitoring industrial processes to reduce risks of explosions, or leaks of hazardous materials.

 

2.2.4.   AI can enable accessibility for marginalized communities. Generative AI can facilitate text to speech and speech to text for people with visual impairment, people with dyslexia, or hard of hearing people, potentially enabling more economic opportunities and experiences for people with disabilities.

 

        Additionally, Generative AI may help boost the diversity of model training data and improve inclusivity in AI model performance by supplementing datasets with synthetic data.

2.3.          In a real life example, AI has already been used to speed up the discovery of new antibiotics. At Meta, the research version of our Llama model has  been accessed by researchers focused on tasks as varied as generative protein design and practical quantum physics. Our ESM fold model has enabled detailed cataloguing of protein structures in a first-of-its kind Metagenomic Atlas. Improving understanding of these structures can unlock new solutions for curing diseases, and producing clean energy, amongst other things.

 

2.4.          Meta is investing significantly in AI research and development, particularly when it comes to Large Language Models. Our AI labs are already making advancements in research and development as part of a long-term effort to enable the next era of computing.

 

        OPT-175B, for example, is a large language model (LLM) which Meta has made available to researchers and institutions who request access, and has been used for a variety of applications, including protein design, quantum physics, translation and more.  We then released LLaMA, a State of the Art LLM. Our LLaMA model has also been accessed by researchers focused on generative protein design, and practical quantum physics, among many other use cases.

 

        No Language Left Behind’ is a breakthrough AI project which enables the translation of 200 different languages with state-of-the-art results. This can allow more people to access information in their native language, and points to a future where linguistic barriers may no longer hold us back from communicating with others.

 

        We’re continuing to make progress on expanding speech recognition technologies to more and more languages as part of our Massively Multilingual Speech (MMS) project. We’ve released models for speech-to-text, text-to-speech, and more for 1,100+ languages.

 

        We’ve made multi billion dollar investments to create the computing infrastructure that can deliver high-performance inference and training capabilities and the related hardware, software and tooling that we use to deliver this. We make these investments for both research and product.

 

2.5.          Realising the many benefits of LLMs, and AI more broadly is not without risks (see question below). We must focus on safe and responsible development. However, it is important to remember that perceived risks must be weighed against potential benefits. Stifling innovation comes at a cost – it is discoveries delayed, economic and productivity growth stymied, and progress stalled.

 

2.6.          We welcome the UK government’s dual focus on fostering innovation and advancing AI safety, which are goals that Meta shares.

 

a) How should we think about risk in this context?

 

2.7.          Like all big technological shifts, it is important to be mindful of possible risks and take steps to mitigate them. There are multiple steps that can be taken to mitigate the potential risks surrounding Large Language Models.

 

2.8.          First, providers should be transparent around how their systems work in a way that allows understanding even to those who lack deep technical knowledge. Second, cross-ecosystem collaboration is imperative, and should involve industry, government, academia and civil society. Third, AI systems should be tested before their release, for example leveraging processes like “red teaming”, which allows for discovery of vulnerabilities and flaws as teams take on an adversary-like role.

 

2.9.          As with other AI systems, it is important to consider risk throughout the lifecycle of the system. For generative AI, it can be helpful to think in terms of three broad stages: training, input and output, as per the following framework:

 

  1. Research model training data
    1. Like all AI systems, generative AI systems are trained on datasets, some of which are very large. It is important to think about the makeup of those datasets. In particular:

 

      1. Privacy: does the dataset include private/sensitive content?

 

      1. Provenance: does the dataset include content from inappropriate sources?

 

      1. Diversity: is the dataset sufficiently representative of different demographic groups?

 

  1. User inputs
    1. As users are able to input their own queries and requests, there are a couple of things here to think about:

 

      1. Integrity: Users may ask the model to perform tasks that violate community standards or that are sensitive

 

      1. Bias/stereotyping: Users may ask/prompt the model to reproduce stereotypes
  1. Model outputs
    1. Generative AI creates new content. For these outputs it is important to consider:

 

      1. Accuracy: the model may ‘hallucinate’, producing factually incorrect or unreliable inputs

 

      1. Toxicity: the model may generate content that users find offensive, insensitive, or toxic

 

      1. Bias/stereotyping: the model may exhibit bias or stereotypes in its output without user prompting

 

2.10.          This framework could be helpful to think about risk and assess whether existing regulation is sufficient or may be adapted to address it, or whether novel solutions may be required. There is significant internal work happening across Meta to understand risks and develop mitigations - teams such as Civil Rights, Privacy, Responsible AI, and Integrity - all are, and will continue to be, closely connected as the development process continues.

 

2.11.          While it’s still early days for generative AI, we are committed to taking our learnings and exploring how to apply those to other products.

 

DOMESTIC REGULATION

 

  1. How adequately does the AI White Paper (alongside other Government policy) deal with large language models? Is a tailored regulatory approach needed?

 

3.1.          In general, Meta advocates for regulation that is risk-based and technology neutral. This means that regulators should look at the uses of the technology, rather than the technology itself.  As a result, said approach ensures that the regulation is applied proportionately, introducing requirements to ensure protections in high-stakes settings, whilst avoiding hindering innovation in lower-risk areas and being more future-proof.

 

3.2.          Because of that, we have concerns about measures targeted on specific technologies or classes of AI (e.g. ,foundation models”). With Large Language Models, just as with other AI systems, any potential risks arise dependent on the context in which they are deployed. It is unnecessary, therefore, to introduce tailored and specific requirements for providers of Large Language Models.

 

3.3.          Regulators should continue to assess whether new regulation is indeed necessary, and if so then the approach should be risk-based and technology-neutral.

 

a) What are the implications of open-source models proliferating?

 

3.4.          The spread of open source models is a welcome development. Historically, full research access to LLM’s has been limited to only a few highly resourced labs. This restricted access has limited researchers' ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.

 

3.5.          Openness allows for collaboration, scrutiny and iteration. And it gives businesses, start-ups and researchers access to tools they could never build themselves, backed by computing power they can't otherwise access.

 

3.6.          For this reason, Meta is also developing new software tools for testing and improving robustness — and then sharing them with the AI research and engineering community. For example, our open source Captum library provides state-of-the-art algorithms to understand more easily and effectively which features of an AI model built with the open source PyTorch ML framework contribute to that model’s outputs. Captum helps AI developers interpret their models, benchmark their work, and improve and troubleshoot unexpected model outputs.

 

3.7.          In short, open source development plays a critical role in driving innovation and delivering economic benefits from new technologies. In the coming years, access to open-source models will play said crucial role in driving AI research, development, innovation and adoption.

 

  1. Do the UK’s regulators have sufficient expertise and resources to respond to large language models? If not, what should be done to address this?

 

4.1.          Good regulation should be the product of collaboration  across multiple stakeholders. UK regulators should coordinate and collaborate with the many experts and stakeholders of the AI ecosystem and devise their legislative strategies in conjunction with other co- or self-regulatory instruments (corporate ethical frameworks, standards, ethical codes of conduct, etc). Building these norms, standards, and practices will require a concerted effort to coordinate between the many actors in the AI ecosystem, and ensuring that the AI regulatory landscape is not bifurcated to the detriment of people, innovation, and the competitiveness of digital economies. 

 

4.2.          Only through the collaboration with—and engagement of—all stakeholders, can we strike the right balance between regulation and innovation, crafting standards that are people-centric whilst also being practical and achievable to ensure that crucial innovation is not hindered. We support the UK Government’s initiative to establish the Digital Regulation Cooperation Forum (DRCF), with the aim of ensuring greater cooperation on regulatory matters.

 

4.3.          Furthermore, the UK Government, through its regulatory agencies should explore the implementation of Regulatory Sandboxes (RS) and Policy Prototyping Programs (PPPs) as methods to test future laws or other governance instruments. Given the difficulty in assessing the most appropriate, feasible and balanced legislative instruments on a complex topic such as AI, RS and PPPs can provide a safe testing ground to assess different iterations of legislative models of governance prior to their actual enactment.

 

4.4.          In other words, what is needed from regulators is not ironclad rules but mechanisms—such as sandboxes, policy prototyping and other consultative processes— for helping to answer some of the fundamental questions that need to be addressed.

 

4.5.          However, what is important to keep in mind is that for the innovation-forward approach to be maintained, it is crucial that companies clearly understand their obligations, and that the latter are uniform. The risk is that the presence of multiple regulators in the UK could give rise to competing policy objectives, with significant challenges for companies and a possible detriment to development of  the AI ecosystem.

 

  1. What are the non-regulatory and regulatory options to address risks and capitalise on opportunities?

 

5.1.          There is an urgent need for governments to work together to set common AI standards and governance models, with a focus on the spaces where there are gaps within existing regulation and frameworks. This is also a valuable area where we want regulators to work with industry to set appropriate transparency requirements, red teaming standards, and safety mitigations – and help ensure those codes of practice, standards, and/or guardrails are done consistently across the world.

 


INTERNATIONAL CONTEXT

 

  1. a) To what extent does wider strategic international competition affect the way large language models should be regulated?

 

6.1.          AI should benefit everyone–not just a handful of companies. AI innovation is inevitable and it should be built by an AI research community to benefit the whole-of-society. This is especially true when it comes to LLMs. LLMs are extremely expensive to develop and train. Fostering a flourishing AI research community that enables experts from diverse disciplines to explore, challenge and innovate with cutting-edge technology depends on democratising access to the most sophisticated models, which are mostly developed by industry.

 

6.2.          An open innovation approach increases market contestability by spurring new market competition, creating more innovation and consumer choices.  Open innovation can also facilitate new entry by providing a wide range of stakeholders with access to AI models that will allow them to innovate and compete.  Open innovation also promotes sustainable economic growth by helping to close any gap by enabling researchers and SMEs to build on open source models, making new discoveries and building profitable businesses.

 

6.3.          In addition, an open approach has safety benefits. With thousands of open source contributors working to make AI systems better, we can more quickly find and mitigate potential risks in systems and improve the tuning to prevent erroneous outputs. The more AI-related risks that are identified by a broad range of stakeholders - researchers, academics, policymakers, developers, other companies - the more solutions the AI community, including tech companies, will be able to find for implementing guardrails to make the technology safer.

 

6.4.          The more access given to AI models, the more likely it is that toxicity and bias can be identified and appropriately addressed and mitigated.

 

b) What is the likelihood of regulatory divergence? What would be its consequences?

 

6.5.          Globally-harmonised frameworks are necessary to ensure consistent standards around the world. Such frameworks will protect people’s information wherever it goes and provide predictable rules for businesses - both being essential requirements for the long-term success of the global digital economy. Additionally, it will foster a level playing field for all AI providers operating across borders. The same is true for internal regulation: conflict across multiple regulatory areas - such as competition, privacy and content - would challenge the objectives of innovation and global leadership in AI for the UK.

 

6.6.          The consequences of fragmented regulation involve both users and the economy more generally.

 

6.7.          Neither AI development and innovation nor governance should not be locked within one region or one set of powerful companies. That’s why we’ve released Llama 2 and also why we believe that governments and the leading AI companies should work together on a common, interoperable governance framework. The discussion on the governance and regulation of AI needs to be a global and multi-stakeholder initiative in the first instance: it should involve not only government, but also industry, academia, and civil society.

 

6.8.          We note that coming out of the G7 Hiroshima Leaders Meeting in May this year, the OECD was encouraged to look at the policy impacts of generative AI and the Global Partnership of AI was tasked to conduct practical projects.  These multi-stakeholder processes are good avenues for an inclusive and global discussion on the key issues pertaining to AI. Other similar impactful fora are the UN Global Digital Compact, the G20 and the U.S.-UK Comprehensive Dialogue on Technology and Data.

 

 

September 2023

11