Epoch—written evidence (LLM0002)


House of Lords Communications and Digital Select Committee inquiry: Large language models


This is a response to the UK Parliament Call for Evidence on Large language models submitted on behalf of Epoch. Epoch is a research initiative investigating trends in Machine Learning and forecasting the development of advanced AI systems. Our mission is to bridge the gap between technical AI developments and decision making in the governance of AI. We are submitting evidence as part of this mission.


1.              How will large language models develop over the next three years?


1.              Training large language models (LLMs) requires specialised large-scale computing hardware. Currently deployed LLMs are trained on a compute budget of 2x10^25 FLOP (FLoating point OPerations, a standard compute measurement) and 10 trillion words of text data.


2.              The investment needed to build and maintain the infrastructure for training currently deployed LLMs is in the $300M to $600M range.[1] When renting compute from the cloud, the cost to train such models is in the $40M to $100M range (see Epoch’s trend board for references and more information).


3.              Modern generative models can be used, among other things, to edit texts, autocomplete messages, do some basic coding tasks and generate images according to user specification. Outputs are often not reliable and require careful supervision by users.


4.              In the next 3 years, we expect that state-of-the-art commercially available LLMs will be trained on a compute budget between 50 and 100 times greater than today’s models (Sevilla et al., 2023a), and cost between $600 million and $3 billion (Cottier, 2023).[2]


5.              Furthermore, we expect that in the coming three years developers will exhaust publicly available high-quality data sources such as books, news, scientific articles and open source repositories. They will either resort to lower quality sources such as social media, or innovate on data efficiency techniques to lower data requirements (Villalobos et al., 2023).


6.              Associated with the larger scale of models, we expect to see increases in AI capabilities (Owen, 2023). Capabilities that we expect to see developed include video and music generation, better writing and coding, and better reasoning across a variety of problems. We expect that reliability will still be an issue, and that developers will not have found a way to prevent users from eliciting unwanted behaviour from deployed models (“jailbreaking”). We expect companies to experiment with applications that summarise search results and with integrations into web applications, though it is unclear whether reliability problems will prevent these integrations from being widely adopted.


7.              Besides seeing advances in the frontier, we also expect a wider proliferation of LLMs. This will be enabled by open source replications and better techniques to train capable models with less compute. These models will lag behind the state of the art, but could reach many users (Cottier, 2023).


8.              While we do not expect radical transformations in the next three years, over the coming decades we expect Artificial Intelligence to precipitate a radical transformation of the economy and society. This transformation might occur very rapidly, spanning less than a decade (Davidson, 2023). It could lead to economic growth tenfold greater than in modern Britain (Besiroglu and Erdil, forthcoming), but it might also be destabilising: AI will pose unprecedented global risk management challenges.


a)              Given the inherent uncertainty of forecasts in this area, what can be done to improve understanding of and confidence in future trajectories?


1.              Access to compute is a key component of modern AI capabilities. To ensure the UK is well-informed about the potential of actors producing state-of-the-art models, we recommend that the UK invests in creating and maintaining a map of existing compute clusters, within its borders and internationally.


2.              Some additional scientific questions that we think deserve more attention include:

     Understanding the returns to further research in ML hardware and software,


     Developing better theories to estimate the training requirements of different tasks, and how these could decrease with architectural improvements and post-training augmentations,


     Investigating the drivers of ML research, particularly the importance of talent and access to compute for experimentation, and


     Determining whether and how data bottlenecks will be addressed in the future.


We recommend supporting academic and public analysis work in these areas.


3.              We think that a better understanding of development dynamics will help us better anticipate the trajectory of AI and identify policy levers to steer it. To this end, the Epoch team is working on an integrated assessment model that combines insights from economics and Machine Learning. We recommend supporting work in this area, and remain available for consultation.


4.              Beyond this foundational work, we recommend red-teaming exercises across different scenarios, organising conferences and workshops with the explicit goal of answering key questions about the future of AI, and supporting quantitative, falsifiable forecasting work.


2.              What are the greatest opportunities and risks over the next three years?


1.              The UK is well positioned to lead the regulation of AI models worldwide. Models developed at the frontier should be tracked and subject to independent third-party audits (Anderljung et al., 2023). This will help the UK guide the incentives of AI development and commercialisation to better serve the interests of the nation.


2.              Some of the greatest opportunities in coming years arise from the potential for AI to increase productivity across many activities in the economy. LLMs have already driven productivity improvements in software engineering (Tabachnyk and Nikolov, 2022) and customer support (Brynjolfsson et al., 2023). However, widespread adoption of AI tools will require caution: users must be trained to understand that model outputs are not always reliable and require human oversight.


3.              In terms of risks, we anticipate that in the next three years LLMs will be used to enhance cyberattacks such as spear phishing (Hazell, 2023), and might degrade the reliability of information on the internet (Seger et al., 2020). There is also the risk of labour market disruption from LLMs. Workers may require retraining to reintegrate into the labour market, and policy interventions may be important to establish such programmes (OECD, 2023).


4.              In the long term, AI (including LLMs) might give rise to even greater risks, such as lowering the barriers to entry for developing disruptive technologies such as bioweapons, enabling widespread user manipulation, and powering sophisticated cyberattacks. These capabilities might be exploited by malicious users (Brundage et al., 2018) or pursued autonomously by the AI itself (Ngo et al., 2023). We recommend consulting the Centre for the Governance of AI for a more detailed view on risks.



18 July 2023



[1]              One server of eight NVIDIA A100 GPUs reportedly costs about $200,000. Training requires about 25,000 NVIDIA A100 GPUs, i.e. roughly 3,125 servers: 25,000 / 8 x $200,000 ≈ $625M, or approximately $600M. NVIDIA H100 GPU servers have approximately three times higher performance at roughly 1.5 times higher cost, halving the overall cost. There are likely to be large additional costs in building and maintaining the infrastructure.
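The arithmetic in this footnote can be reproduced directly (a minimal sketch; the GPU count and server price are the reported figures above, and the H100 adjustment assumes the stated three times performance at 1.5 times cost):

```python
# Estimate cluster acquisition cost from the figures reported above.
gpus_needed = 25_000      # A100 GPUs required for training
gpus_per_server = 8       # GPUs per server
server_cost = 200_000     # reported cost per eight-GPU server, USD

a100_total = gpus_needed / gpus_per_server * server_cost
print(f"A100 cluster: ${a100_total / 1e6:.1f}M")   # $625.0M

# H100: ~3x performance at ~1.5x cost, so cost per unit of compute halves.
h100_total = a100_total * 1.5 / 3
print(f"H100 cluster: ${h100_total / 1e6:.1f}M")   # $312.5M
```

The two totals bracket the $300M to $600M range cited in the body of the submission.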

[2]              Assuming a current frontier training cost between $30M and $100M, multiplied by a compute scaling factor of 50 to 100 over three years, and divided by the cumulative factor by which hardware becomes cheaper over that period (1.25x to 1.57x per year, based on the trend in Machine Learning GPUs from Hobbhahn & Besiroglu (2023)). Using a Monte Carlo simulation over these value ranges, the cost in three years would fall roughly in the range of $600 million to $3 billion.
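The Monte Carlo calculation described in this footnote can be sketched as follows (an illustrative reconstruction, not Epoch's actual model; we assume uniform distributions over the stated input ranges, which the footnote does not specify, so the resulting endpoints will differ somewhat from the quoted range):

```python
import random

def project_training_cost(n_samples=100_000, seed=0):
    """Propagate the footnote's input ranges through
    cost = current_cost * compute_scale / cheapening**3."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_samples):
        current_cost = rng.uniform(30e6, 100e6)   # current frontier training cost, USD
        compute_scale = rng.uniform(50, 100)      # compute scaling factor over three years
        cheapening = rng.uniform(1.25, 1.57)      # hardware price-performance gain per year
        costs.append(current_cost * compute_scale / cheapening**3)
    costs.sort()
    # Report a central 80% interval of the sampled costs.
    return costs[n_samples // 10], costs[9 * n_samples // 10]

low, high = project_training_cost()
print(f"~${low / 1e9:.1f}B to ~${high / 1e9:.1f}B")
```

The exact interval depends on the percentile choice and the distributional assumptions; the point of the sketch is that plausible inputs concentrate the projected cost in the low billions of dollars.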