Dr Hayleigh Bosher—written evidence (LLM0109)

 

House of Lords Communications and Digital Select Committee inquiry: Large language models

 

 

About the Author

 

Dr Hayleigh Bosher is a Reader in Intellectual Property Law and Associate Dean at Brunel University London, where she also runs the Brunel Law School IP Pro Bono Service and is a member of the Brunel Centre for Artificial Intelligence: Social and Digital Innovation. Hayleigh's research focuses on copyright and related laws and policy issues in the creative industries. She is the author of Copyright in the Music Industry and the producer and host of the podcast Whose Song is it Anyway? Hayleigh has an international reputation as a leading expert in her field; she is regularly cited in academic, practitioner and policy outputs and is often interviewed by national and international media outlets. Dr Bosher completed her PhD at Bournemouth University, in which she undertook a detailed analysis of the meaning of copying under UK copyright law. This research was also published as a book, Law, Technology and Cognition: The Human Element in Online Copyright Infringement (Routledge 2020).

 

Executive Summary

 

  1. The use of protected works for the purpose of training and running LLMs is copyright infringement and therefore requires permission or a licence.

  2. A copyright exception does not and should not apply. It is not appropriate under national or international law to permit the use of copyright protected works for AI data training or the running of LLMs under a copyright exception.

  3. AI generated works should not be protected by copyright as they do not meet the legal requirements or satisfy the justification for copyright protection.

  4. Works created by humans with the use of AI as a tool could reach the threshold for copyright protection, but the scope of the protection should only extend to the original human creativity.

  5. The Government should introduce new rights to protect against the use of a person’s face and voice by AI without permission, in particular in deepfakes, which are currently unregulated and pose a threat to privacy and a risk of misinformation and abuse.

 

 

 


1. The use of protected works for the purpose of training and running AI is copyright infringement and therefore requires permission or a licence.

 

The use of copyright protected materials, as training data or in running a Large Language Model (LLM), without permission is an infringement of copyright, under both core copyright principles and UK copyright law. Indeed, this applies to all AI programs. The purpose of copyright is, in short, to encourage the creation and dissemination of culture and knowledge. It achieves this goal by granting rightsholders the right to prevent others using their work without permission, and it balances these rights through limitations on their length and scope and through exceptions in special circumstances. Therefore, under the core principles of copyright, the use of copyright protected works in AI data training or programs would require permission.

 

The test for copyright infringement can be summarised in the following question: has the whole, or a substantial part of, a copyright protected work been copied without permission or the use of an exception? Under the UK Copyright, Designs and Patents Act 1988, use means the restricted acts, namely, to copy, issue copies, rent, lend, perform, show, play or communicate to the public a work, or make an adaptation of a work.[1] The restricted act of copying enshrines the core principle of copyright, which is to allow the rightsholder to prevent the copying of their work. It is therefore a broad legal term that means reproducing the work in any material form, including storing the work in any medium by electronic means. In relation to an artistic work, copying includes making a copy in three dimensions of a two-dimensional work and vice versa. In relation to a film or broadcast, it includes making a photograph of any image forming part of the film. It also includes making a facsimile copy of a typographical arrangement.

 

For all works, infringement includes making copies which are transient or are incidental to some other use of the work. There is an exception for transient copies, but it is extremely narrow. It applies only where the copy is an integral part of a technological process, the sole purpose of which is to enable either a transmission of the work in a network between third parties by an intermediary or a lawful use of the work, and where, in both scenarios, the copy has no independent economic significance. It also does not apply at all to computer programs or databases. Therefore, the copyright exception for transient copies does not apply to LLMs.[2]

 

In order to train and run LLMs using copyright protected works, the programmer would need to make copies of the copyright protected works. Whilst some AI firms may argue that this copy is temporary (because the copy is tokenised and the original copy then discarded), this is a misunderstanding of the extent of the legal definition of copying and the purpose of copyright. An AI generated piece created using copyright protected data would therefore infringe copyright under sections 17, 20 and 21 of the UK Copyright, Designs and Patents Act 1988 (copying, communication to the public and adaptation).

 

In my PhD research and book Law, Technology and Cognition: The Human Element in Online Copyright Infringement,[3] I conducted thorough research into the purpose and application of the restricted act of copying. This research demonstrates that copyright law aims to be technologically neutral in order to give the legislation endurance and to give meaning to the principle of copyright. A useful parallel is a case where it was found to be infringement when a website user interface looked the same to users, even though the code behind the websites was different.[4] Ultimately, it matters not how a copy is made; it is simply enough that a protected work is used without permission. Copyright is not about regulating technology; it is about encouraging creativity, and therefore it regulates copying in any material form. Therefore, the use of copyright protected materials by LLMs, without permission, is an infringement of copyright, under both core copyright principles and UK copyright law. It is recommended that the Committee request that the Government confirm that copyright applies to LLMs, which will bring certainty for both AI firms and copyright holders.

 

2. A copyright exception does not and should not apply.

 

The fact that an extension to the copyright exception system for AI was being considered by the Government does imply that it understood that, without an exception, the use would constitute infringement. Nevertheless, a copyright exception can only apply in special circumstances that do not conflict with the rightsholder’s normal exploitation of the work. Therefore, under copyright rules, an exception is not appropriate for AI because it does not qualify as a special circumstance, it undermines the normal exploitation of licensing, and the output is prejudicial to the interests of the rightsholder since it competes with the original work.[5]

 

An ‘any-purpose’ copyright exception would therefore likely be contrary to international agreements. This is because the Berne Convention ‘three-step test’ requires exceptions to be for a specific purpose and to not harm the copyright-holders’ interests.[6] It would also divide and polarise sectors, instead of encouraging collaborative innovation. An exception would deprive the creative industries of the remuneration and control over their works that copyright entitles them to.

 

Aside from the legal restraints, there is insufficient evidence that an exception is needed by AI firms. Only 13 out of 88 responses to a UK IPO consultation were in favour of a broad any-purpose text and data mining exception, and these largely came from researchers, libraries and archive institutions. Copyright compliant LLMs such as Musiio and Generative AI by Getty Images powered by NVIDIA demonstrate that licensed LLMs are possible and plausible.

 

Although the Government now appear to have back-tracked on this proposal and the UK IPO is conducting stakeholder working groups on licensing structures for AI, the Committee should seek confirmation from the Government that the exception has indeed been ruled out. Furthermore, the Government could set out a principle that regulation should support AI and other industries to work together. In fact, AI firms and creative industries are not mutually exclusive, something that needs to be recognised by Government in their priorities for encouraging innovation. The creative industries are utilising AI and AI firms can potentially own copyright in their legal computer program. Copyright balances interests to encourage the creation and dissemination of culture and knowledge, which AI firms and creative industries do best in collaboration.

 

3. AI generated works should not be protected by copyright as they do not meet the legal requirements or satisfy the justification for copyright protection.

 

AI generated writings, music, dramatic and artistic works are unoriginal under copyright rules and therefore not protectable by copyright. Under UK law, copyright is a property right which subsists in original literary, dramatic, musical or artistic works. Originality in copyright means that a creator used their own ‘intellectual creativity’ and made ‘free and creative choices’ that carry the ‘personal touch’ of the creator.[7] Needless to say, a ‘personal touch’ requires a person. UK copyright law states that the author of a work is the person that created it.[8]

 

AI generated sound recordings and films are also not protectable by copyright because they are created using a copy of previous sound recordings and films. Copyright protects sound recordings and films, which do not require the same originality test (because by their nature they are already a copy of a musical or dramatic work); however, copyright does not subsist in a sound recording which is, or to the extent that it is, a copy taken from a previous sound recording. Therefore, any AI generated sound recording or film cannot be protected by copyright because it is made purely from copies of previous sound recordings and films. By way of international reference, this also applies elsewhere. For example, on 21st February 2023, the United States Copyright Office rejected a copyright registration for an AI generated work, stating that, since the images were generated by the Midjourney technology, the work is “not the product of human authorship”.[9]

 

A literary work includes a database and a computer program[10] and therefore, whilst the output of the LLM is not protectable, the program itself may be. UK copyright law also includes a provision which provides protection for computer-generated works, which lasts for 50 years.[11] However, this section is problematic because it only applies to literary, artistic, dramatic and musical works where there is no human author.[12] And, as explained above, these types of works require originality[13] to subsist, and originality requires a human creator. Therefore, it is impossible for the threshold of copyright subsistence to be met for an original literary, dramatic, musical, or artistic work that is generated by a computer.

 

4. Works created by humans using AI as a tool could be protected only to the extent of the original human creativity.

 

Where a work[14] is created by a human using AI as a tool to assist in their creativity, that work could be protected by copyright to the extent of the human originality. Using AI as a tool for creativity is no different to using any instrument, be it a guitar, a pen or a camera.[15]

 

5. The Government should introduce new rights to protect against the use of a person’s face and voice without permission by AI

 

AI tools can create false information, as well as deepfake videos of a person’s appearance and voice. This is a threat to creators and the public. AI deepfakes are deeply problematic, particularly in relation to pornographic content; a 2019 study found that there were 14,678 deepfakes online and 96% of them were pornographic in nature.[16] Currently, under UK law there is no such thing as an image right or personality right and therefore, if a person’s voice or image is copied or used without their consent, there is no legal mechanism for them to protect themselves. Previously, before AI, the Government suggested that there are other legislative provisions for people to protect their image, such as intellectual property or privacy law.[17] However, in my article Forced Faming: How the Law Can Protect Against Non-Consensual Recording and Distributing of a Person’s Voice or Image, I explored in detail each type of legal right that may apply in this situation (privacy, defamation, sexual offences and copyright infringement) and my research found that the law is falling short in protecting people in the context of AI.[18]

 

None of the available rights protect performers, or indeed the public, from AI or deepfakes. Therefore, it is recommended that the Government consider introducing new rights that capture a person’s voice and image. Many countries outside the UK provide for image or personality rights and it is time for the UK to catch up.

 

When developing such rights, the Government needs to remember that copyright exists to balance and protect the interests of both creators and rightsholders, which do not always align. Both creators (e.g., musicians, writers, performers) and rightsholders (e.g., record labels, publishers, streaming services) may seek revenue from AI uses of their works. Rightsholders tend to have more resources than creators to advocate for themselves. Copyright has mechanisms for recognising this imbalance in bargaining power, such as unwaivable rights or equitable remuneration. These options and their impact need to be explored by the Government to address these issues.

 

 

7 December 2023



 


[1] Copyright, Designs and Patents Act 1988, section 16. (Hereafter CDPA 1988).

[2] CDPA 1988, section 28A.

[3] Hayleigh Bosher, Law, Technology and Cognition: The Human Element in Online Copyright Infringement (Routledge, 2020).

[4] Navitaire Inc v EasyJet Airline Company Ltd, Bullet Proof Technologies Inc [2003] EWHC 3487 (Ch): Mr Justice Pumfrey acknowledged that the coding of the programs was different, but that the outcome on the screen, from the user’s perspective, appeared the same. He explained that it was the same as taking the plot of a book: “in the same way that copyright subsisting in a literary work may be infringed by a change in medium in which all that is taken is the plot, so also, it is said, may the copyright in computer software be infringed when the functional structure of the code is appropriated by writing different code which, put crudely, works in the same way.” at 7.

[5] See also Hayleigh Bosher, Copyright & generative AI: regulating data mining (Policy Brief, Brunel University).

[6] Berne Convention for the Protection of Literary and Artistic Works in 1967, Article 9 states that: (1) Authors of literary and artistic works protected by this Convention shall have the exclusive right of authorizing the reproduction of these works, in any manner or form. (2) It shall be a matter for legislation in the countries of the Union to permit the reproduction of such works in certain special cases, provided that such reproduction does not conflict with a normal exploitation of the work and does not unreasonably prejudice the legitimate interests of the author.

[7] “Originality in a work ‘is something on which the author has stamped his "personal touch" through the creative choices he has made’ (SAS, para 41). Hence, ‘[t]he requirement of originality under the [CDPA] is that the work must be an expression of that author's own intellectual creation’ (Banner, para 26).” Eleonora Rosati, Originality in copyright law (The IPKat, 2023) https://ipkitten.blogspot.com/2023/11/originality-in-copyright-lawobjective.html?m=1#:~:text=The%20result%20was%20that%2C%20to,more%20than%20negligible%20or%20trivial

[8] CDPA 1988, section 9. It is internationally recognised that copyright protects only the products of the human mind, see US case Feist v Rural, 499 U.S. 340 (1991), EU Case C-5/08, Infopaq (2009), India case Eastern Book v Modak (2008) 1 SCC 1. The Software Directive and the Database Directive each state that the author is “the natural person, or group of natural persons who created…” Art 2 and Art 4 respectively.

[9] Letter from U.S. Copyright Office to Kristina Kashtanova (21 Feb 2023) with reference to registration number VAu001480196. The letter reiterates that the term original requires independent creation and sufficient creativity, pointing to legal cases which stated that human creativity must have occurred for a work to be copyrightable, and to examples of works lacking human authorship such as a photograph taken by a monkey and an application for a song naming the Holy Spirit as the author of the work.

[10] CDPA 1988, section 3.

[11] CDPA 1988, section 9(3).

[12] CDPA 1988, section 178.

[13] CDPA 1988, section 1(1)(a).

[14] Work meaning a type of copyright protectable work i.e., literary, dramatic, musical or artistic work.

[15] This has been recognised in a UK case: “the computer was no more than a tool…It is unrealistic as it would be to suggest that, if you write your work using a pen, the pen is the author of the work.” Express Newspapers Plc v Liverpool Daily Post & Echo Plc [1985] 1 WLR 1089 at 1093. And “the mechanical process” of a camera does not preclude originality: Temple Island Collections Ltd v New English Teas Ltd [2012] EWPCC 1 at 21.

[16] Henry Ajder, Giorgio Patrini, Francesco Cavalli, and Laurence Cullen, The State of Deepfakes: Landscape, Threats, and Impact (Deeptrace, 2019) 1.

[17] https://www.gov.uk/hmrc-internal-manuals/capital-gains-manual/cg68450

[18] Hayleigh Bosher, Forced Faming: How the Law Can Protect Against Non-Consensual Recording and Distributing of a Person’s Voice or Image (2023) 28(3) Communication Law, 119-125.