9 October 2023 | Features | Copyright Channel | Sarah Speight

Rewriting the rules: do we need a new kind of licence to solve genAI’s author problem?

Since its launch almost one year ago, ChatGPT has been an unexpected, runaway success.

It set a record for the fastest user growth within just two months of its release in November 2022, reaching 100 million active users by the following January. It now has more than 180 million users and a projected revenue of $1 billion in 2024.

But this success has come at a cost.

Its owner OpenAI has faced a raft of lawsuits by authors, who claim it has used their copyrighted works without permission to train the large language models (LLMs) that power ChatGPT.

In the most recent dispute, The Authors Guild and 17 of its members, including novelist and former lawyer and politician John Grisham (author of The Pelican Brief), claim that ChatGPT has produced “accurately generated summaries” of their works when prompted.

Jodi Picoult (My Sister’s Keeper), George RR Martin (Game of Thrones), Jonathan Franzen (The Corrections), and David Baldacci (Camel Club) were also among the plaintiffs.

This came hot on the heels of two other lawsuits, also in September, by a group of novelists, playwrights and screenwriters—including Pulitzer Prize winner Michael Chabon (The Amazing Adventures of Kavalier & Clay).

This group of authors sued OpenAI on September 8, and followed with a similar complaint four days later against Meta over LLaMA, an LLM released in February 2023.

These lawsuits were amended last week, October 7, to add yet more US authors, including journalist and writer Ta-Nehisi Coates, who received the 2015 National Book Award for Nonfiction for Between the World and Me; and Pulitzer Prize winners Junot Díaz (The Brief Wondrous Life of Oscar Wao) and Andrew Sean Greer (Less).

And in July, US comedian, actress and writer Sarah Silverman, plus two other authors, sued OpenAI as well as Meta for allegedly copying content from their books.

OpenAI’s mission

But OpenAI’s chief executive, Sam Altman, has spoken of his company’s belief that “creators deserve control over how their creations are used” and that “content creators, content owners, need to benefit from this technology.”

Testifying before a US Senate subcommittee on rules for AI in May this year, Altman said that “OpenAI does not want to replace creators. We want our systems to be used to empower creativity, and to support and augment the essential humanity of artists and creators.”

So what is going on?

Aaron Johnson, a partner at Lewis Roca, tells WIPR that we’re facing a dilemma similar to the early days of the internet, when people didn’t really know how to apply existing copyright law to it.

“Now, there is this brand new question of, if [creative works] are ingested by an AI, does that infringe? This could have very big implications on how [those works] are used.

“We were all very surprised about ChatGPT, and how proficient it is. But if all of a sudden we say you can't train [the AI datasets] like this anymore, is that really going to kind of affect this world-changing technology?”

'Author-enabled' success

In its complaint, The Authors Guild argued that ChatGPT’s success would not even be possible without the copyrighted works at issue.

Rachel Geman, co-counsel for the plaintiffs, said at the time that without the copyrighted works at issue, OpenAI “would have a vastly different commercial product”.

Additionally, the complainants state that their income and livelihood are at stake.

“Defendants’ decision to copy authors’ works, done without offering any choices or any compensation, threatens the role and livelihood of writers as a whole,” Geman added.

Scott Sholder, a partner at Cowan, DeBaets, Abrahams & Sheppard and also co-counsel for the authors, said: “Plaintiffs don’t object to the development of generative AI, but the defendants had no right to develop their AI technologies with unpermitted use of the authors’ copyrighted works.”

Software looks to enterprise licensing

There have been moves by software companies to license AI-generated content from original creators, or at least indemnify users of that content against litigation.

Enterprise licensing is growing, with the likes of Adobe and Meta providing licensing agreements to business customers for their AI products.

But AI companies licensing original content from individual authors is as yet relatively new territory.

OpenAI and news agency The Associated Press (AP) struck a deal in July 2023 to enable the AI platform to license AP’s archive of news stories.

In turn, “AP will leverage OpenAI’s technology and product expertise,” according to a joint statement.

Indeed, in the Authors Guild complaint, it was noted that OpenAI’s Altman told senators that the company has “licens[ed] content directly from content owners” for “training” purposes.

Scott Sholder pointed out that “defendants could have ‘trained’ their large language models on works in the public domain or paid a reasonable licensing fee to use copyrighted works.”

Licensing for authors—a ‘cumbersome’ option

But would the widespread licensing of original content to train LLMs for generative AI tools really be viable?

Jeffrey Cadwell, a partner at Dorsey & Whitney and co-chair of the firm’s Creative Industries and Video Game groups, is sceptical.

“Obtaining individual licences from rights holders could be very cumbersome and potentially prohibitively expensive.

“It is possible there could be a collective licensing model, similar to music public performance licences, but I think creators would be reluctant to sign on to this type of licence, because they would have no control over how their work is being used.”

He makes the analogy with songwriters, who “at least know that the licence covers the performance of their songs”.

“With generative AI, the works can be used in any number of ways, most likely to create derivative works incorporating or based on the content.”

Derivative works

For example, The Authors Guild also noted that businesses and other third parties, such as Socialdraft, are emerging that enable users of tools like ChatGPT and Anthropic’s Claude to create derivative stories from works of fiction.

ChatGPT “is being used to generate low-quality ebooks, impersonating authors, and displacing human-authored books”, the authors claim. The writer Jane Friedman, for instance, discovered “a cache of garbage books” written under her name for sale on Amazon.

However, Cadwell points out that “the public will likely be the arbiter of whether AI-generated books will be successful.

“If the quality is low, then it is unlikely such books would become successful, even if offered at very low price points,” he says.

“With respect to impersonation of authors, we have to remember that copyright law does not protect style. But if an AI-created work is associated with an author, then the author likely has a right of publicity and a false association or even a straight trademark claim.”

Tom Polcyn and Matt Braunel at Thompson Coburn note that among the rights afforded to the owner of a copyright are the right to reproduce the work and the right to prepare derivative works based on the work.

“Both Silverman et al v Meta and OpenAI, and Chabon et al v OpenAI allege improper reproduction, and both suits raise issues concerning the preparation of derivative works by AI systems,” say Polcyn and Braunel.

“However, the Authors Guild complaint is more detailed on the preparation of derivative works and alleges a much broader economic impact, claiming that AI is already impacting human authors’ finances because ‘content writing…is starting to dry up as a result of generative AI systems’.”

Cadwell adds: “My gut reaction is that AI-generated works that are derivative of existing works should be treated the same way we treat human-generated works. For example, in the fan fiction realm, does the derivative work constitute fair use?

“Of course, if the AI companies can get a court to rule that training an LLM with copyrighted content constitutes some type of fair use, then commercial licensing may not be necessary. We will have to wait to see how the fair use defence claims get resolved.”

The Books3 situation

There is another layer to the debate, though.

The Books3 dataset, which is said to contain more than 180,000 pirated books, was cited in all of the lawsuits brought by authors this year against OpenAI and Meta as one source used to train LLMs. It was also cited in a recent case in Australia.

Books3 is the “bigger issue”, according to Cadwell. “The fact that the database contains pirated books means the entire database constitutes copyright infringement in the first instance by violating the rightsholder’s exclusive rights of reproduction and distribution.

“If the database is copied in teaching an LLM, then there is another act of infringement.”


And the fair use consideration, add Polcyn and Braunel, involves an analysis “that is typically fact-intensive”.

“Whether the defendants’ alleged actions actually caused the harm to the plaintiffs will have to be shown via admissible, credible evidence at later stages in the cases.”

Lewis Roca’s Johnson says the question of whether AI will create works has been answered. “The issue is how are they even protected?” he asks.

“And so I think that is the next big question, whether or not copyright needs to be changed, or maybe we come up with something brand new for AI works.

“But there are going to be some uncomfortable cases, where we have to start drawing some lines before companies know for sure what is going to be infringing and what is not going to be infringing.”

This article was updated after first publication to add information about the amended class actions against OpenAI and Meta.

