4 March 2024 | News | Future of IP | Lee Curtis

AI poisoning tools: A commercial solution to a legal problem?

Free AI ‘poisoning’ tools Nightshade and Glaze disrupt AI bots during the data training process. But are these really a solution to protecting copyright, or should governments step in? Lee Curtis of HGF explores.

The UK government, and indeed all governments, face a dilemma regarding the development of the AI industry and its perceived inherent conflict with the IP rights of creatives.

Does the way that AI applications learn place them in direct conflict with IP laws, most notably copyright law?

All governments are considering this thorny dilemma. Should the conflict be settled via regulation, most notably exemptions to copyright law, via licensing or indeed via the courts?

Or is there another option, most notably the development and widespread use of so-called AI poisoning tools, which could be provided either on a proprietary or open source basis?

The copyright dilemma

In submissions to the House of Lords Communications and Digital Select Committee on December 5, 2023, OpenAI, the creator of ChatGPT, stated that it could not adequately train the large language models (LLMs) behind many AI applications without access to copyright materials.

OpenAI stated that “Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyright materials.”

It went on to say: “Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

To make the analogy with the ‘human’ world, modern art might have looked very different if a David Hockney or a Damien Hirst had been limited to an awareness of, and training in, Renaissance masterpieces rather than the works of Picasso.

AI developers are often keen to make the analogy with how human students learn and how copyright law was adapted to allow for such private study. Why should AI applications be treated any differently?

They claim that their use of copyright materials is fair and transformative and is not copying in the purest sense, not unlike a fashion designer or an artist drawing inspiration from the works of a bygone age to create a new work.

However, many argue that how AI applications learn is simply copying, which goes to the core of copyright law.

The House of Lords Communications and Digital Select Committee made it clear in its subsequent report, published in February 2024, that the government cannot simply sit on its hands; it has to find a way to square the copyright dilemma. The government cannot simply wait for the UK courts to interpret these complex matters.

There is the need to allow AI applications to develop, but also to respect the IP rights of creatives. But how can we reach this desired goal?

The legal options

In 2022, the UK government appeared to be proposing a general exemption from copyright law that would enable LLMs to learn from copyright works. At present, the UK’s text and data mining exemption is relatively limited, generally covering computational analysis for non-commercial research purposes.

However, in early 2023, the then IP minister George Freeman stated that the UK government would not be proceeding with this general exemption. Instead, it would hold a consultation between creative IP rights holders and representatives of AI application providers to agree a code of practice in this area.

However, early this year the government announced that its efforts to formulate an AI code of practice via consultations at the UK Intellectual Property Office had failed. The government has now resorted to a so-called ‘period of engagement’.

While the government reflects and engages, the wheels of the courts move forward. In December 2023, the UK High Court allowed a case brought by Getty Images against Stability AI to proceed, regarding the alleged use of images managed by Getty to train Stability AI’s applications.

In a case that deals with broader issues of the conflict between copyright and trademark rights and AI applications, late last year The New York Times launched a case in the US against OpenAI and Microsoft, OpenAI’s largest shareholder.

It seems likely that these cases will either absolve AI providers of copyright infringement or, more likely, herald a regime of licensing of IP rights to enable AI applications to train. Indeed, Reddit recently struck a deal with Google to make its content available for training.

However, some have argued that a licensing regime would simply entrench the advantage of big tech in the world of AI applications, as these are the companies best placed to pay licensing fees. And while licensing regimes similar to that of the Performing Right Society in the UK could come into play, they might prove administratively burdensome, especially for smaller creatives.

So is there an alternative solution?

AI poisoning tools

Recently, a potential alternative to the legal options has come to the fore: so-called AI poisoning tools.

Two such applications of note have been produced by researchers at the University of Chicago. They take two forms: the ‘defensive’ Glaze application, which prevents AI applications from learning an artist’s signature ‘style’; and the ‘offensive’ Nightshade application. A detailed description of how Nightshade works can be found in a paper produced by the researchers.

In essence, Nightshade ‘poisons’ generative AI image models by altering, or ‘shading’, works posted on the web at the pixel level. As the researchers’ paper explains, AI models such as Stable Diffusion are trained on large datasets of between 500 million and 5 billion images.

It had been thought that, given the size of these datasets, between 5% and 20% of the images would have to be poisoned in some way to impact the reliability of an AI tool, which would make poisoning attacks economically unfeasible.

However, Nightshade allows the user to target specific concepts tied to particular prompt keywords, so that a far smaller number of poisoned images can have an impact. For example, images of a dog in a training set can be poisoned to appear as a cat, corrupting the AI tool’s interpretation not only of all dog images but also of related concepts, such as ‘husky’ or ‘puppy’.
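To illustrate the underlying idea, consider the minimal sketch below. This is emphatically not Nightshade’s actual algorithm, which optimises perturbations against a model’s feature space as described in the researchers’ paper; the function name, the perturbation bound and the stand-in images are all illustrative assumptions.

    import numpy as np

    def poison_sample(image: np.ndarray, target: np.ndarray,
                      max_shift: float = 8.0) -> np.ndarray:
        """Nudge `image` toward `target` at the pixel level, clipping the
        perturbation to an L-infinity bound of `max_shift` (on 0-255 pixel
        values) so the change remains hard for a human viewer to see."""
        delta = np.clip(target.astype(float) - image.astype(float),
                        -max_shift, max_shift)
        return np.clip(image.astype(float) + delta, 0, 255).astype(np.uint8)

    # Hypothetical example: a dog photo is nudged toward cat-like pixels and
    # published with its original 'dog' caption. A scraper ingesting the pair
    # then trains its model on a poisoned association.
    dog = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    cat = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
    poisoned_dog = poison_sample(dog, cat)

The power of the real tool lies in concentrating such perturbations on a single concept, which, the researchers report, is why relatively small numbers of poisoned images can be enough to corrupt a specific prompt.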

The appeal of these tools is illustrated by the fact that Nightshade was downloaded 250,000 times in the five days following its release in January 2024, while Glaze has recorded 2.2 million downloads since its release in April 2023.

The University of Chicago researchers plan to release a tool which combines both Glaze and Nightshade and ultimately make the tool open source.

Attractions of AI poisoning tools

Currently, IP rights holders can opt out of their content and websites being crawled via so-called robots.txt files, and large AI providers claim to respect such opt-outs.
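As a minimal sketch of how such an opt-out operates, a compliant crawler consults a site’s robots.txt file before fetching pages. The example below uses Python’s standard urllib.robotparser; the bot name ‘ExampleAIBot’ and the URLs are hypothetical placeholders.

    from urllib.robotparser import RobotFileParser

    # A site wishing to block an AI crawler publishes rules such as:
    #   User-agent: ExampleAIBot
    #   Disallow: /
    parser = RobotFileParser()
    parser.set_url("https://example.com/robots.txt")
    parser.read()  # fetch and parse the site's crawl directives

    if parser.can_fetch("ExampleAIBot", "https://example.com/gallery/"):
        print("robots.txt permits crawling this page")
    else:
        print("Site opts out; a compliant crawler stops here")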

However, these opt-outs are often not respected as they are optional, and can be easily bypassed. The attractiveness of an AI poisoning tool to a content creator is that it could essentially disable the AI training application.

As the University of Chicago researchers postulate, Nightshade and similar applications could provide a powerful incentive for AI training applications to respect ‘do not crawl’ directives.

Further, they go on to state that there could be co-ordination between copyright owners. For example, Disney could apply Nightshade to its images of ‘Cinderella’ while coordinating with others to poison concepts such as ‘Mermaid’.

A technological war could erupt between AI providers and the makers of AI poisoning tools: an arms race in which AI training tools anticipate and bypass the poisoning tools. Here, however, the law may again come into play.

If AI training tools were deliberately attempting to circumvent AI poisoning tools, which arguably protect copyright works from being copied, one could argue that the AI applications are clearly infringing copyright and could potentially be subject to both the civil and criminal provisions protecting copyright works.

Although AI poisoning tools might provide only a temporary respite for IP rights holders, that may give governments, AI providers and indeed creatives enough time to arrive at a long-term solution to the copyright dilemma: a dilemma that, with respect, they have grappled with for too long as technology progresses.

Lee Curtis is a partner and chartered trademark attorney in the Manchester Office of HGF.
