Artistic licence: can copyright rein in generative AI?
In the wake of Getty Images v Stability AI, Joel Smith and Penny Thornton from Hogan Lovells ask whether UK law is adequate to protect rights holders from unlicensed use of their content.
Artificial intelligence (AI) generators are developing at a startling pace. The output of some AI generators is now so good that it is hard to distinguish from genuine human work.
Recently, the organisers of a worldwide photography competition awarded the top prize to an AI-generated work. An AI-generated song purporting to be by Drake and The Weeknd was shared on multiple platforms and viewed millions of times before it was taken down by the artists’ label.
Such AI works could not be produced without the AI model being trained on very large datasets of copyright-protected images, content and music that have been scraped from the internet. Although some developers are seeking licences from rights holders, many are not.
This has prompted a global debate around how governments should strike the balance between encouraging innovation and compensating rights holders for the use of their works.
So what is the law in the UK in this area and is it adequate to protect rights holders from unlicensed use of content?
Any reproduction of a copyrighted work on UK servers, including reproducing that work in a dataset of content that has been scraped from the internet, will potentially infringe copyright—unless an exception applies.
However, since the datasets storing copyrighted works are generally hidden in the background of an AI product or service, the use of specific works in such datasets often goes undetected. Even where it is detected, rights holders may find it difficult to prove that their individual works have been relied on to produce the output of the AI.
Earlier this year, in the first case of its kind in the UK, Getty Images brought a claim for copyright infringement against Stability AI, the developer of an AI art generator, for the unlicensed use of Getty images in the underlying dataset used to train the AI model. A similar claim has been brought in the US.
While the UK particulars are not yet public, the US claim includes multiple examples of images generated by the AI model which are clearly based on recognisable images from the Getty website, and which include the Getty watermark.
This could make it much easier for Getty to prove that its copyright works are included in the dataset. Therefore, the focus in the Getty Images v Stability AI dispute will be the extent to which any exceptions to infringement apply.
Text and data mining exceptions
Unlike the US, which has a broadly applicable “fair use” exception (which AI developers are arguing should apply to their use of content scraped from the internet), the UK has a defined list of specific exceptions to infringement. That list currently includes an exception to copyright infringement for text and data mining (TDM), although the exception is limited to acts for the purposes of non-commercial research only.
Following a series of consultations, the UK government said it planned to broaden the existing exception to allow TDM for any purpose, in order to help foster AI innovation. However, following parliamentary debate in February this year, in which the House of Lords termed the proposal “misguided”, the government confirmed it is withdrawing the current proposals for a broader exception and will consult with the creative industries before any further proposals are put forward.
In March, however, in response to Sir Patrick Vallance’s Pro-Innovation Review of Technologies Report, the government said it will produce a Code of Practice, by the summer, to support AI firms to access copyright works. The involvement of both sides will be needed, it says, and legislation may follow if there is no agreement.
Any dataset of images scraped from the internet also potentially infringes UK database rights, in addition to copyright in the images. Under the EU Database Directive, two sets of rights subsist in UK databases: copyright protects the structure of a database and the sui generis database right protects the contents of a database (although databases created after 31 December 2020 will only be protected by a UK database right).
The owner of a database right can prevent third parties from extracting and re-using the whole or a substantial part of the contents of a database. Repeated and systematic extraction and re-utilisation of insubstantial parts of a database over time can also amount to use of a substantial part of a database and therefore infringe.
Even where the use of scraped images is internal and the images are not surfaced, the reproduction of a substantial part of a database, such as the Getty Images database, could amount to infringement. Again, however, it is usually difficult for database rights owners to prove that a substantial part of a database has been extracted and re-used in those circumstances, unless the content is watermarked in some way (as Getty’s images are, for example).
Striking the right balance
The growth of AI is happening at a dramatic pace. Despite rights holders such as Getty Images offering licences specifically for the purposes of developing AI and machine learning tools, many developers are proceeding without licences or permission from rights holders, waiting instead to see which jurisdictions prove most problematic.
Some AI firms argue that the sheer volume of copyrighted works included in the datasets, which can run into millions, and the difficulty of identifying which specific works from a dataset have been used by the AI in a particular output, means it is impractical to obtain licences.
Rights holders, on the other hand, point to the fact that they are already successfully licensing some AI firms and that proceeding without a licence is infringing copyright and damages the creative industries, which provide the source material on which the AI relies.
The government’s proposal for a Code of Practice would seem like a sensible solution. However, given rights holders’ experience of years of failed negotiations with internet service providers (ISPs) to agree a set of rules to deal with infringing content hosted on their platforms, they may be putting more faith in obtaining a swift, clarificatory judgment in the Getty Images v Stability AI dispute.
Certainly all eyes will be on that test case in the coming months.
Joel Smith is a partner at Hogan Lovells, and can be contacted at: email@example.com
Penny Thornton is counsel at Hogan Lovells, and can be contacted at: firstname.lastname@example.org