27 September 2023 | Copyright | Sarah Speight

Westlaw owner's dispute with AI startup over legal data moves to jury trial

Case centres on legal research platform Westlaw and a 'rival' | Judge asks “is it in public benefit to allow AI to be trained with copyrighted material?” | Case of “existential importance” for the parties | Comment from Cooley, Finnegan and Dickinson Wright.

A copyright dispute between media conglomerate Thomson Reuters and an AI legal services startup has moved to jury trial, becoming one of the first related to the use of data to train generative AI systems.

Thomson Reuters, the parent company of Reuters News, sued Ross Intelligence in 2020, alleging that Ross Intelligence stole content from its legal-research platform Westlaw to train an AI-based rival.

Both parties had moved for summary judgment, but in a decision filed on Monday, September 25 in a Delaware federal court, US Circuit Judge Stephanos Bibas denied the motions, saying a jury must decide the outcome.

“Facts can be messy even when parties wish they were not,” said Bibas. “But summary judgment is proper only if factual messes have been tidied. Courts cannot clean them up.”

A Q&A system

Thomson Reuters accused Ross of copying Westlaw’s ‘headnotes’, short summaries of points of law in court opinions, and using them to train Ross’ AI-powered search engine.

Ross, which said it had to shut down its platform in January 2021 due to spiralling costs over the “spurious” litigation, argued fair use.

As detailed in the decision, Ross “sought to create a ‘natural language search engine’ using machine learning and artificial intelligence. It wanted to ‘avoid human intermediated materials’.

“Users would enter questions and its search engine would spit out quotations from judicial opinions—no commentary necessary.”

Bibas noted, however, that Thomson Reuters had refused Ross a licence to use copyrighted Westlaw data, and Ross instead turned to a third-party legal research firm, LegalEase.

Upon Ross’ direction, LegalEase created memos with legal questions and answers, both manually and partly with the help of a text-scraping bot, said the court.

The questions were intended to be those “that a lawyer would ask”, and the answers were direct quotations from legal opinions, Bibas explained.

This ‘Bulk Memo Project’, from which the core of the dispute stems, said Bibas, produced about 25,000 question-and-answer sets.

Thomson Reuters argues that the questions were “essentially headnotes with question marks at the end”. Ross “admits that the headnotes ‘influence[d]’ the questions but says lawyers ultimately drafted them, instead of copying them.”

Public interest dilemma

Bibas said he “cannot yet determine” what would best serve the public interest.

“Deciding whether the public’s interest is better served by protecting a creator or a copier is perilous, and an uncomfortable position for a court,” he wrote.

“Copyright tries to encourage creative expression by protecting both. Here, we run into a hotly debated question: Is it in the public benefit to allow AI to be trained with copyrighted material?”

But in a statement emailed to WIPR, Thomson Reuters said: “We sought summary judgment on select issues because we believe the facts of the case are clear-cut.

“We look forward to presenting the evidence to a jury.”

A case of ‘existential importance’

Angela Dunning, a partner at Cooley, believes that to decide the core issues of copyright infringement and fair use, a jury trial is “the right move for this case, given the unique, disputed facts”.

But, she says, “other courts have appropriately found non-infringement and/or fair use at the summary judgment (or pleadings) stage, where the undisputed facts showed lack of substantial similarity of protected expression or a transformative use.

“Accordingly, the significance of this decision is limited to its facts and specific context.”

Daniel Mello, an associate at Finnegan, similarly observed that broad denials of cross-motions for summary judgment, paired with the uncertainty of a jury trial, “usually provides fertile ground for parties to sow a settlement”.

“Nevertheless, given the existential importance that a victory in this dispute carries for both parties, I would not be surprised if the case were to continue to trial,” he told WIPR.

“A victory for Thomson Reuters would fortify a copyright owner’s rights as against AI companies that are relying on fair use to collect training data. And a victory for Ross would do the opposite.”

Implications for generative AI

The dispute’s outcome could be pivotal for the debate now raging over copyright and generative AI, and comes as creators level a series of complaints at AI companies such as OpenAI.

Similarly, Thomson Reuters v Ross Intelligence could have implications for other cases challenging generative AI because it addresses how the ‘copyrighted works’ are being used, said William Honaker, a member at Dickinson Wright.

“I found the court’s discussion of fair use and, in particular, transformative intermediate copying, the most interesting,” he added.

Honaker noted the court’s statement that: “It was transformative intermediate copying if Ross’ AI only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes.

“But if Thomson Reuters is right that Ross used the untransformed text of headnotes to get its AI to replicate and reproduce the creative drafting done by Westlaw’s attorney-editors, then Ross’s comparisons to cases like Sega and Sony are not apt.”

‘Significant’ legal obstacles

Mello added that if the case proceeds to trial, “each party faces a significant legal obstacle”.

One obstacle that Thomson Reuters must overcome is a threshold ownership issue, he explained.

“Despite offering evidence of registered compilations of Westlaw, which contained headnotes and key numbers, the court refused to determine whether Westlaw’s Key Number System or headnotes are original enough to warrant copyright protection.

“To be certain, the bar is very low, requiring only a modicum of creativity under Feist.”

Mello highlighted, for example, Ross’ assertion that the Key Number System is “unoriginal because most of the organisation decisions are made by a rote computer program and the high-level topics largely track ‘common doctrinal topics taught as law school courses’.”

The decision was handed down in the US District Court for the District of Delaware.

Counsel for Thomson Reuters are Jack Blumenfeld and Michael Flynn of Morris, Nichols, Arsht & Tunnell; and Dale Cendali, Eric Loverro and Joshua Simmons at Kirkland & Ellis.

Counsel for Ross Intelligence are David Moore, Bindu Palapura and Andrew Brown at Potter Anderson & Corroon.

Also for Ross Intelligence: Gabriel Ramsey, Warrington Parker, Joachim Steinberg, Jacob Canter, Christopher Banks, Shira Liu, Margaux Poueymirou and Anna Saber at Crowell & Moring’s San Francisco office, and Mark Klapow, Lisa Kimmel and Crinesha Berry in its Washington, DC office.
