AI and Fair Use at a Crossroads
On June 25th, federal district court Judge Vince Chhabria issued his ruling in Kadrey v. Meta Platforms, Inc. In this case, a number of writers—some quite famous—sued Meta for copyright infringement based upon Meta’s use of their works to train its GenAI. Faced with competing motions for summary judgment from the parties, Judge Chhabria did a careful analysis of the fair-use doctrine as it applies to GenAI. His opinion, which is likely to be very influential, opens as follows:
Companies are presently racing to develop generative artificial intelligence models—software products that are capable of generating text, images, videos, or sound based on materials they've previously been “trained” on. Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal.
Although the devil is in the details, in most cases the answer will likely be yes. What copyright law cares about, above all else, is preserving the incentive for human beings to create artistic and scientific works. Therefore, it is generally illegal to copy protected works without permission. And the doctrine of “fair use,” which provides a defense to certain claims of copyright infringement, typically doesn't apply to copying that will significantly diminish the ability of copyright holders to make money from their works (thus significantly diminishing the incentive to create in the future). Generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more. People can prompt generative AI models to produce these outputs using a tiny fraction of the time and creativity that would otherwise be required. So by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.
Meta defended itself with the only argument realistically available to it, namely, the fair-use exception to copyright infringement. A key element of this defense is the contention that the defendant’s use of the plaintiff’s creative work is so transformative that it doesn’t harm the plaintiff’s interests… that it is, in effect, a new creation in its own right. Citing a decision issued just two days earlier by a colleague on the federal bench in San Francisco, Judge Chhabria asks whether the transformative nature of GenAI moots the issue of potential negative market impact.
Some students of copyright law respond that none of this [the “market impact” argument] matters because when companies use copyrighted works to train generative AI models, they are using the works in a way that's highly creative in its own right. In the language of copyright law, the companies’ use of the works is “transformative.” As a factual matter, there's no disputing that. And as a legal matter, it's true that you're less likely to be liable for copyright infringement if you're copying the work for a transformative purpose. In that situation, you're more likely to be protected by the fair use doctrine. But as the Supreme Court has emphasized, the fair use inquiry is highly fact dependent, and there are few bright-line rules. There is certainly no rule that when your use of a protected work is “transformative,” this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.
Speaking of which, in a recent ruling on this topic, Judge Alsup focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on. Such harm would be no different, he reasoned, than the harm caused by using the works for “training schoolchildren to write well,” which could “result in an explosion of competing works.” Order on Fair Use at 28, Bartz v. Anthropic PBC, No. 24-cv-5417 (N.D. Cal. June 23, 2025), Dkt. No. 231. According to Judge Alsup, this “is not the kind of competitive or creative displacement that concerns the Copyright Act.” Id. But when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.
His Honor carefully parses the fair-use doctrine, which uses a four-part test. Two of the test’s prongs he finds to be irrelevant to his inquiry. We can ignore them here, too. Only the transformative nature of training an LLM on copyrighted material and its impact on the market are worthy of our attention.
The plaintiffs’ lawyers weren’t ignorant or dumb. They made market-impact arguments. One was that Meta’s LLM could reproduce excerpts of the plaintiffs’ works verbatim. The other was that Meta’s use of unlicensed versions of the plaintiffs’ works forestalled the plaintiffs’ opportunities to license their works to Meta and other LLM developers. Judge Chhabria was unimpressed by these two arguments.
What the judge did like was an argument that he brought forward himself.
The third way that using copyrighted books to train an LLM might harm the market for those works is by helping to enable the rapid generation of countless works that compete with the originals, even if those works aren't themselves infringing. Assume for this discussion that people can (or will soon be able to) use LLMs to generate massive amounts of text in significantly less time than it would take to write that text, and using a fraction of the creativity. People could thus use LLMs to create books and then sell them, competing with books written by human authors for sales and attention. Indeed, to some extent, this appears to be occurring already—one expert for the plaintiffs briefly discusses reports of AI-generated books “flooding Amazon.” Pls. MSJ Ex. 76 ¶ 199; see id. ¶¶ 193–207. People might even be motivated to make those books available for free, given how easy it will presumably be to prompt an LLM to create them. Harm from this form of competition is the harm of market dilution. Or as one commentator describes it, the harm of “indirect” substitution, rather than “direct” substitution (which would be the first form of harm described). See Matthew Sag, Fairness and Fair Use in Generative AI, 92 Fordham L. Rev. 1887, 1916–20 (2024).
Judge Chhabria nailed it in my opinion. This, I think, is exactly where the real threat lies. And, as he suggests, the threat has moved beyond theory to fact. More than two years ago, the Independent published this allegation:
Hundreds of books written by ChatGPT have appeared on Amazon in recent weeks as people look to cash in on generative artificial intelligence.
Close to 300 books written or co-written by OpenAI’s AI software were listed on the online retailer on Wednesday, 22 February [2023], ranging from fantasy fiction to self-help and non-fiction.
One can reasonably argue that Amazon/Kindle is a mixed blessing to aspiring authors. On the one hand, it’s a way to get a book “out there” in a publishing industry increasingly dominated (like so much else in the arts) by a few mega-players. Full disclosure: Of my 25 published books, three were self-published on Amazon. Two are still available there, The Four Pillars of Organizational Resilience and The Ice Cream Man and the Elephant Man. They are emblematic of what I deem legitimate uses of Amazon/Kindle. The former was used by me and my co-author to promote our business enterprises; sending copies to prospective clients directly from Amazon was an affordable and efficient way to get their attention. The latter is a novella. I’m the first to admit I’m no fiction writer. This is a labor of love that I wanted to see in print. It makes a nice gift for tolerant friends.
The use (or shall I say “abuse”?) reported by the Independent obviously has no legitimate place in the serious writers’ world. Amazon agreed… but apparently didn’t want to lose this new business. The retail giant threw legit authors a bone: no one would be allowed to publish more than three books in a single day on Kindle Direct Publishing. This gave very little comfort to Marie Arana, the author of LatinoLand: A Portrait of America's Largest and Least Understood Minority.
The day after its release, she went on Amazon to see how it was doing. "Right below the cover of my book was another cover," Arana says. "The cover said 'America's Largest and Least Understood Minority. A Summary of Latinoland.'"
Arana sent NPR a photo of the search result on Amazon. The book says it was written by Clara Bailey. A review of Bailey's work showed that Bailey had published a number of these so-called summaries and put them up for sale on Amazon. NPR asked an Amazon spokesperson about Bailey but did not receive a related response. And the company did not offer anyone up for an interview when asked, generally, about AI-generated books. Since NPR's inquiry, Bailey's books have been removed from Amazon. Bailey's publishing history still appears on Goodreads, which is owned by Amazon.
So, where are we today? Judge Chhabria opened a door. It’ll be interesting to see if the plaintiffs in his case accept his invitation and come back with an amended complaint that follows his guidance. I say “interesting.” I’m not saying “encouraging.” I see evidentiary issues that may be insurmountable. I even see standing issues. If the established plaintiff/authors by virtue of their status are relatively immune to His Honor’s “dilution” theory, how can they come into court and represent us peasants of the publishing world, whose maiden efforts, such as my novella, may be swamped in a sea of AI knock-offs?
Truth be told, Amazon and Kindle Direct carried a lot of crap long before ChatGPT burst on the scene in November 2022. This was human-made publishing poop. Heck, you may examine my humble novella and classify it accordingly. Or you may be one of the many, myself included, who hesitate to buy a self-published work, having been burned previously by purchasing badly written and/or edited works. So, while we may come to this issue with the highest level of resolve (i.e., not to acquire any AI-generated or assisted works), how do we separate the wheat (legit works by real human authors) from the chaff? It may be obvious once we crack the cover. But then it may be too late. The cash register has already been rung.
Fellow authors, the hard truth is that neither Judge Chhabria’s dilution theory nor our own resolve to shun AI-authored work seems to me to offer a solution to our dilemma.
Is there then no hope?
Stay tuned…
Header image via Unsplash.