In my first July Substack article, I reported to you on a brand-new decision from the United States District Court for the Northern District of California. The case, Kadrey v. Meta Platforms, Inc., held that Meta was legally entitled to train its LLM using pirated copies of the plaintiffs’ books. The legal theory that carried the day for the defendant was the doctrine of fair use. Under U.S. copyright law, a party may make “fair use” of copyrighted material --- even though that party lacks a license from the copyright owner --- if the use passes muster under a four-factor test:
the purpose and character of the use,
the nature of the copyrighted work,
the amount and substantiality of the portion used, and
the effect of the use on the potential market for or value of the copyrighted work.
The federal judge found that factors two and three were not substantially implicated in his analysis. He held that the first factor turned on just how transformative Meta’s use of the plaintiffs’ (and many, many other) books was. Not only His Honor, but both parties, apparently agreed that the use made by Meta was very transformative. Score one for the defendant.
Last, but far from least, he looked at factor four. The plaintiffs argued that they had suffered lost licensing opportunities, due to the defendant’s use of pirated versions of their works, and because the LLM could potentially reproduce small portions of their writings verbatim. Suffice it to say the judge found neither contention persuasive. He granted summary judgment to Meta…
…but not before making the following observation:
The third way that using copyrighted books to train an LLM might harm the market for those works is by helping to enable the rapid generation of countless works that compete with the originals, even if those works aren't themselves infringing. Assume for this discussion that people can (or will soon be able to) use LLMs to generate massive amounts of text in significantly less time than it would take to write that text, and using a fraction of the creativity. People could thus use LLMs to create books and then sell them, competing with books written by human authors for sales and attention. Indeed, to some extent, this appears to be occurring already—one expert for the plaintiffs briefly discusses reports of AI-generated books “flooding Amazon.” Pls. MSJ Ex. 76 ¶ 199; see id. ¶¶ 193–207. People might even be motivated to make those books available for free, given how easily it will presumably be to prompt an LLM to create them. Harm from this form of competition is the harm of market dilution. Or as one commentator describes it, the harm of “indirect” substitution, rather than “direct” substitution (which would be the first form of harm described). See Matthew Sag, Fairness and Fair Use in Generative AI, 92 Fordham L. Rev. 1887, 1916–20 (2024).
Indeed, evidence has been piling up for years that Amazon/Kindle is in fact inundated with AI-generated works, oftentimes competing directly with the original works they mimic. Witness this 2023 article’s contentions:
Hundreds of books written by ChatGPT have appeared on Amazon in recent weeks as people look to cash in on generative artificial intelligence.
Close to 300 books written or co-written by OpenAI’s AI software were listed on the online retailer on Wednesday, 22 February, ranging from fantasy fiction to self-help and non-fiction.
Another article, this from 2024, asserts:
As we feared—and warned—the growing access to AI is driving a new surge of low-quality sham “books” on Amazon. In the last few weeks, we have seen hundreds of examples of how bad actors are using generative AI to produce “books” that deceive customers and drive sales away from legitimate books. It appears that every new anticipated high-profile book has one or more scam books up within a couple days of going on sale. For its part, Amazon has been quick to take down scam books when it receives complaints, but the fact that they are able to get through Amazon’s content filters in the first place suggests that detecting AI-aided scams is presenting a challenge. …
Based on the complaints we have received, summaries, workbooks, and guides appear to be the category most commonly exploited by the scammers. Low-quality works like these have long been a problem on Amazon because it is easy to piggyback on the success of legitimate titles and sell “companion” books. Fair use case law has found that “companion” books that have a great deal of analysis and commentary and do not take too much from the original work are fair use, so Amazon allows many of them to remain on the site. A few years ago, the Authors Guild, recognizing the impact these books can have on legitimate sales, convinced Amazon to require sellers of summary books to include a conspicuous disclaimer on the listing page and cover, disclosing that the book is a summary or guide and not a substitute for the original work.
The source of these observations is The Authors Guild, an organization that seems to be working hard to help us authors out of our quagmire. The organization has played its part in persuading Amazon to take some worthwhile steps.
We have been in discussions with Amazon about this issue. We understand that they are taking it very seriously and working on solutions to prevent scam books from going up in the first place. The quality of these AI-generated books is, for now, quite poor, so it is in Amazon’s own interest to crack down on them. Amazon has disallowed listings in the companion books category, except with “limited exceptions for guides with positive reader engagement.” We hope and expect that these measures, at least for now, will start to slow the tide of low-quality, AI-generated companion books.
Amazon also limits authors to publishing no more than three works a day on Kindle. But that means an ambitious and determined scammer still can launch more than 1,000 knock-offs per year. Meanwhile, the Guild is lobbying government entities for protective legislation… but new laws take time, and spotty statutory coverage leaves loopholes.
Bottom line, authors and other artists need to attack this potentially-existential challenge at its source, the LLM that has been trained on their creative works. This is where the door opened by Kadrey comes into possible play. But it opens onto a steep hill.
The Concept of “Market Dilution”
The U.S. Copyright Office has signaled that market dilution may be a viable theory available to authors and artists in challenging LLMs’ unlicensed appropriation of their works for training purposes.
The USCO expresses concern over outputs that could act as indirect substitutes for copyrighted work: “If thousands of AI-generated romance novels are put on the market, fewer of the human-authored romance novels that the AI has trained on are likely to be sold.” …[T]his theory of market dilution expands into “uncharted territory.”…
That the Trump Administration fired both the Librarian of Congress and the head of the subordinate Copyright Office, while also signaling strong support for the unfettered private-sector development of artificial general intelligence (in order to win the AI race against China, while of course making billions of bucks), might mean less than full-throated support for the market-dilution theory. The release of the Copyright Office’s interim report on LLMs, virtually concurrent with the firings, suggests as much. Still, it’s nice to know that the Kadrey decision isn’t exactly an outlier.
To the contrary, market dilution is a well-established theory in the related area of trademark law. The centerpiece is the Federal Trademark Dilution Act. The FTDA was explained by the Supreme Court in Moseley v. V. Secret Catalogue, Inc. The Court’s syllabus tells the story:
An army colonel sent a copy of an advertisement for petitioners' retail store, “Victor's Secret,” to respondents, affiliated corporations that own the VICTORIA'S SECRET trademarks, because he saw it as an attempt to use a reputable trademark to promote unwholesome, tawdry merchandise. Respondents asked petitioners to discontinue using the name, but petitioners responded by changing the store's name to “Victor's Little Secret.” Respondents then filed suit, alleging, inter alia, “the dilution of famous marks” under the Federal Trademark Dilution Act (FTDA). This 1995 amendment to the Trademark Act of 1946 describes the factors that determine whether a mark is “distinctive and famous,” 15 U.S.C. § 1125(c)(1), and defines “dilution” as “the lessening of the capacity of a famous mark to identify and distinguish goods or services,” § 1127. To support their claims that petitioners' conduct was likely to “blur and erode” their trademark's distinctiveness and “tarnish” its reputation, respondents presented an affidavit from a marketing expert who explained the value of respondents' mark but expressed no opinion concerning the impact of petitioners' use of “Victor's Little Secret” on that value. …
Respondents' mark is unquestionably valuable, and petitioners have not challenged the conclusion that it is “famous.” Nor do they contend that protection is confined to identical uses of famous marks or that the statute should be construed more narrowly in a case such as this. They do contend, however, that the statute requires proof of actual harm, rather than mere “likelihood” of harm. The contrast between the state statutes and the federal statute sheds light on this precise question. The former repeatedly refer to a “likelihood” of harm, rather than a completed harm, but the FTDA provides relief if another's commercial use of a mark or trade name “causes dilution of the [mark's] distinctive quality,” § 1125(c)(1) (emphasis added). Thus, it unambiguously requires an actual dilution showing. This conclusion is confirmed by the FTDA's “dilution” definition itself, § 1127. That does not mean that the consequences of dilution, such as an actual loss of sales or profits, must also be proved. This Court disagrees with the Fourth Circuit's Ringling Bros. decision to the extent it suggests otherwise, but agrees with that court's conclusion that, at least where the marks at issue are not identical, the mere fact that consumers mentally associate the junior user's mark with a famous mark is not sufficient to establish actionable dilution. Such association will not necessarily reduce the famous mark's capacity to identify its owner's goods, the FTDA's dilution requirement.
The emphasis on the last sentence, above, is mine. Consumer confusion is not a foregone conclusion. What is more, due to what is called the “spillover effect,” trademark infringement, and likewise copyright infringement, is potentially a two-edged sword. In the words of one source, “Importantly, the Spillover Effect is agnostic on the evaluation’s outcome. Spillover Effects can move in positive or negative directions depending on how the new use is perceived.”
For example, in the Victoria’s Secret case, it’s at least theoretically possible that the presence of a Victor’s Little Secret store in the same shopping mall as a Victoria’s Secret shop could actually inspire shoppers --- who find Victor’s offerings to be tawdry or otherwise substandard --- to move on to the real thing. Another example: A sampling of a song in another recording could arouse or rekindle interest in the earlier, sampled tune. Note that this effect is at least theoretically possible, even if the arguably-infringing work is of manifestly inferior quality.
Applying this to our situation, a deluge of inferior, AI-authored or AI-assisted romance novels might encourage enthusiasts to seek out the better (human-authored) alternatives. Note: I am not for a moment suggesting this is the case. I only am suggesting that this possibility poses (or perhaps, more accurately, complicates) our problem of proof.
Proving Our “Market Dilution” Case
It won’t come as news to you that litigation is expensive. Copyright-infringement plaintiffs and their counsel can take comfort from the provision for prevailing-party attorneys’ fees in the Copyright Act. Still, bringing an infringement action against a major player, e.g., Meta, is no lighthearted undertaking. The availability of the class action also affords some comfort. Even so, the law firm, or consortium of firms, embarking on what the Copyright Office calls “uncharted territory” needs a well-funded expeditionary force. Below, let me lay out in detail what I have identified as key aspects of the undertaking. Let me note, too, that fair use is an affirmative defense to be initially asserted by the accused infringer, aka the defendant. This means the defendant bears the initial burden. Nonetheless, the plaintiffs who propose to burst the fair-use bubble on the basis of market dilution face a daunting challenge. Allow me to elaborate.
Romance novels strike me as an ideal example, since they generally fall below the level of so-called “highbrow” or “serious” literature. While an author creating at the level of, let us say, Jonathan Franzen may be pestered by AI-generated “summaries, workbooks, and guides,” it’s doubtful that GenAI will pen a phony Franzen novel anytime soon. And if it does, the truth will out almost immediately, I should think. By way of contrast, writers of romances seem manifestly vulnerable to AI competitors.
Key Metrics to Track
Sales Data: The most direct and immediate metric en route to establishing a cause-effect relationship would seem to be the change in sales numbers of human-written romance novels before and after the LLM-generated novels entered the market.
Step one would seem to be establishment of the universe of romance novels in terms of the number of such works on the market. This will require a definition of “romance novel.” According to one source, a romance novel revolves around a central love story and has a happy ending.
Step two is to differentiate between the novels in this universe that are human-generated and those that are LLM-generated.
These two steps will require the early involvement of experts. This may pose a Catch-22 type of dilemma. On one hand, retaining experts may require the involvement of a law firm with sufficient resources to fund your lawsuit. On the other, such firms may be reluctant to take on your case in the absence of exactly the sort of marketing-research data I am identifying here. More on this dilemma in a moment. Let’s for now assume that somebody has ponied up to pay the piper.
Free sources are available on the Internet, as you might expect. For example, Wordsrated offers these “Romance Sales Statistics” as of October 2022:
Romance novels generate over $1.44 billion in revenue, making romance the highest-earning genre of fiction.
Romance reached over 39 million printed units sold over the last 12 months as of May 2023.
Romance sales grew by 52% compared to the 12 months ending May 2022, and this has been the third consecutive year with positive growth in romance novel sales in printed format.
Sales of romance novels more than doubled compared to 2021 figures (12 months ending May 2021).
Over 33% of books sold in mass-market paperback format were romance novels.
Wordsrated also offers a seven-year retrospective on sales and growth. Sweet Savage Flame offers “40+ Romance Novel Sales Statistics for 2023.” Assuming the reliability of such sources (something I have not investigated for this article), cobbling together universal sales data on the cheap appears to be doable. Moving much beyond this threshold is likely to require experts in market research.
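For readers who want to see what the before-and-after sales comparison might look like in practice, here is a minimal sketch in Python. The monthly figures are invented purely for illustration, and the function name percent_change is my own. A real analysis would rest on vetted sales data and an expert's methodology.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change in average monthly sales between two periods."""
    avg_before = mean(before)
    avg_after = mean(after)
    return (avg_after - avg_before) / avg_before * 100.0


# Hypothetical monthly unit sales of human-written romance titles
# (illustrative numbers only, not real market data).
sales_before_entry = [1000, 1100, 900, 1000]  # months before AI titles appeared
sales_after_entry = [900, 800, 850, 850]      # months after AI titles appeared

change = percent_change(sales_before_entry, sales_after_entry)
print(f"Average monthly sales changed by {change:.1f}%")
```

With these made-up numbers, average monthly sales fall by about 15 percent after the AI titles arrive. Standing alone, a drop like this says nothing about what caused it.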
Market Share: Compare the market share of human-authored romance novels versus LLM-generated novels.
I think this is where it gets really dicey. Given that you are going to need an expert witness anyway, if your case goes to trial, you and your counsel should seriously consider biting the bullet right up front. Without endorsement (again: I have not vetted any of the sites and services I am featuring in this article), the Business Research Company came up in my Google search for “What is the market share of human-produced books in the romance-novel market in 2025”. Its site’s home page offers some free general information about the fiction-books market, as well as a chatbot. By clicking on “Request Proposal” in the upper-right quadrant, you can access a form to do just that. Grand View Research offers similar research services:
Who we are
Grand View Research is an India & U.S. based market research and consulting company, registered in the State of California and headquartered in San Francisco. The company provides syndicated research reports, customized research reports, and consulting services. Grand View Research database is used by the world's renowned academic institutions and Fortune 500 companies to understand the global and regional business environment. Our database features thousands of statistics and in-depth analysis on 46 industries in 25 major countries worldwide.
What we do
We help clients make informed business decisions. We offer market intelligence studies ensuring relevant and fact-based research across a range of industries including chemicals, materials, energy, healthcare, and technology. With a deep-seated understanding of many business environments, Grand View Research provides strategic objective insights.
Circana self-describes as, “the gold standard in point-of-sale tracking for the publishing market, covering approximately 85% of trade print books sold in the U.S., through direct reporting from all major retailers, independent bookstores, and many others.” Again, without in any sense endorsing this firm, it does sound like the sort of organization we will need.
No doubt there are many more choices.
By selecting an experienced IP law firm, you may very well find that your attorneys already have a stable of experts on whom they draw for this kind of information. Case reports published by Westlaw, LexisNexis and other online legal research services frequently include the names of the attorneys and firms that represented the litigants. For example, in searching the Kadrey case on Westlaw, I found:
Joseph R. Saveri, Aaron Cera, Cadio R. Zirpoli, Christopher K.L. Young, Melissa Tribble, Holden J. Benon, Louis Andrew Kessler, Joseph Saveri Law Firm, LLP, San Francisco, CA, Joshua I. Schiller, Margaux Poueymirou, Maxwell V. Pritt, Joshua Michelangelo Stein, Boies Schiller Flexner LLP, San Francisco, CA, David Boies, Pro Hac Vice, Boies Schiller and Flexner, Armonk, NY, David L. Simons, Pro Hac Vice, Boies Schiller Flexner LLP, New York, NY, David A. Straite, DiCello Levitt LLP, New York, NY, Jay Schuffenhauer, Pro Hac Vice, Jesse Michael Panuccio, Pro Hac Vice, Boies Schiller Flexner LLP, Washington, DC, Kathleen Jordan McMahon, Sidran Law Corp., San Ramon, CA, Madeline E. Hills, Pro Hac Vice, DiCello Levitt LLP, Chicago, IL, Mohammed Rathur, Cafferty Clobes Meriwether & Sprengel, LLP, Chicago, IL, Matthew Butterick, Matthew Butterick, Attorney at Law, Los Angeles, CA, for Plaintiffs Richard Kadrey, Sarah Silverman.
and much, much more of the same. Finding the right law firm can go a long way in finding the right expert(s) and ultimately the right data to develop a correlation between lost market share and the onslaught of AI-generated competition. However, interesting a law firm in your proposed case may take some salesmanship. As noted above, a Catch-22 situation may be in play.
In reality, such a suit is a joint business venture. And the firm, while it hopes to make a “profit” down the road, knows it may have to fund the “enterprise” at the front end, as well as during an uncertain number of months, and often years, of litigation --- discovery, motion practice, trial, appeals --- before the big “payday,” if indeed your side wins. Consequently, bringing as much evidence to the conversation from the get-go will help sell a prospective group of attorneys on your case. Bringing a ready-made class of plaintiffs also makes your proposed lawsuit more attractive.
Author Earnings: Measuring the earnings of authors in the genre over time to see if they have stagnated, declined, or been redistributed would seem to be essential, given that the Spillover Effect (as we have seen) can work both ways. The best primary source may be the authors themselves… starting with you. Are you reading this article because you fear a decline in your income… or have you already experienced such a decline? Where do you find similarly-situated authors who might be interested in your contemplated class action? Well, how about your own professional organization, if there is one? Sticking to the romance novel, we find Romance Writers of America, which styles itself “The Voice of Romance Writers.” While my exploration of this nonprofit organization’s website revealed little relating to litigation, it did claim:
One of the biggest benefits of RWA membership is the unprecedented number of opportunities to interact with fellow writers who are in all stages of their careers. We believe an important part of a career writer’s job is peer networking, and we are committed to helping members foster those connections through our chapters, forums, and PAN and PRO communities of practice. Members can also serve the organization and help shape its future by volunteering on both a local and national level. What are you waiting for? Join us today!
Peer-networking vehicles like Romance Writers of America seem one of the most efficient ways of finding authors who share your concern about lost income due to competition from AIs. Copyright law provides for liquidated damages, called “statutory damages” in the statute. But the law also allows for actual damages, and that means lost income.
Furthermore, showing up at your desired law firm’s door with a phalanx of aggrieved authors makes for a far-more-powerful pitch. Combining universal sales data, which you can gather on your own, with earnings data gleaned from among the potential plaintiffs may look highly persuasive to your target law firm. If an organization, such as the non-profit Romance Writers of America, is prepared to defray some of the anticipated costs of prosecuting the case, all the better. The ACLU frequently takes the lead in constitutional cases that otherwise would never see the light of litigation. The free speech implications of a case challenging the fair use defense by an LLM might attract similar advocacy-organization support. And as the partial list of legal counsel provided above from the Kadrey opinion illustrates, law firms are often adept at spreading the risk by involving other firms in the case.
Assess Consumer Behavior
Okay, so you and your fellow romance-novel authors and your legal eagles have, through your own efforts and those of your market research firm and/or other expert(s), adduced a body of evidence that indicates an inverse correlation between human romance-writers’ market share of sales and consequential income, on one hand, and the intrusion of AI-generated works on the other. In other words, as the number of LLM-generated works rose, market share and revenue of human-generated works declined.
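To make the idea of an inverse correlation concrete, here is a rough sketch in Python. It computes the human share of unit sales for each period and then the Pearson correlation coefficient between that share and the number of AI-generated titles on sale. All of the figures and names (human_units, llm_units, llm_title_counts) are hypothetical placeholders of my own, and Pearson's coefficient is only one of several measures an expert might choose.

```python
from math import sqrt


def market_share(human_units, llm_units):
    """Human-authored share of total units sold in each period."""
    return [h / (h + l) for h, l in zip(human_units, llm_units)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical quarterly figures (illustrative only, not real market data).
human_units = [900, 850, 800, 700]       # units of human-written titles sold
llm_units = [100, 150, 200, 300]         # units of LLM-generated titles sold
llm_title_counts = [50, 120, 200, 400]   # LLM-generated titles listed for sale

human_share = market_share(human_units, llm_units)
r = pearson(llm_title_counts, human_share)
print(f"Human share by quarter: {[round(s, 2) for s in human_share]}")
print(f"Correlation with LLM title count: {r:.2f}")
```

With these invented numbers the coefficient comes out close to -1, meaning the human share falls almost in lockstep as the AI title count rises. Real data will be far messier.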
Correlation, it has been said, does not imply causation. (If the reasons aren’t obvious, read this article.) I might respond to this contention that a strong correlation does in fact at least hint at causation. I also would remind you that fair use is an affirmative defense, putting a burden on the defendant that wishes to hide behind it. Nonetheless, I must agree that we have not found our way to cause-and-effect yet.
In order to get there, we need to bolster our quantitative data, above, with some essential, additional qualitative/quantitative data.
Surveys and Focus Groups: Conducting a consumer survey and then bringing that survey data to life via focus groups strikes me as the way to raise our correlation to the level of cause-and-effect we will need to persuade a judge and/or a jury that AI is the culprit. If readers reveal in the survey that they are choosing LLM-generated novels over human-written ones, we will want to ask them why. Are LLM-generated books perceived as direct substitutes for our works? If so, the “why” question, it seems to me, becomes crucial. Do readers of AI books perceive them to be of the same or similar quality as our own human works? Or is it the price point that drives readers to AI novels?
The latter reason certainly helps us establish that lower-priced AI books are a cause of lost sales, market share and revenue. The former, I think, does a bit more than that. It also implies that the LLM was trained on our works, thus accounting for the comparable quality of the AI’s work product.
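Tallying the answers to the “why” question is simple enough to sketch in a few lines of Python. The responses and reason labels below are hypothetical, and an actual survey would need a properly drawn sample and an expert to design and interpret it.

```python
from collections import Counter


def reason_shares(answers):
    """Fraction of respondents giving each main reason."""
    counts = Counter(answers)
    total = len(answers)
    return {reason: count / total for reason, count in counts.items()}


# Hypothetical answers from readers who said they chose an AI-generated
# novel over a human-written one (illustrative only).
responses = [
    "price", "price", "similar quality",
    "price", "availability", "similar quality",
]

shares_by_reason = reason_shares(responses)
ranked = sorted(shares_by_reason.items(), key=lambda item: item[1], reverse=True)
for reason, share in ranked:
    print(f"{reason}: {share:.0%}")
```

In this made-up sample, half of the respondents cite price and a third cite comparable quality, which are the two answers that matter most to our theory.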
Establishing how the LLM was trained
To sum up: Thus far we hopefully have established that
The market for human-generated romance novels has suffered a decline in sales, or
Put another way, the human share of the romance-novel market has declined;
The decline correlates inversely with the increase in the presence of LLM-generated romance novels;
This inverse correlation is more than coincidental. Survey plus focus-group research perhaps has elevated this inverse correlation to the level of cause-and-effect sufficient to satisfy the preponderance-of-the-evidence standard of a civil suit.
It remains, I think, for us to show (1) which LLM(s) generated the intrusive works at issue in our case, and (2) what fodder was/were the LLM(s) fed.
How to prove an LLM was the source
Thus far in developing our case, we have ascertained evidence that the romance-novel market has been diluted and that the cause of the dilution is competition from AI (LLM-generated) romance novels. This is our refutation of the fair-use defense. However, at this point in our preparation for litigation we have been going up against a sort of virtual strawman. We do not yet have ourselves any actual defendant(s).
The ostensible authors of the AI novels are not appropriate defendants, unless they can be shown to have actually copied substantial portions of our own romance novels. Copyright does not protect ideas. Copyright law protects their expression. Thus, for example, in one relatively well-known case, the thriller-writer Lewis Perdue lost his case against Random House and Dan Brown involving allegations that Brown’s The Da Vinci Code was a knockoff of an earlier Perdue novel. The court in that case found that, despite plot and character similarities, Brown’s expression of the story differed decidedly from Perdue’s Daughter of God.
No. We still must successfully prove a case of copyright infringement against one or more LLMs. We must prove authorship that was based on training that used our copyrighted works.
The bad news is that currently there is no universally-reliable method of definitively proving authorship by a specific LLM. But all is not lost. One way to obtain this information is via pre-complaint discovery. For example, under Pennsylvania’s procedural rules,
(a) A plaintiff may obtain pre-complaint discovery where the information sought is material and necessary to the filing of the complaint and the discovery will not cause unreasonable annoyance, embarrassment, oppression, burden or expense to any person or party.
(b) Upon a motion for protective order or other objection to a plaintiff's pre-complaint discovery, the court may require the plaintiff to state with particularity how the discovery will materially advance the preparation of the complaint. In deciding the motion or other objection, the court shall weigh the importance of the discovery request against the burdens imposed on any person or party from whom the discovery is sought.
Once the LLMs have been identified, the same pre-complaint-discovery rule can be used to learn from their owners the materials on which they were trained.
On the technology front, watermarking seems to be the most promising avenue for identifying particular LLMs. Required by the EU’s AI Act, watermarking has been found to have limitations, according to a report published last year on research by a team that included an ETH Zürich PhD student.
Watermarking works by inserting hidden patterns in AI-generated text, which allow computers to detect that the text comes from an AI system. They’re a fairly new invention, but they have already become a popular solution for fighting AI-generated misinformation and plagiarism. For example, the European Union’s AI Act, which enters into force in May [2024], will require developers to watermark AI-generated content. But the new research shows that the cutting edge of watermarking technology doesn’t live up to regulators’ requirements, says Robin Staab, a PhD student at ETH Zürich, who was part of the team that developed the attacks.
Still, particularly in combination with pre-complaint discovery directed toward the ostensible “authors” of suspected AI novels, watermarks could be a valuable evidentiary source. In combination, these avenues should lead to credible identifications of the AI culprits. Once identified, the owners of these LLMs can be forced via pre- or post-complaint discovery to reveal the materials and sources on which their LLMs were trained.
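The hidden-pattern idea can be illustrated with a toy example in Python. The sketch below assumes one simple family of schemes, sometimes called a “green list” watermark, in which the generator quietly favors a pseudo-random half of the vocabulary at each step and a detector counts how often that favored half turns up. It is not the scheme any particular LLM vendor uses, and every name in it (is_green, GREEN_FRACTION, the tiny vocabulary) is hypothetical.

```python
import hashlib
import random
from math import sqrt

GREEN_FRACTION = 0.5  # assumed share of the vocabulary favored at each step

# Tiny hypothetical vocabulary, for illustration only.
VOCAB = ["love", "heart", "kiss", "duke", "manor", "letter", "storm",
         "promise", "secret", "garden", "winter", "ball", "rival",
         "journey", "vow", "candle", "river", "honor", "dawn", "rose"]


def is_green(prev_token, token):
    """Pseudo-randomly mark `token` as favored, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] < 256 * GREEN_FRACTION


def generate_watermarked(vocab, length, rng):
    """Toy generator that picks a favored ("green") word whenever one exists."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        candidates = rng.sample(vocab, len(vocab))
        green = [w for w in candidates if is_green(tokens[-1], w)]
        tokens.append(green[0] if green else candidates[0])
    return tokens


def watermark_z_score(tokens):
    """How far the favored-word count sits above what chance would predict."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    n = len(pairs)
    green_count = sum(is_green(prev, tok) for prev, tok in pairs)
    expected = GREEN_FRACTION * n
    spread = sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (green_count - expected) / spread


rng = random.Random(0)  # fixed seed so the demonstration is repeatable
marked = generate_watermarked(VOCAB, 200, rng)
plain = [rng.choice(VOCAB) for _ in range(200)]

z_marked = watermark_z_score(marked)
z_plain = watermark_z_score(plain)
print(f"watermarked sample z-score: {z_marked:.1f}")
print(f"unmarked sample z-score:    {z_plain:.1f}")
```

The watermarked sample should score far higher than the unmarked one, which is the statistical fingerprint a detector looks for. As the research quoted above indicates, such fingerprints can be tampered with, so a score like this would be one piece of evidence rather than proof.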
Conclusion
I’m sure it’s now abundantly clear that the road opened by the Kadrey case is a steep uphill climb for creators who wish to use the market-dilution theory of copyright infringement. That is why such cases are most likely realistic only for very wealthy, or large groups of, plaintiffs, and major law firms, both possibly bolstered by interested advocacy organizations.
The very existence of the Kadrey case and others like it demonstrates that such plaintiffs and lawyers exist. Learning from Kadrey, some creators --- I hope and believe --- will want to attempt to operationalize the market-dilution theory to overcome the LLMs’ fair-use defense. My further hope and belief is that this article will prove to be helpful to them, as they start their uphill climb.