AI Training and Fair Use: US Federal Court Delivers a Landmark Ruling in Bartz v. Anthropic
17 July 2025
Background
In a closely watched case that could significantly affect the artificial intelligence industry, the US District Court for the Northern District of California ruled on 23 June 2025 on copyright infringement claims against Anthropic PBC, the company behind the Claude AI system. The plaintiffs alleged that Anthropic infringed copyright by using millions of books to train its large language models (LLMs) and to build a comprehensive digital library.
The plaintiffs, three authors whose works were among those copied, challenged Anthropic's effort to assemble what it described as "all the books in the world". Anthropic did this by purchasing print books and scanning them into digital form, and also by downloading books from known pirate sites. The resulting collection served two purposes: it formed a permanent corporate research library, and it supplied training data for Anthropic's AI systems.
The Court’s Analysis
Fair Use Protection for Legitimate AI Training
The court granted summary judgment in favor of Anthropic regarding the use of copyrighted works for training large language models, finding this practice constitutes fair use under Section 107 of the Copyright Act. Judge William Alsup characterized AI training as “quintessentially transformative,” drawing parallels to human learning processes where individuals read extensively to develop their own writing capabilities.
Central to this finding was that the case did not involve outputs that reproduced, or were substantially similar to, the original works. The court noted that the authors did not allege that Claude had ever provided users with infringing copies of the plaintiffs' works.
The decision emphasized that AI training serves a fundamentally different purpose from the original works: it teaches models to generate new text rather than to reproduce existing content.
Infringement Liability for Pirated Content
However, the court refused to extend fair use to the more than seven million books Anthropic downloaded from known pirate sites, and denied summary judgment on those copies regardless of any intended transformative use.
The court emphasized that fair use cannot excuse the initial act of piracy simply because the copied materials might later serve a transformative purpose. Its reasoning focused on Anthropic's creation and maintenance of a permanent research library: the court found that Anthropic retained the pirated copies even after deciding they would never be used for LLM training. It concluded that building such a library was a separate use requiring its own justification, which Anthropic failed to provide beyond "pocketbook and convenience."
Alignment with the US Copyright Office Report
The decision broadly aligns with the approach outlined in the US Copyright Office's pre-publication report on generative AI training (Copyright and Artificial Intelligence, Part 3: Generative AI Training), released in May 2025. Both reject blanket fair use protection for AI training while recognizing that legitimate, transformative uses of lawfully obtained materials may qualify. Both also treat the source and manner of content acquisition as significant factors in the fair use analysis. The two diverge, however, on licensing. The report suggests that some AI training will not qualify for fair use, given the commercial scale of the copying and concerns about market harm, and favors licensing arrangements instead. The Bartz court, by contrast, held that training on legitimately sourced materials was fair use without requiring any licensing agreement.
Note that the Copyright Office report is only a pre-publication version and may change before final publication. We will publish a separate client update when the final version is released.
Practical Implications
The Bartz decision is an important early ruling for AI development that also reinforces fundamental copyright principles, although as a district court decision it does not bind other courts. For technology companies, the ruling indicates that AI training can qualify for fair use when it uses legitimately obtained materials, whether or not those materials were specifically licensed for AI training, and when the resulting outputs do not reproduce substantial portions of the training data.
However, companies cannot rely on a transformative purpose to excuse the use of pirated materials. The court's decision to send the pirated copies and the resulting damages to trial shows that questionable sourcing strategies carry significant legal and financial risk.
The bottom line for businesses operating in the AI space is clear: companies that invest in proper content sourcing practices from the outset are likely to be well positioned as this legal landscape continues to evolve.