Adobe Faces Legal Challenge Over Unauthorized Use of Authors' Works in AI Model Development
Adobe’s aggressive expansion into artificial intelligence is facing a significant legal setback. The company stands accused of incorporating pirated literary materials into its machine learning infrastructure—a move that has sparked a class-action lawsuit centered on copyright violations.
The Core Allegation
Oregon author Elizabeth Lyon has filed a proposed class-action suit claiming that Adobe used unauthorized copies of books, including her own works, as training material for SlimLM, the company’s specialized language model designed for mobile document processing applications. According to court documents, these literary works were incorporated without author consent or compensation.
How the Pirated Books Made Their Way Into Adobe’s System
The pathway to this alleged misuse traces back to SlimPajama-627B, a public dataset created by Cerebras and released in mid-2023. Adobe relied on this dataset to pre-train SlimLM. However, the lawsuit describes a problematic chain: SlimPajama was derived from RedPajama, which incorporated Books3—a vast repository comprising 191,000 published works.
The critical issue: Books3 reportedly contains copyrighted material that was collected without proper authorization. When Adobe built upon this compromised foundation, the company allegedly inherited these copyright violations. As Lyon’s legal team notes, SlimLM became a derivative work containing unauthorized literary content.
A Pattern Emerging Across the Industry
Adobe is hardly the first technology firm to face such accusations. The underlying datasets fueling modern AI systems have become a minefield of copyright disputes, with similar suits targeting other companies that trained models on Books3 and related compilations.
Why This Matters
Modern AI models require enormous quantities of text data. When developers source from compilations like Books3 or RedPajama without thoroughly vetting their legal provenance, they create institutional risk. The repeated lawsuits suggest that relying on these datasets—however convenient—now carries substantial legal exposure.
For Adobe and similar companies, the message is becoming inescapable: cutting corners on training data sourcing can prove far more expensive than legitimate licensing arrangements.