
Adobe Sued for Allegedly Training AI Model on Pirated Books

A proposed class-action lawsuit claims Adobe used copyrighted books, including works by author Elizabeth Lyon, to train its SlimLM AI model, raising questions about how AI developers source their training data.

The Allegation: Copyright Infringement in AI Training

Adobe, a leading software company, is facing a proposed class-action lawsuit over the training data used for its SlimLM AI model. The lawsuit, filed on behalf of author Elizabeth Lyon, alleges that Adobe used pirated copies of numerous books, including Lyon's own works, to train the model. The case raises significant concerns about copyright infringement and the ethical sourcing of data in the rapidly evolving field of artificial intelligence.

Lyon, an author of nonfiction guidebooks on writing, claims her works were included in the pretraining dataset Adobe used for SlimLM. The lawsuit asserts that Adobe knowingly or negligently incorporated copyrighted material obtained through illicit means into its AI training process.

Understanding SlimLM and Its Training Data

Adobe describes SlimLM as a series of small language models designed for "document assistance tasks on mobile devices." The company states that SlimLM was pretrained on SlimPajama-627B, an open-source dataset released by Cerebras in June 2023. SlimPajama-627B is presented as a "deduplicated, multi-corpora" dataset, meaning it was compiled from multiple sources and processed to remove duplicate content.

However, the lawsuit disputes that the dataset was properly deduplicated and lawfully sourced. Its core argument is that, despite being labeled "open-source," the dataset contained copyrighted material obtained through unauthorized channels. The lawsuit does not directly accuse Cerebras of wrongdoing; instead, it focuses on Adobe's use of the dataset while knowing, or having reason to know, that it contained infringing material.

At 627 billion tokens, SlimPajama-627B is far too large to review manually for copyright violations. This illustrates how difficult it is to verify the legality of the large datasets used to train AI models.

The Legal Landscape: AI, Copyright, and Fair Use

This lawsuit is part of a growing wave of legal challenges over the use of copyrighted material in AI training. Several key questions are at the forefront of these debates:

Is AI training considered "fair use" under copyright law? This is a central point of contention. Arguments for fair use often center on the transformative nature of AI training: the model is not simply reproducing the original work, but using it to learn patterns and generate new content.
Does the source of the training data matter? Even if AI training is deemed fair use, using illegally obtained copyrighted material could still violate copyright law.
What duty do AI developers have to ensure the legality of their training data? The lawsuit contends that Adobe had a duty to verify the source of the data used to train SlimLM.
