Reddit Sues Perplexity and Three Other Companies for Scraping Comments
- What: Reddit is suing Perplexity AI, Oxylabs UAB, AWMProxy, and SerpApi, accusing them of scraping Reddit content for AI training.
Reddit Sues Perplexity and Others Over AI Data Scraping, Focusing on User Comments
Reddit has escalated its fight against unauthorized use of its data for AI training, filing a lawsuit on Wednesday against Perplexity AI and three other companies. The suit alleges that the defendants scraped Reddit content on a large scale, specifically user comments, to feed AI models without permission, and it accuses them of copyright violations, unfair competition, and unjust enrichment. This follows a similar lawsuit Reddit filed in June against Anthropic.
The lawsuit claims the defendants employed “shady circumvention tactics” to bypass Reddit’s protocols and extract data for commercial purposes. Reddit describes its content, particularly its vibrant comment sections, as a unique and valuable resource. The platform argues that unauthorized scraping undermines its ability to control how its data is used and to protect its users.
“Reddit has rules,” the lawsuit states. “It does not permit unauthorized commercialization of Reddit content… If AI companies want to legally access Reddit data, they need to comply with Reddit’s policies.” Reddit points to agreements it has reached with companies like OpenAI and Google as examples of responsible data access, where safeguards are in place.
The defendants named in the suit are:
* Perplexity AI: A San Francisco-based company behind an AI chatbot focused on web search.
* Oxylabs UAB: A Lithuania-based web scraping service.
* AWMProxy: A Russian web domain company.
* SerpApi: A Texas-based search engine results page (SERP) API provider.
The lawsuit builds on Reddit’s previous legal action against Anthropic, demonstrating a firm stance against unauthorized data usage. In a statement to the Associated Press, Perplexity AI said it will “fight vigorously for users’ rights to freely and fairly access public knowledge.”
This lawsuit marks a critical moment in the ongoing debate about AI training data. Reddit isn’t simply protecting its intellectual property; it’s asserting control over how its community’s contributions are used. The value of Reddit lies not just in the information shared, but in the dynamic, conversational nature of its forums. Scraping this data without permission risks devaluing that unique environment. We’re likely to see more platforms adopt similar legal strategies as AI development accelerates and the question of “fair use” for large language models grows more complex. It is important that Reddit distinguishes between companies that negotiate access and those that scrape: its position is not anti-AI, but pro-responsible AI development.
– marcusrodriguez
The rise of AI has created significant demand for training data, and web scraping has become a common way to acquire it. However, this practice raises legal and ethical concerns, particularly regarding copyright, terms of service, and user privacy. Here’s a breakdown of the types of companies involved in this ecosystem:
| Company Type | Role in Data Scraping | Example (from lawsuit) |
|---|---|---|
| AI Chatbot Developer | Utilizes scraped data to train large language models. | Perplexity AI |
| Web Scraping Service | Provides tools and infrastructure for extracting data from websites. | Oxylabs UAB |
| Proxy Provider | Offers IP addresses to mask scraping activity and bypass restrictions. | AWMProxy |
| SERP API Provider | Provides access to search engine results, often used for data collection. | SerpApi |
This case, along with the suit against Anthropic, signals Reddit’s determination to protect its data and establish clear rules for AI companies seeking to leverage its platform. The outcome will likely have far-reaching implications for the future of AI development and data governance.
