AI and Corporate Knowledge Capture
AI and the Corporate Capture of Knowledge
More than a decade after Aaron Swartz’s death, the United States continues to grapple with the contradiction that led to his tragic end.
Swartz championed the free accessibility of knowledge,especially research funded by the public. He downloaded thousands of academic articles from JSTOR, intending to make them widely available. The federal government responded with felony charges and the threat of decades in prison. After enduring two years of aggressive prosecution,Swartz died by suicide on January 11,2013.
The unresolved questions from his case are now central to debates surrounding artificial intelligence, copyright, and control over information.
During Swartz’s prosecution, significant taxpayer-funded research conducted at public institutions remained locked behind expensive paywalls. Individuals were unable to access work they had helped fund without paying private journals and research websites. Swartz believed this wasn’t accidental, but a outcome of purposeful legal, economic, and political choices. His actions directly challenged those choices, and the government treated him accordingly.
Today’s AI progress represents a much larger, profit-driven appropriation of information. Tech companies are ingesting massive amounts of copyrighted material – books,journalism,academic papers,art,music,and personal writing – at an industrial scale.This data scraping often occurs without consent, compensation, or transparency, and is used to train large AI models.
These AI companies then sell their proprietary systems, built on both public and private knowledge, back to the very peopel who funded that knowledge. However, the government’s response differs sharply. There are no criminal prosecutions, no lengthy prison sentences. Lawsuits move slowly, enforcement is uncertain, and policymakers express caution due to AI’s perceived economic and strategic importance. Copyright infringement is often framed as an unavoidable step toward ”innovation.”
Recent events highlight this disparity. in 2025, Anthropic settled with publishers over claims that its AI systems were trained on copyrighted books without permission. The agreement reportedly valued the infringement at approximately $3,000 per book across an estimated 500,000 works, totaling over $1.5 billion. Plagiarism disputes between artists and alleged infringers
