AI-Generated Websites Surge to 35% Post-ChatGPT: Stanford & Imperial Study

A landmark study conducted by researchers from Stanford University, Imperial College London, and the Internet Archive has revealed that 35% of newly published websites as of mid-2025 are generated or assisted by artificial intelligence. This figure marks a dramatic shift from late 2022, when the proportion was effectively zero, according to the findings published in the paper titled “The Impact of AI-Generated Text on the Internet.”

The Speed of AI Adoption

The study analyzed 33 months of website snapshots from the Internet Archive’s Wayback Machine, covering the period from August 2022 to May 2025. Using Pangram v3, an AI text detection tool selected for its accuracy, researchers tracked the proliferation of AI-generated content across the web. The results underscore what the authors describe as “the fastest technological transformation of digital space in internet history.”
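
The article does not describe the researchers' pipeline in detail, but the underlying arithmetic of a prevalence figure can be illustrated with a minimal sketch. It assumes each archived page has already been reduced to a month string and a yes/no label from some detector. The function name `monthly_ai_share` and the input format are hypothetical, chosen for illustration, and do not represent Pangram's interface or the study's code.

```python
from collections import defaultdict


def monthly_ai_share(pages):
    """Return the fraction of pages flagged as AI-generated for each month.

    `pages` is an iterable of (month, is_ai) pairs, where `month` is a
    'YYYY-MM' string and `is_ai` is a boolean label. Both the format and
    the labels are assumptions made for this illustration.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for month, is_ai in pages:
        totals[month] += 1
        if is_ai:
            flagged[month] += 1
    # 'YYYY-MM' strings sort chronologically, so sorted() yields a timeline.
    return {month: flagged[month] / totals[month] for month in sorted(totals)}


if __name__ == "__main__":
    # Made-up sample data, not figures from the study.
    sample = [
        ("2022-08", False),
        ("2022-08", False),
        ("2025-05", True),
        ("2025-05", False),
    ]
    print(monthly_ai_share(sample))
```

Plotting the resulting shares month by month would give the kind of adoption curve the study describes, with the caveat that any such estimate is only as reliable as the detector's labels.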

Jonáš Doležal, a researcher at Imperial College London and co-author of the study, expressed astonishment at the pace of change: “I find the sheer speed of the AI takeover of the web quite staggering. After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years.” The findings suggest that AI-generated content has not only become widespread but has also begun to redefine the digital landscape in a fraction of the time it took to build the internet in its original form.

Measurable Effects on Web Content

The study tested six hypotheses about the impact of AI-generated text on the web, but only two were empirically confirmed. The first was a reduction in semantic diversity. AI-generated websites exhibited pairwise semantic similarity scores 33% higher than human-written content, indicating that the same ideas are being expressed in increasingly similar ways. This phenomenon, described by researchers as “semantic contraction,” suggests that AI-generated content may be contributing to a more homogenized web.
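
To make the idea of a pairwise similarity score concrete, the sketch below averages cosine similarity over every pair of documents in a collection and compares two collections. It assumes each document has already been converted into a numeric embedding vector. The choice of cosine similarity, the helper names, and the toy vectors are illustrative assumptions, since the article does not specify the study's exact metric or embedding model.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def relative_increase(value, baseline):
    """Fractional increase of `value` over `baseline` (0.33 means 33% higher)."""
    return (value - baseline) / baseline


if __name__ == "__main__":
    # Toy two-dimensional "embeddings", invented for illustration only.
    human_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ai_docs = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.0]]
    human_score = mean_pairwise_similarity(human_docs)
    ai_score = mean_pairwise_similarity(ai_docs)
    print(human_score, ai_score, relative_increase(ai_score, human_score))
```

A higher average means the documents in a collection sit closer together in embedding space, which is the sense in which a corpus can be said to contract semantically.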

The second confirmed effect was a shift toward “artificial positivity.” AI-generated content tends to avoid negative or controversial language, resulting in a web that feels more sanitized and less diverse in tone. However, the study found no evidence to support widespread concerns about increased misinformation or stylistic homogeneity, despite public speculation about these risks.

Model Collapse and Future Risks

At 35% prevalence, AI-generated content has reached a threshold where “model collapse” is no longer just a theoretical concern. Model collapse refers to the degradation of AI performance when models are trained on data that includes a significant proportion of AI-generated content. Maty Bohacek, a student researcher at Stanford and co-author of the study, noted: “As AI-generated content spreads, the challenge is finding a role for these models that doesn’t just result in a sanitized, repetitive web.” The researchers warn that if current trends continue, future generations of AI models may struggle to produce original or diverse outputs, further exacerbating the risks of semantic contraction.

The study also found that AI-generated articles surpassed human-written publications entirely by November 2024, a milestone that underscores the rapid adoption of AI tools in content creation. Despite this shift, the researchers observed that AI-generated content maintains citation rates similar to human-written articles, suggesting that the quality of references has not yet been significantly compromised.

Implications for the Digital Ecosystem

The findings raise critical questions about the long-term sustainability of the web as a diverse and dynamic space. While AI-generated content has democratized access to information creation, it has also introduced challenges related to originality, creativity, and the potential for echo chambers. The study’s authors emphasize the need for strategies to mitigate these risks, including the development of AI models that can introduce “friction” or distinct personality to avoid becoming mere replacements for human voices.

For developers, regulators, and platform operators, the study serves as a call to action. As AI continues to reshape the digital landscape, stakeholders must consider how to balance innovation with the preservation of semantic and stylistic diversity. The study’s data provides a foundation for future research into the long-term effects of AI on the internet, including its impact on search algorithms, content moderation, and user engagement.

The research also highlights the role of detection tools like Pangram v3 in monitoring AI-generated content. As the web becomes increasingly AI-driven, such tools will be essential for distinguishing between human and machine-generated text, ensuring transparency and accountability in digital spaces.

Looking Ahead

The study’s authors caution that the 35% figure is not a final endpoint but a snapshot of a rapidly evolving trend. As AI tools become more sophisticated and accessible, the proportion of AI-generated content is likely to grow, further accelerating the transformation of the web. The challenge for the tech industry will be to harness the benefits of AI while mitigating its potential drawbacks, ensuring that the internet remains a space for diverse voices and ideas.

For now, the study provides a clear benchmark: the internet of 2025 is already a hybrid space, shaped as much by algorithms as by human creativity. The question is no longer whether AI will dominate the web, but how society will adapt to this new reality.