If institutions like the New York Times and the major record labels are fighting generative AI through litigation, Reddit is trying a new approach: lock it up…the metadata, that is.
According to TechCrunch, Reddit this week updated its Robots Exclusion Protocol file (robots.txt), which tells web bots whether they can crawl and/or scrape a site, to include stricter rules: "Reddit will continue rate-limiting and blocking unknown bots and crawlers from accessing its platform." Bots and crawlers that don't abide by Reddit's Public Content Policy will be blocked or rate-limited as well.
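To illustrate how robots.txt works in practice, here is a minimal sketch using Python's standard-library parser. The rules and user-agent names below are hypothetical, not Reddit's actual file; a well-behaved crawler performs a check like this before fetching any page.

```python
from urllib import robotparser

# Parse an illustrative robots.txt: unknown bots are disallowed
# site-wide, while a hypothetical "TrustedBot" is allowed.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
    "User-agent: TrustedBot",
    "Allow: /",
])

# A compliant crawler consults can_fetch() before each request.
print(rp.can_fetch("UnknownBot", "https://www.reddit.com/r/all"))  # False
print(rp.can_fetch("TrustedBot", "https://www.reddit.com/r/all"))  # True
```

The catch, as the reporting above suggests, is that robots.txt is purely advisory: nothing technically stops a crawler from skipping this check, which is why Reddit pairs it with server-side rate-limiting and blocking.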
Reddit's updated terms are due in part to an investigation earlier this year by Wired, which found that AI search companies like Perplexity are crawling sites without consent. Responding to the investigation, Perplexity CEO Aravind Srinivas said his company “is not ignoring the Robot Exclusions Protocol and then lying about it…I think there is a basic misunderstanding of the way this works. We don’t just rely on our own web crawlers, we rely on third-party web crawlers as well.”
In addition to the Wired investigation, Reuters recently reported on a letter from content-licensing startup TollBit alleging that several AI companies regularly circumvent web standards intended to block their scraping.
As the letter from TollBit claims, "what this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites… The more publisher logs we ingest, the more this pattern emerges."
Search Generative Experience
As if the rise of generative AI scraping news sites wasn't threatening enough, Google announced late last year a product called Search Generative Experience (SGE) that "uses AI to create summaries in response to some search queries, triggered by whether Google’s system determines the format would be helpful." The problem, Reuters explains, is that "if publishers want to prevent their content from being used by Google’s AI to help generate those summaries, they must use the same tool that would also prevent them from appearing in Google search results, rendering them virtually invisible on the web."
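The publishers' dilemma can be sketched in robots.txt terms. The directives below are illustrative, not a recommendation: `Google-Extended` is Google's real opt-out token for AI model training, but at the time of the reporting it did not exempt a site from Search-based summaries, which are fetched by the same Googlebot that powers ordinary search indexing.

```
# Blocking Googlebot stops AI summaries, but also removes
# the site from Google Search results entirely.
User-agent: Googlebot
Disallow: /

# Google-Extended opts content out of Gemini/Vertex AI training,
# but does not by itself keep a site out of SGE summaries.
User-agent: Google-Extended
Disallow: /
```

In other words, the only lever that reliably blocks the summaries is the same one that makes a publisher invisible in search.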
THE VERDICT:
The threat of AI-generated summaries is very real, not only to publishers but to the Internet as we currently know it. If users never have to stray from Google, the ad-supported model that has kept the Internet alive for over two decades may collapse. Of course, new technologies are nothing if not disruptive, but an era of significant flux may be upon the industry.