Search engines that don’t pay up can’t index Reddit content


When Reddit said last month that it would block unauthorized data scraping on its site, everyone’s (rightly) first reaction was “AI, AI, AI.” However, now that the change has taken effect, chatbot makers aren’t the only ones locked out. The popular forum is blocking every search engine except Google, which reportedly struck a deal with Reddit earlier this year worth $60 million annually.

404 Media reported on Wednesday (and Engadget confirmed in our own tests) that searching rival engine Bing for Reddit results from the past week (using “site:reddit.com”) returns no results. The publication reported that DuckDuckGo produced seven links with no descriptions, showing only the note “We’d like to show you a description here, but the site won’t let us.” The engine appears to have since removed even those, as our test produced only a blank page reading “No results found.”

Reddit said last month that it would update its Robots Exclusion Protocol (robots.txt) file to block automated data scraping, but it’s now clear that the change wasn’t aimed only at AI companies like Perplexity and its controversial “answer engine.” For now, Google appears to be the only search engine allowed to crawl Reddit and return results from “the front page of the internet.”

Ironically, part of the forum’s robots.txt file reads: “Reddit believes in an open internet, but not the misuse of public content.” The file now essentially says, “Don’t crawl.” Apparently, Reddit now considers search engines that don’t strike deals with it to be misusing its content.

The ubiquitous robots.txt is a web standard that tells crawlers which parts of a site they may access. Although many crawlers ignore its instructions, Google’s standard practice is to respect them. So, presumably, the companies behind the lucrative deal have implemented some manual overrides on the technical side.
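To make the mechanics concrete, here is a minimal sketch using Python’s standard-library urllib.robotparser, which reads robots.txt rules the way a well-behaved crawler would before fetching a page. The rule text, crawler names and example URL are assumptions for illustration, not a verbatim copy of Reddit’s file. The first file is a blanket “User-agent: * / Disallow: /” rule, which shuts out every compliant crawler, Googlebot included; the second, hypothetical file shows what naming one crawler as an exception inside robots.txt would look like.

```python
from urllib.robotparser import RobotFileParser

# Illustrative blanket rule: every crawler ("*") is disallowed from every path ("/").
# This is an assumed example, not a verbatim copy of Reddit's robots.txt.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical contrast: a file that names one crawler and exempts it.
# The article suggests Google's access comes from manual overrides instead,
# so this only shows what a robots.txt-level exemption could look like.
EXEMPTION_RULES = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def check(rules: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report whether each user agent may fetch `url` under the given rules."""
    parser = build_parser(rules)
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://www.reddit.com/r/python/"
    agents = ["Googlebot", "Bingbot", "DuckDuckBot"]
    # {'Googlebot': False, 'Bingbot': False, 'DuckDuckBot': False}
    print(check(BLANKET_RULES, agents, url))
    # {'Googlebot': True, 'Bingbot': False, 'DuckDuckBot': False}
    print(check(EXEMPTION_RULES, agents, url))
```

Nothing enforces the answer: can_fetch only reports what the file asks for, and a crawler that skips the check can still request the page, which is why the standard depends on crawlers choosing to respect it.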

The saga is a trickle-down effect of AI chatbots scraping the live web for results. With courts slow to decide how much of the open internet counts as fair use for training chatbots, companies like Reddit are now focused on protecting their data from freeloaders, building walls at the expense of the open internet. (Though given Microsoft’s integral role in this AI era, having partnered with OpenAI from the outset, it seems ironic that at least one part of its business, Bing, would find itself on the losing end.)

Colin Hayhurst, CEO of Mojeek, a lesser-known “no-track” search engine, told 404 Media that Reddit is “killing everything for search except Google.” The executive added that his attempts to contact Reddit went unanswered. “This has never happened to us before,” he said. “When this happens to us, we usually get blocked because of ignorance or stupidity or whatever, and when we contact the site, of course we can sort it out, but we’ve never had no response from anyone before.”

Engadget has reached out to Google and Reddit for comment and confirmation but hadn’t heard back by publication time. 404 Media reported that it ran into a similar wall of silence from the companies.

Reddit has made no secret of its desire to stop AI companies from scraping its trove of data in this age of advancing AI. Last year, CEO Steve Huffman risked alienating a large portion of his user base by blocking third-party API requests, which led to the death of favorite apps like Christian Selig’s Apollo. Despite widespread objections from moderators and forum participants, the company lost only a small number of users, and only temporarily.

The gamble paid off, and Reddit recovered. It went public in March.


