Websites accuse AI startup Anthropic of bypassing their anti-scraping rules and protocols


Freelancer has accused Anthropic, the AI startup behind the Claude large language models, of ignoring the robots.txt protocol to scrape its website's data. Meanwhile, iFixit CEO Kyle Wiens said Anthropic ignored the site's policy prohibiting the use of its content for AI model training. Matt Barrie, CEO of Freelancer, told The Information that Anthropic's ClaudeBot is "the most aggressive scraper to date." His website allegedly received 3.5 million hits from the company's crawler within four hours, nearly five times the volume of the next most active AI crawler. Similarly, Wiens posted on X/Twitter that Anthropic's bot hit iFixit's servers a million times in 24 hours. "You're not only taking our content without paying, you're tying up our devops resources," he wrote.

Back in June, Wired accused another AI company, Perplexity, of crawling its website despite the presence of the Robots Exclusion Protocol, or robots.txt. A robots.txt file typically contains instructions for web crawlers, telling them which pages they can and cannot access. While compliance is voluntary, bad bots have largely ignored it. After Wired's piece came out, TollBit, a startup that connects AI firms with content publishers, reported that Perplexity isn't the only one bypassing robots.txt signals. While it didn't name them, Business Insider said it learned that OpenAI and Anthropic were also ignoring the protocol.
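For context, robots.txt is a plain-text file served from a site's root (for example, example.com/robots.txt) that pairs a crawler's user-agent name with Disallow and Allow rules. A minimal sketch is below; the bot name and paths are illustrative, not taken from any of the sites mentioned here:

    # Block one specific crawler from the entire site
    User-agent: ExampleBot
    Disallow: /

    # All other crawlers: stay out of /private/, everything else is allowed
    User-agent: *
    Disallow: /private/

Nothing technically enforces these rules; a crawler has to fetch the file and choose to honor it, which is why compliance is voluntary.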

Barrie said Freelancer initially tried refusing the bot's access requests, but ultimately had to block Anthropic's crawler entirely. "It's egregious scraping [which] slows down everyone working on the site and ultimately affects our revenue," he added. As for iFixit, Wiens said the website sets off alarms at high traffic levels, and that his people were woken up at 3AM by Anthropic's activities. The company's crawler stopped scraping iFixit after it added a line to its robots.txt file that specifically disallows Anthropic's bot.
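The story doesn't reproduce the exact line iFixit added, but disallowing a single crawler by its user-agent string conventionally looks like the two lines below, using the ClaudeBot name mentioned earlier in this piece (treat the exact agent string as an assumption):

    User-agent: ClaudeBot
    Disallow: /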

The AI startup told The Information that it respects robots.txt and that its crawler "respected that signal when iFixit implemented it." It also said that it aims for minimal disruption by being mindful of how quickly it crawls the same domains, and that it is now investigating the case.

AI firms use crawlers to gather content from websites that they can use to train their generative AI technologies. They have become the target of numerous lawsuits as a result, with publishers accusing them of copyright infringement. To stave off more lawsuits, companies like OpenAI have been striking deals with publishers and websites. OpenAI's content partners so far include News Corp, Vox Media, the Financial Times and Reddit. iFixit's Wiens appears open to the idea of signing a deal for the how-to-repair articles on its website as well, having tweeted at Anthropic that it's willing to discuss licensing content for commercial use.




