Amazon Web Services has launched an investigation to determine whether Perplexity AI is violating its rules, according to Wired. Specifically, the company's cloud division is looking into claims that the service uses a crawler, hosted on its servers, that ignores the Robots Exclusion Protocol. This protocol is a web standard in which developers place a robots.txt file on a domain with instructions on whether bots may or may not visit a particular page. Adherence to these instructions is voluntary, but crawlers from reputable companies have generally respected them since web developers started implementing the standard in the '90s.
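As a rough illustration of how a well-behaved crawler consults robots.txt before fetching a page, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the bot names (`ExampleBot`, `OtherBot`) and the example.com URLs are all hypothetical and are not taken from any site or crawler mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text instead of downloading them,
    # so the example runs without network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/notes"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

Nothing in the protocol enforces the answer: a crawler that never performs this check, or ignores its result, can still request the page, which is why compliance is described as voluntary.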
In an earlier piece, Wired reported that it had detected a virtual machine bypassing its website's robots.txt instructions. That machine was hosted on an Amazon Web Services server using the IP address 44.221.181.252, which is “definitely managed by Perplexity.” It reportedly visited other Condé Nast properties hundreds of times over the past three months to scrape their content as well. The Guardian, Forbes and The New York Times had also detected it visiting their publications multiple times, Wired said. To confirm whether Perplexity was indeed scraping its content, Wired entered the headlines or short descriptions of its articles into the company's chatbot. The tool then responded with results that closely paraphrased its articles “with minimal attribution.”
A recent Reuters report claimed that Perplexity isn't the only AI company bypassing robots.txt files to collect content used to train large language models. However, Amazon's investigation seems to focus only on Perplexity AI. An Amazon spokesperson told Wired that its customers must follow robots.txt instructions when crawling websites. “AWS’ terms of service prohibit customers from using our services for any illegal activity, and customers are responsible for complying with our terms and all applicable laws,” they said.
Perplexity spokeswoman Sara Platnick told Wired that the company has already responded to Amazon's inquiries and denied that its crawlers bypass the Robots Exclusion Protocol. “Our AWS-powered PerplexityBot respects robots.txt, and we’ve verified that Perplexity-driven services do not crawl in any way that violates the AWS Terms of Service,” she said. Platnick admitted, however, that PerplexityBot will ignore robots.txt when a user enters a specific URL in a chatbot request.
Perplexity CEO Aravind Srinivas also previously denied that his company “ignored the Robot Exclusion Protocol and then lied about it.” Srinivas did admit to Fast Company that Perplexity uses third-party web crawlers in addition to its own, and that the bot Wired identified was one of them.