Amazon Web Services has launched an investigation into whether Perplexity AI is violating its rules, according to Wired. To be precise, the company's cloud division is investigating allegations that a crawler hosted on its servers is ignoring the Robots Exclusion Protocol, a web standard under which developers place a robots.txt file on their domains with instructions on whether bots may access certain pages. Following these instructions is voluntary, but crawlers from reputable companies have generally respected them since web developers began implementing the standard in the '90s.
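For readers unfamiliar with how a compliant crawler consults robots.txt, the check can be sketched with Python's standard-library `urllib.robotparser`. This is only an illustration: the robots.txt content, the `ExampleBot` user agent, the `example.com` URLs and the `is_allowed` helper are all hypothetical and are not taken from any site or crawler named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred from /private/,
# every other bot may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would first
    # download https://<host>/robots.txt and pass its contents here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/private/page"),
        ("ExampleBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in the protocol enforces the result of this check. A crawler that never runs it, or ignores its answer, can still request the page, which is why compliance depends on the operator.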
In a previous article, Wired reported that it had discovered a virtual machine that was circumventing the robots.txt directives on its website. The machine was hosted on an Amazon Web Services server with the IP address 44.221.181.252 and was “surely operated by Perplexity.” It had reportedly visited other Condé Nast sites hundreds of times over the past three months and scraped their content as well. The Guardian, Forbes and The New York Times had also detected multiple visits to their publications, Wired said. To verify whether Perplexity was really scraping content, Wired entered the headline and a short description of an article into the company's chatbot, and the tool returned a closely paraphrased version of the article “with minimal citations.”
A recent Reuters report claimed that Perplexity is not the only AI company circumventing robots.txt files to gather content used to train large language models. However, Amazon's investigation appears to focus solely on Perplexity AI. An Amazon spokesperson told Wired that its customers must follow robots.txt instructions when crawling websites. “AWS' Terms of Service prohibit customers from using our services for illegal activity, and our customers are responsible for complying with our terms and all applicable laws,” the company said.
Perplexity spokeswoman Sarah Platnick told Wired that the company has already responded to Amazon's inquiries and denied that its crawlers circumvent the Robots Exclusion Protocol. “Our PerplexityBot, which runs on AWS, respects robots.txt, and we have verified that Perplexity-managed services do not crawl sites in a way that violates AWS's terms of service,” she said. But she acknowledged that PerplexityBot ignores robots.txt if a user includes a specific URL in the chatbot query.
Perplexity CEO Aravind Srinivas also previously denied that his company was “ignoring robot exclusion protocols and lying about them.” Srinivas did tell Fast Company that Perplexity uses third-party web crawlers in addition to its own, and that the bot Wired identified was one of them.