AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they don’t want to be scraped, according to internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it observed the AI startup ignoring blocks and hiding its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to circumvent the website’s preferences,” Cloudflare’s researchers wrote.
AI products like those offered by Perplexity rely on gobbling up large amounts of data from the internet, and AI startups have long scraped text, images, and videos from the web, often without permission, to make their products work. In recent times, websites have tried to fight back by using the web standard robots.txt file, which tells search engines and AI companies which pages can be indexed and which shouldn’t, efforts that have seen mixed results so far.
Perplexity appears to be willingly circumventing these blocks by changing its bots’ “user agent,” a signal that identifies a website visitor by their device and version type, as well as changing their autonomous system numbers, or ASNs, numbers that identify large networks on the internet, according to Cloudflare.
“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.
Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “isn’t even ours.”
Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites, even after they added rules to their robots.txt files and specifically blocked Perplexity’s known bots. Cloudflare said it then ran tests and confirmed that Perplexity was circumventing these blocks.
“We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” according to Cloudflare.
The company also said that it has de-listed Perplexity’s bots from its verified bots list and added new techniques to block them.
Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace allowing website owners and publishers to charge AI scrapers that visit their sites. Cloudflare’s chief executive Matthew Prince sounded the alarm at the time, saying AI is breaking the business model of the internet, particularly for publishers. Last year, Cloudflare also launched a free tool to stop bots from scraping websites to train AI.
This isn’t the first time Perplexity has been accused of scraping without authorization.
Last year, news outlets such as Wired alleged that Perplexity was plagiarizing their content. Weeks later, Perplexity CEO Aravind Srinivas was unable to immediately answer when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.