Cloudflare Gives Creators New Tool to Control Use of Their Content

New Content Signals Policy will empower website owners and publishers to declare preferences on how AI companies access and use their content, available completely for free.

Cloudflare announced its latest way to help website owners and publishers gain more control over their content. Cloudflare will make it easy for any website owner to update their robots.txt file (the simple text file that tells web crawlers what parts of a site they can or cannot access) with a new Content Signals Policy. This new policy will enable website operators to express preferences over how their data is used by others, including the ability to opt out of AI overviews and inference.

The Internet is shifting from “search engines,” which provided a treasure map of links that a user could explore for information, to “answer engines” powered by AI, which give a direct answer without the user ever needing to click through to the original site’s content. This shift severely threatens the original business model of the Internet, in which websites, publishers, and content creators could earn money or recognition by driving traffic and views to their sites. Today, AI crawlers scrape vast troves of data from websites, but website operators have no way to express the nuances of whether, how, and for what purpose they want to allow their content to be used.

Robots.txt files let website operators specify which crawlers are allowed and what parts of a website they can access. They do not, however, tell a crawler what it may do with the content after accessing it. There needs to be a standard, machine-readable way to signal how data can be used even after it has been accessed.

“The Internet cannot wait for a solution, while in the meantime, creators’ original content is used for profit by other companies,” said Matthew Prince, co-founder and CEO of Cloudflare. “To ensure the web remains open and thriving, we’re giving website owners a better way to express how companies are allowed to use their content. Robots.txt is an underutilized resource that we can help strengthen, and make it clear to AI companies that they can no longer ignore a content creator’s preferences.”

Cloudflare believes that the operator of a website, API, MCP server, or any other Internet-connected service, whether a local news organization, an AI startup, or an ecommerce shop, should get to decide how their data is used by others for commercial purposes. Today, more than 3.8 million domains use Cloudflare’s managed robots.txt service to express that they do not want their content used for training. Now, Cloudflare’s new Content Signals Policy will enable users to strengthen their robots.txt preferences with a clear set of instructions for anyone accessing the website via automated means, such as an AI crawler, covering whether content may be used for search, for AI inference (such as AI overviews), or for AI training.
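To make the idea concrete, here is a minimal sketch of how a site could add such signals to its robots.txt. The Content-Signal directive and the signal names search, ai-input, and ai-train follow the policy language Cloudflare has published, but the exact layout, values, and file path below are assumptions for illustration, not the announcement’s own example.

```python
# Illustrative sketch: append content-signal preferences to a robots.txt file.
# The "Content-Signal" line and the signal names (search, ai-input, ai-train)
# follow Cloudflare's published policy language; treat the exact syntax and
# values here as an example, not an authoritative specification.

SIGNALS = {
    "search": "yes",    # allow indexing and linking in search results
    "ai-input": "no",   # disallow feeding content into AI answers (e.g. AI overviews)
    "ai-train": "no",   # disallow use of content for model training
}

def content_signal_block(signals: dict[str, str]) -> str:
    """Render a robots.txt stanza that states usage preferences for all crawlers."""
    pairs = ", ".join(f"{name}={value}" for name, value in signals.items())
    return (
        "# Content signals: machine-readable usage preferences for this site.\n"
        "User-Agent: *\n"
        f"Content-Signal: {pairs}\n"
        "Allow: /\n"
    )

if __name__ == "__main__":
    # Append the stanza to an existing robots.txt (the path is hypothetical).
    with open("robots.txt", "a", encoding="utf-8") as fh:
        fh.write("\n" + content_signal_block(SIGNALS))
    print(content_signal_block(SIGNALS))
```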
While robots.txt files may not stop unwanted scraping, Cloudflare’s aim is that this improved policy language will better communicate a website owner’s preferences to bot operators and push companies to better respect content creators’ preferences. Starting today, Cloudflare will automatically update robots.txt files to include the new policy language for all customers who ask Cloudflare to manage their robots.txt file. For anyone who wants to declare how crawlers can use their content via a customized robots.txt file, Cloudflare is publishing tools to help. Organizations have seen the need for solutions like the Content Signals Policy as a way to offer more direction over how their content is used.
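On the other side of the exchange, a bot operator that wants to respect these declarations has to read them back out of robots.txt. The following is a minimal sketch, assuming the simple Content-Signal: name=yes/no format used above; real-world robots.txt handling (per-user-agent groups, caching, error handling) is more involved, and the example URL is a placeholder.

```python
# Illustrative sketch of how a bot operator could check a site's content signals
# before reusing its content. Assumes the simple "Content-Signal: name=yes/no"
# format shown above; this is not a full robots.txt parser.

import urllib.request

def fetch_content_signals(base_url: str) -> dict[str, bool]:
    """Fetch robots.txt and return any declared content signals as a name -> bool map."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/robots.txt", timeout=10) as resp:
        text = resp.read().decode("utf-8", errors="replace")

    signals: dict[str, bool] = {}
    for line in text.splitlines():
        if line.strip().lower().startswith("content-signal:"):
            for pair in line.split(":", 1)[1].split(","):
                name, _, value = pair.strip().partition("=")
                signals[name.strip().lower()] = value.strip().lower() == "yes"
    return signals

if __name__ == "__main__":
    prefs = fetch_content_signals("https://example.com")  # placeholder domain
    # Absence of a signal expresses no preference; act only on an explicit "no".
    if prefs.get("ai-train") is False:
        print("Site opted out of AI training; exclude its content from training sets.")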
Source: Cloudflare media announcement