More and more organizations are asking: how do I prevent AI companies from using my content without permission or compensation?
Cloudflare introduces a solution that allows you, the website owner, to regain control of your own content.
What is the problem?
AI models such as GPT from OpenAI and Claude from Anthropic are trained on huge amounts of online content. They use so-called AI crawlers for this purpose. Those bots extract text, images and other information from websites without sending any traffic or revenue back to them.
Cloudflare compared how often an AI bot crawls a site with how often a visitor arrives at that site via that AI service.
- For OpenAI, that ratio is 1,700 to 1
- For Anthropic, it is as high as 73,000 to 1
In short, they take a lot, but give almost nothing back.
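To make the arithmetic behind these ratios concrete, here is a minimal Python sketch. The function names and the example counts are hypothetical, and the source does not describe how Cloudflare measured crawls or referrals, so treat this only as an illustration of what "1,700 to 1" means.

```python
def crawl_to_referral_ratio(crawls: int, referrals: int) -> float:
    """Return how many crawler requests occur per referred visitor."""
    if referrals <= 0:
        # With zero referrals there is no meaningful "N to 1" ratio.
        raise ValueError("need at least one referral to form a ratio")
    return crawls / referrals


def format_ratio(crawls: int, referrals: int) -> str:
    """Format the ratio in the 'N to 1' style used in the text."""
    ratio = crawl_to_referral_ratio(crawls, referrals)
    return f"{ratio:,.0f} to 1"


# Hypothetical counts, chosen only to reproduce the figures quoted above.
print(format_ratio(3_400, 2))   # 3,400 crawls for 2 referred visitors
print(format_ratio(73_000, 1))  # 73,000 crawls for 1 referred visitor
```

If you have your own server logs, the same calculation shows how much a given crawler takes compared with what it sends back.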
Why is this a problem?
For many websites, content is worth money. Think of blogs with ads or platforms that rely on traffic. If AI companies use your content without permission or compensation, it directly affects your revenue model. Moreover, you have no control over what your content is used for.
What can you do?
Cloudflare now offers tools that allow you to block or regulate this behavior. You choose whether and how AI crawlers can use your content.
Key features:
- Managed robots.txt
Cloudflare automatically manages your site’s robots.txt file. This file contains rules that determine which bots can access which pages. You can set it to exclude AI crawlers.
- Protecting ad revenue
You can specifically bar AI bots from revenue-generating pages, such as blogs with ads.
- Dashboard for AI traffic
The Cloudflare dashboard gives you insight into which AI crawlers are active, on which pages, and how much traffic they generate.
- Set simple blocks
You can temporarily or permanently block AI bots with one click, or allow them under certain conditions.
- Reporting for compliance
You can generate reports to monitor whether AI companies are complying with the rules you have set.
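To illustrate the kind of robots.txt rules described above, here is a minimal Python sketch that builds such a file and checks it with the standard library's `urllib.robotparser`. It does not show what Cloudflare's managed robots.txt actually contains. The user-agent tokens `GPTBot` and `ClaudeBot`, the `/blog/` path, and the helper names are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for the OpenAI and Anthropic crawlers.
AI_CRAWLERS = ["GPTBot", "ClaudeBot"]


def build_robots_txt(blocked_agents, blocked_path="/"):
    """Build robots.txt text that disallows blocked_path for the given agents."""
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append(f"Disallow: {blocked_path}")
        lines.append("")  # a blank line separates rule groups
    # Every other bot keeps full access.
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check whether a rule-abiding bot with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Block AI crawlers everywhere.
site_wide = build_robots_txt(AI_CRAWLERS)
print(site_wide)
print(is_allowed(site_wide, "GPTBot", "https://example.com/blog/post"))

# Block AI crawlers only on a hypothetical revenue-generating section.
blog_only = build_robots_txt(AI_CRAWLERS, blocked_path="/blog/")
print(is_allowed(blog_only, "ClaudeBot", "https://example.com/about"))
```

Keep in mind that robots.txt is a request, not an enforcement mechanism: it only affects crawlers that choose to honor it, which is why the blocking and reporting features matter as well.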
Is this already active?
No. Cloudflare does not yet enable this protection by default, so you have to activate it yourself through the dashboard. Until you do, your site remains open to AI crawlers.
What are the benefits?
- You’re back in control of what happens to your content
- You can prevent AI companies from profiting from your work for free
- You protect your income and reputation
- You may soon be able to get paid when AI crawlers access your content (Cloudflare is working on a “pay-per-crawl” system)
Are there any drawbacks?
- It requires some technical knowledge to set up properly
- If your filters are too broad, it can also block bots that do bring you traffic
- It is not a complete solution against all forms of scraping
Want setup support or insight into how your content is currently being used by AI? Get in touch with us. We’d be happy to help.