Tutorial · 9 min read

Your Site Is on Cloudflare. AI Engines Might Never Index It.

We discovered today that Cloudflare's Managed robots.txt had silently blocked Claude-SearchBot 17 times — on a site built to monitor AI brand visibility. Here are the 7 Cloudflare settings that can make your brand invisible to AI search engines, and exactly how to fix each one.


Bryan

We found it by accident. While auditing Citany's own indexing health, Cloudflare's AI Crawl Control panel showed a single line that stopped everything: Claude-SearchBot — Allowed: 0, Unsuccessful: 17.

Anthropic's search crawler had tried to index Citany — a platform built specifically to monitor AI brand visibility — seventeen times. Every single attempt had failed silently. No error in our logs. No alert. Just seventeen invisible rejections, each one meaning Citany could not appear in Claude's search answers.

The cause was a single toggle in Cloudflare. One switch, enabled by default, that prepended a Disallow: / directive for ClaudeBot to our robots.txt — overriding everything we had written ourselves.

If you run your site on Cloudflare — and roughly 20% of the public web does — there are at least seven settings that can silently block AI search engines from ever indexing your content. Most of them are enabled by default. None of them show up in your server logs.

First: the distinction that makes all of this matter

Cloudflare splits AI bots into two fundamentally different categories, and conflating them is the root cause of most misconfigurations.

| Category | Examples | What they do | Block them? |
| --- | --- | --- | --- |
| AI Training Crawlers | GPTBot, ClaudeBot, CCBot, Bytespider | Harvest content to train LLMs | Your call — blocking them does not affect AI search visibility |
| AI Search / Retrieval Bots | OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User | Crawl pages to answer live user queries | Blocking these makes you invisible in ChatGPT, Claude, and Perplexity answers |

You can block GPTBot and ClaudeBot (training) all day long without affecting whether ChatGPT or Claude recommend your brand in their search answers. The bots that power those answers are completely different: OAI-SearchBot, Claude-SearchBot, and PerplexityBot. Block those, and you disappear from AI-generated answers entirely — regardless of how much content you publish.

Every misconfiguration below affects one or both categories. The ones marked critical affect AI search retrieval bots — the ones that decide whether your brand appears in AI answers.

Setting 1: Cloudflare Managed robots.txt (Critical)

Location: Security → Bots → AI Crawl Control → Robots.txt tab

When this toggle is on, Cloudflare generates and prepends its own managed block directives to the top of your robots.txt — before any rules you have written yourself.

The managed section looks like this:

# BEGIN Cloudflare Managed content
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
# END Cloudflare Managed Content

# Your custom rules below — already too late
User-Agent: ClaudeBot
Allow: /  ← this line is ignored

Most robots.txt parsers honor the first matching User-agent group they encounter. Because Cloudflare's block is prepended at the top, any Allow rules you write for the same bot further down are simply never reached. Your site appears to be correctly configured. It is not.
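You can reproduce the precedence problem with Python's standard-library parser, which also stops at the first matching group (a minimal sketch; real crawlers' parsers may differ in details):

```python
from urllib import robotparser

# A robots.txt with a managed Disallow block prepended above our own rules
robots = """\
User-agent: ClaudeBot
Disallow: /

User-agent: ClaudeBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots.splitlines())

# The later Allow group is never reached: the first ClaudeBot group wins
print(rp.can_fetch("ClaudeBot", "https://example.com/any-page"))  # False
```

Swap the two groups and the same call returns True, which is why removing the prepended managed section fixes the problem.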

This is what was blocking Claude-SearchBot from Citany for seventeen consecutive attempts. Our own robots.txt rules correctly allowed ClaudeBot. Cloudflare overrode them silently.

Fix

Go to AI Crawl Control → Robots.txt and disable the Cloudflare managed toggle. Then verify your own robots.txt correctly allows the bots you want. Use Bing Webmaster Tools "Live URL" or Google Search Console "URL Inspection" to confirm the current live robots.txt content.
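If you'd rather check the live file from a script, a minimal sketch (the marker string matches the managed section Cloudflare emits; treat it as an assumption if Cloudflare changes its wording):

```python
import urllib.request

def fetch_robots_txt(domain: str) -> str:
    # Fetch the live robots.txt exactly as crawlers see it, through Cloudflare
    with urllib.request.urlopen(f"https://{domain}/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def has_cloudflare_managed_block(robots_txt: str) -> bool:
    # The managed section is bracketed by BEGIN/END marker comments;
    # Cloudflare's capitalization varies, so compare case-insensitively
    return "begin cloudflare managed content" in robots_txt.lower()

# Example usage: has_cloudflare_managed_block(fetch_robots_txt("example.com"))
```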

Setting 2: Block AI Bots toggle (Critical)

Location: Security → Bots → "Block AI Bots" toggle

Cloudflare introduced this one-click toggle in July 2024 under the headline "Declare your AIndependence." Marketing aside, the actual effect is a blanket block on all verified AI bots — training crawlers and search/retrieval bots alike.

Enabling it blocks OAI-SearchBot (ChatGPT search), Claude-SearchBot (Claude search), and PerplexityBot — the exact bots responsible for surfacing your brand in AI-generated answers. New zones created on or after July 1, 2025 have it enabled by default.

If you registered your domain or added it to Cloudflare after mid-2025, check this setting first.

Fix

Set to "Do not block." Then use AI Crawl Control to selectively block only the training crawlers you want to restrict (GPTBot, ClaudeBot, CCBot), while leaving search bots on Allow.

Setting 3: Bot Fight Mode / Super Bot Fight Mode (High Risk)

Location: Security → Bots → Bot Fight Mode (Free) or Super Bot Fight Mode (Pro+)

Cloudflare maintains an internal list of "verified bots" — crawlers it has confirmed are legitimate. Googlebot and Bingbot are on this list. GPTBot, ClaudeBot, OAI-SearchBot, and PerplexityBot are not.

This matters because Bot Fight Mode classifies unverified automated traffic as "definitely automated" and issues it a Managed Challenge or JS Challenge. Bots cannot complete JavaScript challenges. A JS challenge is a block in practice.

The Free plan version of Bot Fight Mode cannot be customized at all. You cannot create exceptions for specific user agents. If it is blocking PerplexityBot, your only options are to turn it off entirely or upgrade.

On Pro and Business plans, Super Bot Fight Mode is configurable. You can create WAF Custom Rules with a Skip action that exempts specific bots before Bot Fight Mode logic fires.

Fix (Pro+ plans)

http.user_agent contains "OAI-SearchBot" or http.user_agent contains "Claude-SearchBot" or http.user_agent contains "PerplexityBot"

Create a WAF Custom Rule with this expression and action: Skip → Skip all Super Bot Fight Mode rules. Place it above all other custom rules.
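If you want to sanity-check which user agents that expression will and won't catch, the `contains` logic can be mirrored locally (an approximation, not the real rule engine; note that Cloudflare's `contains` operator is case-sensitive):

```python
SEARCH_BOT_TOKENS = ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot")

def skip_rule_matches(user_agent: str) -> bool:
    # Mirrors `http.user_agent contains "<token>"`: Cloudflare's `contains`
    # is case-sensitive, so the token must appear exactly as written
    return any(token in user_agent for token in SEARCH_BOT_TOKENS)

print(skip_rule_matches("Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"))  # True
print(skip_rule_matches("Mozilla/5.0 (compatible; oai-searchbot/1.0)"))  # False (case mismatch)
```

The second result is the gotcha: if a bot ever presents a lowercased token, a case-sensitive rule misses it. Wrapping the field in `lower()` in the Cloudflare expression avoids that.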

Setting 4: WAF Managed Rule — "Manage AI Bots" (High Risk)

Location: Security → WAF → Managed Rules

This is one of the more insidious settings because it acts independently from AI Crawl Control. You can set a bot to "Allow" in AI Crawl Control and still have it blocked at the WAF layer — returning a 403 that never appears in your normal traffic logs.

The WAF Managed Rule for AI bots executes before AI Crawl Control's enforcement logic. If the managed rule is active, it fires first and the Allow setting in AI Crawl Control is never reached.

There are also reports of Cloudflare Pages custom domains applying this rule invisibly — blocking AI crawlers with a 403 that does not appear in Security Events analytics, making diagnosis nearly impossible without specifically checking the WAF managed rules list.

Fix

Go to Security → WAF → Managed Rules and search for AI-related rules. Disable the "Manage AI bots" rule, or create a WAF Custom Rule with Skip action placed above it for the specific bot user agents you want to allow.

Setting 5: WAF Custom Rules matching "bot" in user agent

Location: Security → WAF → Custom Rules

A common security pattern is to block requests where the user agent contains the word "bot". The intention is to stop scrapers. The side effect is that GPTBot, ClaudeBot, PerplexityBot, and any other AI crawler with "bot" in its user agent string gets caught in the same rule.
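A quick way to see how broad that substring match is; every major crawler token, including Googlebot's and Bingbot's, contains it (a local illustration of the match logic, not an actual WAF rule):

```python
crawler_tokens = [
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "OAI-SearchBot", "Claude-SearchBot", "Googlebot", "Bingbot",
]

# A naive block rule: user agent contains "bot", case-insensitive
caught = [t for t in crawler_tokens if "bot" in t.lower()]
print(caught)  # every token in the list matches
```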

Similarly, rules that block datacenter IP ranges can silence AI crawlers, since OpenAI, Anthropic, and Perplexity all crawl from their own ASNs. If your rule blocks "non-residential IPs" or specific ASN ranges, check whether those ranges include AI crawler networks.

Fix

Audit all WAF Custom Rules for expressions that match "bot", "crawler", or "spider" in user agent fields. Add a Skip rule at the top of your rule list for the specific bot user agents you want to allow, so they are exempted before any block rules fire.

Setting 6: Security Level — "I'm Under Attack" mode

Location: Security → Settings → Security Level

Under Attack mode issues a JavaScript challenge to every visitor before serving any content. Bots cannot complete this challenge. Cloudflare states that known legitimate bots are exempted — but "known legitimate" means the verified bots list, which does not include most AI crawlers.

This is rarely left on permanently, but it is worth checking if your site went through a DDoS event and Under Attack mode was never turned off afterward.

Fix

Set Security Level to Medium or below. Use Under Attack mode only during active DDoS incidents, not as a permanent setting.

Setting 7: Rate Limiting without bot exemptions

Location: Security → WAF → Rate Limiting Rules

AI training crawlers can make thousands of requests per hour. Strict rate limiting rules — for example, 20 requests per minute — will throttle or block them with a 429. Legacy Cloudflare rate limiting automatically exempts known search engine bots, but AI crawlers are not classified as search engines in this context.

The result is intermittent blocking that is hard to diagnose: the bot gets through some of the time, but never completes a full crawl. You see partial indexing and assume it is working fine.

Fix

not (http.user_agent contains "OAI-SearchBot" or http.user_agent contains "PerplexityBot" or http.user_agent contains "Claude-SearchBot")

Add this condition to your rate limiting rules so AI search bots are excluded from the limit.
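The intent of that exemption can be sketched as a sliding-window limiter that short-circuits for exempt user agents (an illustrative model in Python, not Cloudflare's implementation; the 20-requests-per-minute figure echoes the example above):

```python
from collections import defaultdict, deque

EXEMPT_TOKENS = ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot")

class SlidingWindowLimiter:
    """Illustrative per-IP limiter: N requests per rolling window,
    with AI search bots exempted before the limit is consulted."""

    def __init__(self, limit=20, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits = defaultdict(deque)

    def allow(self, ip, user_agent, now):
        if any(tok in user_agent for tok in EXEMPT_TOKENS):
            return True  # exempt: never counted, never limited
        window = self.hits[ip]
        while window and now - window[0] > self.window_s:
            window.popleft()  # drop hits outside the rolling window
        if len(window) >= self.limit:
            return False  # the real rule would answer 429 here
        window.append(now)
        return True

limiter = SlidingWindowLimiter(limit=20)
# A generic scraper making 25 requests in one minute loses the last 5...
results = [limiter.allow("203.0.113.5", "SomeScraper/1.0", now=float(t)) for t in range(25)]
print(results.count(False))  # 5
# ...while an exempt search bot is never throttled
print(all(limiter.allow("203.0.113.6", "Claude-SearchBot/1.0", now=0.0) for _ in range(100)))  # True
```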

The full diagnostic checklist

Run through this list in order. Each item takes under two minutes to check.

| # | Setting | Location | Correct state |
| --- | --- | --- | --- |
| 1 | Managed robots.txt | AI Crawl Control → Robots.txt | OFF, or configured to allow search bots |
| 2 | Block AI Bots toggle | Security → Bots | OFF, or set to training-only |
| 3 | Bot Fight Mode (Free) / Super Bot Fight Mode | Security → Bots | Skip rules added for OAI-SearchBot, Claude-SearchBot, PerplexityBot |
| 4 | WAF Managed Rule — Manage AI Bots | Security → WAF → Managed Rules | Disabled, or skipped for search bots |
| 5 | WAF Custom Rules matching 'bot' | Security → WAF → Custom Rules | Skip rule at top for search bot user agents |
| 6 | Security Level | Security → Settings | Medium or below |
| 7 | Rate Limiting | Security → WAF → Rate Limiting Rules | AI search bots excluded from rate limits |

How to verify it is actually working

Configuration changes are not enough. You need to confirm the bots are actually getting through.

  1. Cloudflare AI Crawl Control → Crawlers tab: Check the Allowed and Unsuccessful counts for each search bot. Unsuccessful should be 0 for OAI-SearchBot, Claude-SearchBot, and PerplexityBot.
  2. Cloudflare Security → Analytics → Security Events: Filter by user agent for specific bot names. Any challenges or blocks logged here indicate the bot is still being intercepted.
  3. Bing Webmaster Tools → URL Inspection → Live URL: If Bingbot can fetch your page cleanly, most other bots using standard headers will be able to as well.
  4. Cloudflare AI Crawl Control → Robots.txt → Violations: Shows which bots are hitting disallowed paths. Unexpected violations here reveal misconfigured rules.
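As a final spot check, you can request a page while presenting a search bot's user agent. The UA string below is abridged from OpenAI's published one, so treat it as an assumption; and because Cloudflare verifies real bots by IP/ASN, a 200 here does not prove the real bot gets through, but a 403, 429, or 503 strongly suggests a user-agent rule is still firing:

```python
import urllib.request, urllib.error

BOT_UA = "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)"

def probe(url: str) -> int:
    # Return the HTTP status the site gives this user agent
    req = urllib.request.Request(url, headers={"User-Agent": BOT_UA})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def interpret(status: int) -> str:
    # Rough mapping from status code to likely Cloudflare behavior
    if status == 200:
        return "reachable"
    if status in (403, 503):
        return "likely challenged or blocked"
    if status == 429:
        return "rate limited"
    return f"other ({status})"

# Example usage: interpret(probe("https://example.com/"))
```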

The invisible problem

What makes these misconfigurations so damaging is that nothing breaks visibly. Your site loads fine. Google keeps indexing you. Your logs show no errors. But AI engines are silently failing to crawl your content — and you are invisible in the answers they generate for your potential customers.

According to Cloudflare's own data, Perplexity generates 194 crawls per human visitor it sends back. OAI-SearchBot generates 1,091 crawls per referral. These are crawlers that directly influence AI-generated brand recommendations. Blocking them does not protect your content — it just removes you from the conversation.

Run the checklist. It takes ten minutes. The alternative is spending months creating content that AI engines are systematically prevented from ever reading.

Found this useful?

Citany monitors how ChatGPT, Perplexity, Claude, and five other AI engines mention your brand. If you want to know whether fixing these settings actually moved the needle — that is exactly what we track. See how it works →

Stay updated

Follow the journey on X

Weekly threads with raw data, hypothesis tests, and the honest story of building AI visibility from zero.