Modern robots.txt generator built for the AI era. Toggle which LLM bots (ChatGPT, Claude, Perplexity, Gemini, Apple Intelligence…) can scrape your site — without affecting standard search engines.
Take control of your site's AI visibility. Legacy robots.txt generators focus on search engine crawlers; the AI Crawler Firewall is built for the crawlers behind Large Language Models (LLMs) and AI search engines. Select which specific LLMs and AI search engines may access your content: toggle permissions for GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and more, then copy the generated ruleset to signal that your content is off-limits for AI training or, conversely, to keep it visible to AI answer engines (answer engine optimization, AEO).
AI & LLM Crawlers
Each row maps to one or more User-agent: lines in the output.
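The row-to-User-agent mapping can be sketched as a small generator. This is a minimal illustration, not the tool's actual code: the row labels ("OpenAI", "Anthropic", and so on), the ROW_AGENTS table, and build_robots_txt are hypothetical names, and the token lists per vendor are assumptions to verify against each vendor's crawler documentation.

```python
# Sketch of a toggle-driven robots.txt generator. Row labels and the
# row-to-token mapping are illustrative assumptions (hypothetical), not the
# tool's real configuration; verify tokens against vendor documentation.
ROW_AGENTS: dict[str, list[str]] = {
    "OpenAI": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Anthropic": ["ClaudeBot", "Claude-User", "Claude-SearchBot"],
    "Perplexity": ["PerplexityBot", "Perplexity-User"],
    "Google-Extended": ["Google-Extended"],
}


def build_robots_txt(toggles: dict[str, bool]) -> str:
    """Return robots.txt text with one group per row.

    A row toggled off (False) gets "Disallow: /"; rows that are on, or
    absent from `toggles`, get "Allow: /". A catch-all group allows the rest.
    """
    groups = []
    for row, agents in ROW_AGENTS.items():
        # Consecutive User-agent lines share the rule that follows them.
        lines = [f"User-agent: {agent}" for agent in agents]
        lines.append("Allow: /" if toggles.get(row, True) else "Disallow: /")
        groups.append("\n".join(lines))
    groups.append("User-agent: *\nAllow: /")
    # Groups are separated by a blank line, the usual robots.txt layout.
    return "\n\n".join(groups) + "\n"


print(build_robots_txt({"OpenAI": False, "Google-Extended": False}))
```

Emitting an explicit Allow: / for enabled rows is a readability choice: each toggle stays visible in the output, even though the catch-all group would allow those crawlers anyway.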
Trains ChatGPT, indexes pages for ChatGPT search (formerly SearchGPT), and fetches pages on demand when users share links in their prompts.
Trains Claude and fetches pages for Claude.ai citations.
Crawls for Perplexity's answer engine and follows links during user queries.
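Opts your site out of Gemini model training and grounding without affecting normal Google Search. It does not opt you out of AI Overviews, which are part of Google Search and governed by Googlebot.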
Internal Google crawls for non-search experiments and product R&D.
Opts out of Apple Intelligence training. Separate from the regular Applebot, which powers Siri and Spotlight.
Trains Cohere's enterprise LLMs.
ByteDance crawler for TikTok / Doubao model training. Frequently flagged for ignoring robots.txt, so if you really need it gone, block it at your firewall or CDN as well.
Crawls for Llama training. Separate from facebookexternalhit (link previews), which most sites want to keep allowed.
You.com answer engine crawler.
Extracts web pages into a knowledge graph; the data is resold to multiple LLM training pipelines.
Powers Alexa answers and Amazon's Q model.
Bulk web archive resold to training-data brokers.
Builds a public web archive used as a base training corpus for GPT-3, Llama, Falcon, and many open models.
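Because some crawlers, such as the TikTok / Doubao training crawler listed above, are flagged for ignoring robots.txt, enforcement has to happen on the server. Below is a minimal sketch of User-Agent filtering as Python WSGI middleware. The blocklist contents and the names block_crawlers and hello_app are hypothetical; a CDN or firewall rule is the more usual place for this.

```python
from typing import Callable, Iterable

# Hypothetical blocklist: lowercase substrings matched against User-Agent.
BLOCKED_AGENT_SUBSTRINGS = ("bytespider",)

StartResponse = Callable[..., object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def block_crawlers(app: WSGIApp, blocked: tuple[str, ...] = BLOCKED_AGENT_SUBSTRINGS) -> WSGIApp:
    """Wrap a WSGI app so requests from blocked User-Agents get 403."""

    def middleware(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        # Case-insensitive substring match; a missing header counts as "".
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Stand-in application used only for this demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_crawlers(hello_app)
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves the wrapped app. User-Agent matching only stops crawlers that identify themselves honestly; a crawler that spoofs its User-Agent needs IP- or network-level rules instead.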
Legacy Search Crawlers
Allowed by default. Block carefully — these still power the non-AI search results most users rely on.
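Standard Google Search index. Blocking Googlebot stops Google from crawling your pages, which in practice drops them from google.com results (a blocked URL can still be listed, but without a description). Usually not what you want.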
Bing index, which also powers ChatGPT's web tool, so blocking Bingbot can also reduce your visibility in ChatGPT's web answers.
DuckDuckGo draws many of its results from Bing; DuckDuckBot itself runs a smaller verification crawl.
Russian-market search index.
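Opting out of AI training should leave these search crawlers untouched. One way to sanity-check a generated file is to run it through Python's standard urllib.robotparser module. The ruleset below is an illustrative example, not this tool's exact output, and can_fetch here is a hypothetical helper wrapping the stdlib parser. Google-Extended is a robots.txt control token rather than a separate fetcher, so asking about it checks what the rules say for that token.

```python
import urllib.robotparser

# Illustrative ruleset: two AI tokens opted out; every other crawler,
# including the legacy search crawlers, falls through to the catch-all.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Ask the stdlib parser whether `agent` may fetch `url` under `robots_txt`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Bingbot", "GPTBot", "Google-Extended"):
    print(f"{agent}: {can_fetch(agent, 'https://example.com/article')}")
```

This prints True for Googlebot and Bingbot and False for GPTBot and Google-Extended. The stdlib parser matches User-agent names by substring and uses the first matching group, which is simpler than the most-specific-group rule in RFC 9309, so treat it as a quick check rather than a reference implementation.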
Applied to every allowed crawler (including the catch-all User-agent: * group).
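The setting's label is not shown here; assuming it is a Crawl-delay value, applying it to every allowed group, including the catch-all, could look like the sketch below. render_groups is a hypothetical helper, and blocked groups are skipped because a crawler told Disallow: / has nothing to pace. Some crawlers, Googlebot among them, ignore Crawl-delay entirely.

```python
from typing import Optional


def render_groups(agents: dict[str, bool], crawl_delay: Optional[int] = None) -> str:
    """Render one group per named agent plus a catch-all "*" group.

    `agents` maps a User-agent token to True (allowed) or False (blocked).
    The shared Crawl-delay (assumed setting), if given, is added only to
    allowed groups, including the catch-all; blocked groups get none.
    """
    groups = []
    for agent, allowed in [*agents.items(), ("*", True)]:
        lines = [f"User-agent: {agent}", "Allow: /" if allowed else "Disallow: /"]
        if allowed and crawl_delay is not None:
            lines.append(f"Crawl-delay: {crawl_delay}")
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


print(render_groups({"Bingbot": True, "Bytespider": False}, crawl_delay=10))
```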