
Robots.txt Generator

How to Block AI Crawlers in robots.txt (Complete 2026 Guide)

A robots.txt generator creates a file that blocks AI crawlers like GPTBot & ClaudeBot from scraping your site — free, no coding needed.

Ajitesh Agarwal


AI companies are quietly scraping your website right now — training their models on your hard-earned content, often without permission. Here's how to stop them using your robots.txt file, and the free robots.txt generator tool that makes it dead simple.


Table of Contents

  1. What are AI crawlers and why should you block them?

  2. What is a robots.txt file?

  3. Complete list of AI crawlers to block in 2026

  4. How to block AI crawlers step-by-step

  5. Use a free robots.txt generator (no coding needed)

  6. Selective blocking: block training, allow AI search


1. What Are AI Crawlers and Why Should You Block Them?

Every time you publish content online, dozens of automated bots visit your pages. Some are search engine crawlers like Googlebot, which help your site rank in search results. But a new breed of bots has emerged: AI training crawlers sent by companies like OpenAI, Anthropic, Meta, and ByteDance.


These AI crawlers don't send you traffic. They scrape your text, images, and code to train large language models (LLMs) — commercial AI products worth billions of dollars. Your content becomes their training data. You receive nothing in return.


⚠️ Did you know?

Major publishers like The New York Times, Reuters, and the Wall Street Journal have already blocked AI crawlers. According to Cloudflare, AI bots accessed around 39% of the top one million internet properties on its network in June 2024, but only 2.98% of those properties had taken steps to block them.

The good news: you can stop them in minutes using your robots.txt file—or even faster with a free robots.txt generator.


2. What Is a robots.txt File?

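A robots.txt file is a plain text file that sits at the root of your website (e.g., yoursite.com/robots.txt). It tells web crawlers which pages or sections of your site they are and aren't allowed to access. Compliance is voluntary: reputable crawlers honor these rules, but the file is a request, not an access control.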


It works using two simple instructions:


User-agent — specifies which bot you're targeting

Disallow — tells the bot what it cannot access

A basic robots.txt file that blocks all crawlers from your entire site looks like this:


robots.txt


# Block all bots from everything
User-agent: *
Disallow: /

💡 Important note

Blocking all bots with User-agent: * would also block Google — which would destroy your SEO. You need targeted rules that block AI training crawlers specifically, while keeping Googlebot and Bingbot free to crawl your site.
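
To see that warning in practice, the rules can be checked with Python's standard-library urllib.robotparser module. This is a minimal sketch: the helper name is_allowed and the yoursite.com URL are illustrative assumptions, and the module only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The block-all file from this section.
BLOCK_ALL = """\
# Block all bots from everything
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "Bingbot", "GPTBot"):
        allowed = is_allowed(BLOCK_ALL, bot, "https://yoursite.com/")
        print(bot, "allowed" if allowed else "blocked")
```

Every bot in the loop is reported as blocked, search engines included, which is why the rest of this guide targets AI crawlers by name instead of using the wildcard.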

3. Complete List of AI Crawlers to Block in 2026

Here is the up-to-date list of every major AI crawler, who operates it, and what it does with your content:


| User-Agent | Company | Purpose | Recommendation |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Trains GPT models | Block |
| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT | Optional |
| OAI-SearchBot | OpenAI | Powers ChatGPT Search | Optional |
| ClaudeBot | Anthropic | Trains Claude AI | Block |
| anthropic-ai | Anthropic | Anthropic data collection | Block |
| Claude-Web | Anthropic | General web crawling | Block |
| Google-Extended | Google | Trains Gemini AI (not search) | Block |
| PerplexityBot | Perplexity | Indexes pages for Perplexity's AI search results | Optional |
| Bytespider | ByteDance (TikTok) | Trains Doubao LLM | Block |
| CCBot | Common Crawl | Builds the Common Crawl dataset, which was used to train GPT-3 | Block |
| FacebookBot | Meta | Meta AI training | Block |
| meta-externalagent | Meta | Meta AI models | Block |
| cohere-ai | Cohere | Cohere model training | Block |
| Applebot-Extended | Apple | Trains Apple Intelligence | Block |



Note: "Optional" means blocking these bots may prevent your content from appearing in AI-powered search results like ChatGPT Search or Perplexity. Block them only if you don't want that exposure.
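
If you manage rules for several sites, it can help to keep this list as data and derive the block list from it. The sketch below copies the user agents and recommendations from the table; the Crawler type and the agents_to_block function are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class Crawler(NamedTuple):
    user_agent: str
    company: str
    recommendation: str  # "Block" or "Optional", as in the table


CRAWLERS = [
    Crawler("GPTBot", "OpenAI", "Block"),
    Crawler("ChatGPT-User", "OpenAI", "Optional"),
    Crawler("OAI-SearchBot", "OpenAI", "Optional"),
    Crawler("ClaudeBot", "Anthropic", "Block"),
    Crawler("anthropic-ai", "Anthropic", "Block"),
    Crawler("Claude-Web", "Anthropic", "Block"),
    Crawler("Google-Extended", "Google", "Block"),
    Crawler("PerplexityBot", "Perplexity", "Optional"),
    Crawler("Bytespider", "ByteDance (TikTok)", "Block"),
    Crawler("CCBot", "Common Crawl", "Block"),
    Crawler("FacebookBot", "Meta", "Block"),
    Crawler("meta-externalagent", "Meta", "Block"),
    Crawler("cohere-ai", "Cohere", "Block"),
    Crawler("Applebot-Extended", "Apple", "Block"),
]


def agents_to_block(include_optional: bool = False) -> List[str]:
    """Return user agents to disallow, optionally adding the AI search bots."""
    wanted = {"Block", "Optional"} if include_optional else {"Block"}
    return [c.user_agent for c in CRAWLERS if c.recommendation in wanted]


if __name__ == "__main__":
    print(agents_to_block())
```

Calling agents_to_block() leaves the three optional AI search bots out, while agents_to_block(include_optional=True) returns every user agent in the table.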

4. How to Block AI Crawlers Step-by-Step

1. Find your robots.txt file

Go to yourwebsite.com/robots.txt in your browser. If you see a file, it already exists. If you get a 404 error, you need to create one in your website's root directory.


2. Generate your robots.txt using a free tool

The fastest way is to use a free robots.txt generator like Marcitors. No coding required — just select which bots to block, and it automatically creates the correct syntax for you. Skip to Section 5 to see how.


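3. Add AI crawler block rules

Copy the code below into your robots.txt file. It blocks every AI crawler listed in Section 3, including the optional AI search bots (ChatGPT-User, OAI-SearchBot, PerplexityBot), while keeping Googlebot and Bingbot fully active. Remove the optional groups if you want to stay visible in AI search; Section 6 covers that setup.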


4. Upload and test

Save the file and upload it to your website's root directory. Then verify it by visiting yoursite.com/robots.txt and checking the robots.txt report in Google Search Console (the standalone robots.txt Tester has been retired).
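
Steps 1 and 4 both come down to requesting /robots.txt at the root of your domain, which can also be scripted. The sketch below uses only Python's standard library; the function names and the yourwebsite.com address are placeholders, and fetch_robots_txt makes a real network request when called.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root, whatever page we started from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the live robots.txt text, or None if the server answers 404."""
    try:
        with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        if error.code == 404:
            return None  # no file yet: create one in the root directory
        raise


if __name__ == "__main__":
    print(robots_txt_url("https://yourwebsite.com/blog/some-post"))
```

A None result corresponds to the 404 case in step 1; any returned text is what crawlers currently see, so it is worth comparing against the file you uploaded.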


Here is the complete robots.txt code to block all major AI crawlers in 2026:


# Allow Google and Bing (keep your SEO intact)
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Block OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

# Block Anthropic (Claude)
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

# Block Google AI Training (keeps Google Search intact)
User-agent: Google-Extended
Disallow: /

# Block Meta / Facebook
User-agent: FacebookBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# Block ByteDance (TikTok)
User-agent: Bytespider
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block Apple AI
User-agent: Applebot-Extended
Disallow: /

# Block Cohere
User-agent: cohere-ai
Disallow: /

# Block Perplexity
User-agent: PerplexityBot
Disallow: /

# Sitemap location
Sitemap: https://yourwebsite.com/sitemap.xml
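
A single misspelled user agent silently leaves that crawler unblocked, so a quick check of the names in your file can be worthwhile. The sketch below is one possible lint, assuming the user agents used in this guide are the ones you expect; KNOWN_AGENTS and both function names are illustrative, not part of any standard tool.

```python
from typing import Iterable, List

# User agents that appear in this guide, plus the wildcard.
KNOWN_AGENTS = {
    "*", "Googlebot", "Bingbot", "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "anthropic-ai", "Claude-Web", "Google-Extended",
    "FacebookBot", "meta-externalagent", "Bytespider", "CCBot",
    "Applebot-Extended", "cohere-ai", "PerplexityBot",
}


def user_agents_in(robots_txt: str) -> List[str]:
    """Collect the value of every User-agent line, ignoring comments."""
    agents = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.append(value.strip())
    return agents


def unrecognized_agents(robots_txt: str,
                        known: Iterable[str] = KNOWN_AGENTS) -> List[str]:
    """Return user agents in the file that are not in the known list."""
    known_lower = {name.lower() for name in known}
    return [a for a in user_agents_in(robots_txt)
            if a.lower() not in known_lower]


if __name__ == "__main__":
    sample = "User-agent: GTPBot\nDisallow: /\n"
    print(unrecognized_agents(sample))
```

An unrecognized name is not necessarily an error, since you may block crawlers this guide does not list, but it flags typos such as GTPBot for a second look.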

✅ Good news for your SEO

Blocking Google-Extended does NOT affect your Google Search rankings or your ability to appear in Google's AI Overviews (formerly SGE), which rely on Googlebot. It only controls whether your content is used for Gemini AI. Google's crawler documentation states that Google-Extended does not affect a site's inclusion in Google Search and is not used as a ranking signal.

5. Use a Free robots.txt Generator (No Coding Needed)

If manually editing code feels intimidating, don't worry — you don't have to write a single line. A free robots.txt generator does all the heavy lifting for you.


Marcitors offers one of the best free robots.txt generator tools available in 2026. Here's how to use it:


1. Go to Marcitors Free Tools

Visit marcitors.com/free-tools and select the Robots.txt Generator. No account or sign-up required.


2. Enter your website URL and sitemap

The tool will automatically include your sitemap URL in the output file — a small but important SEO detail that many manual writers forget.


3. Select which crawlers to block

Toggle on the AI crawlers you want to block. The robots.txt generator writes the correct syntax automatically — no typos, no formatting errors.


4. Download and upload to your site

Download the generated robots.txt file and upload it to the root directory of your website. Compliant crawlers will apply the new rules the next time they fetch the file.
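
For readers curious what a generator does behind the scenes, the idea fits in a few lines of Python. This is an illustrative sketch only, not the Marcitors implementation; build_robots_txt and its parameters are hypothetical names.

```python
from typing import Iterable, Optional


def build_robots_txt(blocked_agents: Iterable[str],
                     sitemap_url: Optional[str] = None) -> str:
    """Render one 'Disallow: /' group per agent, plus an optional Sitemap line."""
    lines = []
    for agent in blocked_agents:
        # Each group is a User-agent line followed directly by its rule,
        # then a blank line to separate it from the next group.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["GPTBot", "ClaudeBot"],
                           "https://yourwebsite.com/sitemap.xml"))
```

Crawlers that are not listed get no group at all, which leaves them free to crawl, so Googlebot and Bingbot stay unaffected without needing explicit Allow rules.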

Generate Your robots.txt File for Free


No coding. No sign-up. Protect your content from AI scrapers in under 2 minutes.

Try Marcitors Free robots.txt Generator →

100% free · No account required · AI-ready output


6. Selective Blocking — Block Training, Allow AI Search

You don't have to go all-or-nothing. If you want your content to appear in AI search results like ChatGPT Search or Perplexity, but you don't want your content used to train AI models, you can use selective blocking.


# Allow AI Search bots (your content appears in ChatGPT, Perplexity)
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block AI Training bots (content NOT used to train models)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /


This is the strategy many savvy content creators and publishers use in 2026 — stay visible in AI-powered search while protecting your intellectual property from being used as training data.
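
Before uploading a selective file, you can confirm that it separates the two groups as intended. The sketch below runs the rules from this section through Python's standard-library urllib.robotparser; the audit function and the yourwebsite.com URL are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

SEARCH_BOTS = ("ChatGPT-User", "OAI-SearchBot", "PerplexityBot")
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider")

# The selective rules from this section.
SELECTIVE = """\
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def audit(robots_txt: str, url: str = "https://yourwebsite.com/") -> Dict[str, bool]:
    """Map each bot to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url)
            for bot in SEARCH_BOTS + TRAINING_BOTS}


if __name__ == "__main__":
    for bot, allowed in audit(SELECTIVE).items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

With these rules the three search bots come back as allowed and the five training bots as blocked.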



Blocking AI crawlers from your website is one of the smartest technical SEO moves you can make in 2026. Your content is your most valuable asset — don't let AI companies profit from it for free.


The process is straightforward: add the right rules to your robots.txt file, or better yet, use a free robots.txt generator to create the perfect file in under two minutes with zero coding required.


Start with Marcitors' free robots.txt generator—it's one of the most beginner-friendly tools available, and it's completely free.

Ajitesh Agarwal

Ajitesh Agarwal is a business intelligence and analytics specialist with a focus on data strategy, reporting automation, and insight delivery. He supports organizations in adopting modern BI platforms and scalable analytics frameworks. His work emphasizes clarity, accuracy, and actionable intelligence.
