What Is ai.txt? The New Standard for AI Crawler Permissions

As artificial intelligence reshapes the internet, website owners face a new challenge: controlling how AI companies use their content. The traditional robots.txt file was designed for search engine crawlers in the 1990s and lacks the nuance needed for today's AI landscape. Enter ai.txt, an emerging standard that aims to give publishers finer-grained control over how AI systems interact with their content.

The Problem with robots.txt for AI

The robots.txt protocol was created in 1994 to help webmasters control search engine crawlers. It works on a simple principle: allow or disallow access to specific URL paths for specific user agents. But AI usage introduces complexities that robots.txt was never designed to handle.

When you allow GPTBot to crawl your website via robots.txt, you are granting broad access. But what does that access mean? Can OpenAI use your content to train future AI models? Can it reproduce your text verbatim in ChatGPT responses? Can it use your images to train image generation models? robots.txt provides no way to express these distinctions. It is a binary on/off switch for crawl access, nothing more.

This gap has created real problems. Publishers who want their content discoverable by AI assistants (to drive referral traffic) do not necessarily want their articles used as training data for future AI models. Content creators want AI systems to cite and reference their work, not absorb it into a model where attribution becomes impossible.

What Is ai.txt?

The ai.txt file is a proposed standard that sits alongside robots.txt in your website's root directory. While robots.txt controls crawl access (can a bot visit your pages?), ai.txt controls content usage (what can an AI company do with your content after visiting?).

The file is placed at the root of your domain — for example, https://yoursite.com/ai.txt — and uses a human-readable format to declare your preferences for AI usage. Think of it as a machine-readable content license specifically for artificial intelligence.

A basic ai.txt file might look like this:

# ai.txt — AI Usage Preferences
# Learn more at https://site.spawning.ai/ai-txt

User-Agent: *
Disallow: Training
Allow: Search
Allow: Summarization

This example tells AI companies: you may use my content for search and summarization (appearing in AI-powered search results), but you may not use it to train AI models.
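To make the semantics of the example concrete, here is a minimal Python sketch of how a consumer might read such a file. It assumes the illustrative User-Agent / Allow / Disallow syntax shown above, which is not an official grammar. The function names parse_ai_txt and is_allowed are hypothetical, and the rule that a bot-specific group takes precedence over the wildcard group is an assumption borrowed from robots.txt conventions.

```python
from collections import defaultdict


def parse_ai_txt(text):
    """Parse ai.txt-style text into {user_agent: {usage: allowed}}.

    Assumes the illustrative syntax from this article; not an official grammar.
    """
    rules = defaultdict(dict)
    current_agents = []
    in_rules = False  # True once the current group has seen Allow/Disallow
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-Agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in current_agents:
                rules[agent][value.lower()] = (field == "allow")
    return dict(rules)


def is_allowed(rules, user_agent, usage):
    """Return True or False if a rule applies, or None if nothing is declared.

    Assumption: a bot-specific group replaces the "*" group entirely.
    """
    group = rules.get(user_agent.lower(), rules.get("*", {}))
    return group.get(usage.lower())


EXAMPLE = """\
# ai.txt: AI Usage Preferences
User-Agent: *
Disallow: Training
Allow: Search
Allow: Summarization
"""

rules = parse_ai_txt(EXAMPLE)
print(is_allowed(rules, "GPTBot", "Training"))    # False
print(is_allowed(rules, "GPTBot", "Search"))      # True
print(is_allowed(rules, "GPTBot", "Generation"))  # None (not declared)
```

Returning None for an undeclared usage keeps "not mentioned" distinct from "disallowed", so the caller can decide how conservative to be.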

Key Concepts in ai.txt

Training refers to the use of your content as part of a dataset for training or fine-tuning AI models. When you disallow training, you are saying that your content should not become part of a model's learned knowledge. This is the most contentious use case — many publishers object to their content being absorbed into models without compensation or control.

Search and Retrieval refers to AI systems fetching and citing your content in real-time to answer user queries. This is similar to how Google shows snippets from websites in search results, but for AI-powered platforms. Most publishers welcome this because it drives traffic back to their sites.

Summarization allows AI systems to create summaries of your content. This is a middle ground — your content is being processed and condensed, but the AI is referencing your specific page rather than drawing from trained knowledge.

Generation refers to AI systems using your content style, structure, or specific elements to generate new content. Disallowing generation prevents your creative work from being directly replicated or closely imitated.

How ai.txt Works with robots.txt

The two files serve complementary purposes and should be used together. robots.txt remains the gatekeeper for crawl access — if you block a bot in robots.txt, it should never reach your content in the first place, making ai.txt irrelevant for that bot.

For bots you do allow to crawl, ai.txt provides the next layer of control.

A practical configuration works like this: in your robots.txt, allow GPTBot, ClaudeBot, and other AI crawlers to access your pages. Then, in your ai.txt, specify that using your content for search and summarization is permitted, but training is not. This gives you the best of both worlds: your content can appear in AI-powered answers (driving traffic), while you signal that it should not be absorbed into training datasets.
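The two-layer check described above can be sketched in Python. The robots.txt side uses the standard library's urllib.robotparser. The ai.txt side is represented by a hypothetical, already-parsed dictionary (AI_TXT_PREFS), and the helper name may_use, the BadBot user agent, and the choice to treat undeclared usages as not permitted are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: two AI crawlers allowed, one hypothetical bot blocked.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: BadBot
Disallow: /
"""

# Hypothetical, already-parsed ai.txt preferences (usage -> allowed).
AI_TXT_PREFS = {"search": True, "summarization": True, "training": False}


def may_use(user_agent, url, usage):
    """Layer 1: robots.txt crawl access. Layer 2: ai.txt usage preference."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT.splitlines())
    if not robots.can_fetch(user_agent, url):
        return False  # blocked at the crawl layer, so ai.txt never applies
    # Assumption: a usage that is not declared is treated as not permitted.
    return AI_TXT_PREFS.get(usage.lower(), False)


page = "https://yoursite.com/articles/example"
print(may_use("GPTBot", page, "Search"))    # True
print(may_use("GPTBot", page, "Training"))  # False
print(may_use("BadBot", page, "Search"))    # False
```

Both files are voluntary signals, so a check like this describes what a well-behaved crawler would do rather than something the site can enforce.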

Current Adoption and Limitations

As of early 2026, ai.txt is still an emerging standard. It was proposed by Spawning AI and has gained traction among publishers and AI companies, but it is not yet universally adopted. Major AI companies have varying levels of commitment to respecting ai.txt directives.

OpenAI documents the user agents of its crawlers so that sites can allow or block them in robots.txt, and Google provides the Google-Extended token in robots.txt for controlling whether content is used by its AI models. The landscape is fragmented, with different companies offering different opt-out mechanisms. ai.txt aims to unify these into a single, standardized file that all AI companies can read.

The legal enforceability of ai.txt is also evolving. Unlike robots.txt, which has decades of established practice and has been referenced in legal proceedings, ai.txt is newer and its legal standing is less clear. However, a clear, machine-readable declaration of your content usage preferences documents your intent and may strengthen your position in a dispute about unauthorized AI usage.

How to Create Your ai.txt File

Creating an ai.txt file is straightforward. Create a plain text file named ai.txt and place it in your website's root directory so it is accessible at yoursite.com/ai.txt.

Start by deciding your content usage preferences. Ask yourself these questions: Do you want AI models to be trained on your content? Do you want your content to appear in AI-powered search results? Are you comfortable with AI systems summarizing your pages? Do you want to allow or restrict specific AI companies?
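Once you have answers to these questions, turning them into a file is mechanical. The following Python sketch shows one way to do it, assuming the illustrative Allow/Disallow syntax used in this article. The function name build_ai_txt and the shape of its arguments are hypothetical, not part of any published tool.

```python
def _group(agent, prefs):
    """Render one User-Agent group from a {usage: allowed} mapping."""
    lines = [f"User-Agent: {agent}"]
    for usage, allowed in prefs.items():
        lines.append(f"{'Allow' if allowed else 'Disallow'}: {usage}")
    return "\n".join(lines)


def build_ai_txt(default_prefs, per_agent=None):
    """Build ai.txt text from default preferences plus optional per-bot overrides."""
    groups = [_group("*", default_prefs)]
    for agent, prefs in (per_agent or {}).items():
        groups.append(_group(agent, prefs))
    return "\n\n".join(groups) + "\n"


# Answers to the questions above, expressed as usage -> allowed.
answers = {"Training": False, "Search": True, "Summarization": True}
overrides = {"GPTBot": {"Search": True, "Training": False}}

print(build_ai_txt(answers, overrides))
```

Writing the returned string to a file named ai.txt in your web root publishes the preferences.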

For most website owners who want maximum visibility while protecting their content, this configuration works well:

User-Agent: *
Disallow: Training
Allow: Search
Allow: Summarization

User-Agent: GPTBot
Allow: Search
Disallow: Training

User-Agent: ClaudeBot
Allow: Search
Disallow: Training

After creating the file, verify it is accessible by visiting your domain with /ai.txt appended. CheckMy.site automatically checks for the presence and validity of ai.txt as part of its AI visibility scan, letting you know if AI crawlers can find and read your preferences.
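If you prefer to script the manual check, a minimal Python sketch using only the standard library might look like the following. The helper names (ai_txt_url, fetch_text, check_ai_txt) are hypothetical, the site argument is assumed to include the scheme (for example https://), and the "has_rules" test only looks for Allow or Disallow lines in the illustrative syntax from this article. It is not a full validator and is unrelated to how any particular scanning service works.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


def ai_txt_url(site):
    """Return the root-level ai.txt URL for a site such as 'https://yoursite.com'."""
    return urljoin(site.rstrip("/") + "/", "/ai.txt")


def fetch_text(url, timeout=10):
    """Fetch a URL and return (status, body); network failures give (None, '')."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # e.g. 404 when the file is missing
        return err.code, ""
    except OSError:  # DNS failure, timeout, refused connection
        return None, ""


def check_ai_txt(site, fetch=fetch_text):
    """Report whether ai.txt is reachable and contains at least one rule line."""
    url = ai_txt_url(site)
    status, body = fetch(url)
    has_rules = any(
        line.split(":", 1)[0].strip().lower() in ("allow", "disallow")
        for line in body.splitlines()
        if ":" in line and not line.lstrip().startswith("#")
    )
    return {"url": url, "reachable": status == 200, "has_rules": has_rules}


# Offline demonstration with a stubbed fetcher; omit `fetch` to hit the network.
def stub_fetch(url):
    return 200, "User-Agent: *\nDisallow: Training\n"


print(check_ai_txt("https://yoursite.com", fetch=stub_fetch))
```

The fetcher is passed in as a parameter so the check can be exercised without network access; calling check_ai_txt with only the site argument performs a real HTTP request.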

The Future of AI Content Permissions

The relationship between website publishers and AI companies is still being defined. New standards, regulations, and industry agreements are emerging rapidly. The EU AI Act, various copyright lawsuits, and industry-led initiatives are all shaping how AI companies must handle web content.

What is clear is that website owners need tools to express their preferences, and AI companies need machine-readable signals to respect them. Whether ai.txt becomes the definitive standard or evolves into something else, having a clear declaration of your AI content preferences is increasingly important.

The websites that proactively manage their AI presence today — by configuring robots.txt for crawl access and ai.txt for usage permissions — will be best positioned as the AI ecosystem matures. Start by scanning your site with CheckMy.site to understand your current AI visibility status, then implement the appropriate controls for your content strategy.
