Robots.txt Validator
Validate your robots.txt and find crawl-blocking mistakes.
Get crawled, get indexed, get cited.
Sight AI publishes articles that crawlers can read instantly - clean HTML, schema markup, internal links, and AI-friendly structure. 7 free articles to start.
How it works
1. Enter any domain
We fetch /robots.txt directly from the root and parse it the way Googlebot does.
2. Review parsing errors
Stray colons, missing User-agent blocks, and other syntax mistakes that cost you indexing.
3. Check the AI crawler matrix
See whether GPTBot, ClaudeBot, PerplexityBot, and Google-Extended can read your site.
4. Verify your sitemap is declared
Add a Sitemap: directive at the bottom - it's the cheapest way to help search engines discover your URLs (see the example below these steps).
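For reference, a minimal robots.txt that passes all four checks might look like this (the /admin/ path and sitemap URL are placeholders - a single wildcard group like this leaves every crawler, including the AI bots in step 3, free to read everything except /admin/):

    User-agent: *
    Disallow: /admin/

    Sitemap: https://www.example.com/sitemap.xml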
A small detail that compounds.
robots.txt is the first file every crawler asks for. A single malformed line can block your entire site from search engines or AI assistants - and you won't see the impact until your traffic disappears.
In 2026, robots.txt also gates whether you appear in ChatGPT, Claude, and Perplexity answers. Block the wrong bot and you get zero AI citations; allow them all and you get to compete.
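Two illustrative mistakes show how little it takes: a rule placed before any User-agent line is ignored by Google's parser, and a single stray slash under the wildcard group shuts out every crawler.

    Disallow: /admin/    # no User-agent line above it, so Google ignores this rule

    User-agent: *
    Disallow: /          # this one character blocks the entire site for every crawler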
Crawlable + citable + ranked.
A clean robots.txt is necessary but not sufficient. Once crawlers can reach you, you still need long-form, structured content that's actually worth ranking and citing.
Sight AI publishes articles purpose-built for both humans and AI: clean HTML, schema.org markup, server-rendered content, internal links, and the kind of clear claims that LLMs love to cite.
- Articles render fully without JavaScript (great for AI crawlers)
- Schema markup auto-generated for every article
- Internal linking optimized for crawl depth
- Built-in AI visibility tracking shows you which articles get cited
Common questions.
Should I block AI crawlers?
It's a real strategic question. Allowing them puts you in AI assistant answers (great for visibility) but also lets them train on your content. Most growth teams allow them all; some publishers block training-oriented crawlers like CCBot and Google-Extended while still allowing the bots that fetch pages for live answers.
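As an illustration of that middle ground (adjust the agent list to your own policy):

    # Opt out of training-oriented crawlers
    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Everyone else, including bots that fetch pages for live answers, stays allowed
    User-agent: *
    Allow: /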
Where should robots.txt live?
Always at /robots.txt on the root of every host you serve. example.com/robots.txt and www.example.com/robots.txt are separate files; each host needs to answer for its own, either by serving the file directly or by redirecting to the canonical host's copy.
Does robots.txt block indexing?
No - it blocks crawling. Pages can still get indexed via inbound links if you only Disallow them. To truly noindex, use a meta robots noindex tag and let Google crawl the page.
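For example, either of the following keeps a crawlable page out of Google's index (the meta tag goes in the page's <head>; the X-Robots-Tag response header works for PDFs and other non-HTML files):

    <meta name="robots" content="noindex">

    X-Robots-Tag: noindex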
How big can robots.txt be?
Google reads up to 500 KiB; anything past that limit is ignored. Most healthy robots.txt files are under 5 KB.
Get 7 free articles with Sight AI
Sight AI writes long-form, SEO-optimized articles for you and tracks how AI assistants like ChatGPT and Claude see your brand. Create a free account to claim your 7 starter articles.
7 articles, AI visibility tracking, and our full publishing suite included.
More free SEO tools
Keep optimizing - every tool is free and runs in your browser.
Sitemap Validator
Check your XML sitemap for errors, broken URLs, and bloat.
Canonical Checker
Find and fix duplicate-content canonical issues.
Broken Link Checker
Find broken links on any page in seconds.
Sitemap Generator
Generate a clean XML sitemap from any URL.
URL Slug Generator
Create clean, SEO-friendly URL slugs instantly.
JSON ↔ CSV Converter
Convert JSON to CSV (and back) instantly in your browser.