Robots.txt Validator
Validate your robots.txt and find crawl-blocking mistakes.
Get crawled, get indexed, get cited.
Sight AI publishes articles that crawlers can read instantly - clean HTML, schema markup, internal links, and AI-friendly structure. 7 free articles to start.
How it works
1. Enter any domain
We fetch /robots.txt directly from the root and parse it the way Googlebot does.
2. Review parsing errors
Stray colons, missing User-agent blocks, and other syntax mistakes that cost you indexing.
3. Check the AI crawler matrix
See whether GPTBot, ClaudeBot, PerplexityBot, and Google-Extended can read your site.
4. Verify your sitemap is declared
Add a Sitemap: directive at the bottom - it's the cheapest way to help search engines discover your URLs (see the example below these steps).
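For reference, a minimal robots.txt that passes all four checks might look like this (the /admin/ path and sitemap URL are placeholders - a single wildcard group like this leaves every crawler, including the AI bots in step 3, free to read everything except /admin/):

    User-agent: *
    Disallow: /admin/

    Sitemap: https://www.example.com/sitemap.xml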
A small detail that compounds.
robots.txt is the first file every crawler asks for. A single malformed line can block your entire site from search engines or AI assistants - and you won't see the impact until your traffic disappears.
In 2026, robots.txt also gates whether you appear in ChatGPT, Claude, and Perplexity answers. Block the wrong bot and you get zero AI citations; allow them all and you get to compete.
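Two illustrative mistakes show how little it takes: a rule placed before any User-agent line is ignored by Google's parser, and a single stray slash under the wildcard group shuts out every crawler.

    Disallow: /admin/    # no User-agent line above it, so Google ignores this rule

    User-agent: *
    Disallow: /          # this one character blocks the entire site for every crawler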
Crawlable + citable + ranked.
A clean robots.txt is necessary but not sufficient. Once crawlers can reach you, you still need long-form, structured content that's actually worth ranking and citing.
Sight AI publishes articles purpose-built for both humans and AI: clean HTML, schema.org markup, server-rendered content, internal links, and the kind of clear claims that LLMs love to cite.
- Articles render fully without JavaScript (great for AI crawlers)
- Schema markup auto-generated for every article
- Internal linking optimized for crawl depth
- Built-in AI visibility tracking shows you which articles get cited
Common questions.
Should I block AI crawlers?
It's a real strategic question. Allowing them puts you in AI assistant answers (great for visibility) but also lets them train on your content. Most growth teams allow them all; some publishers block training-oriented crawlers like CCBot and Google-Extended while still allowing the bots that fetch pages for live answers.
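As an illustration of that middle ground (adjust the agent list to your own policy):

    # Opt out of training-oriented crawlers
    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Everyone else, including bots that fetch pages for live answers, stays allowed
    User-agent: *
    Allow: /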
Where should robots.txt live?
Always at /robots.txt on the root of every host you serve. example.com/robots.txt and www.example.com/robots.txt are separate files; each host needs to answer for its own, either by serving the file directly or by redirecting to the canonical host's copy.
Does robots.txt block indexing?
No - it blocks crawling. Pages can still get indexed via inbound links if you only Disallow them. To truly noindex, use a meta robots noindex tag and let Google crawl the page.
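For example, either of the following keeps a crawlable page out of Google's index (the meta tag goes in the page's <head>; the X-Robots-Tag response header works for PDFs and other non-HTML files):

    <meta name="robots" content="noindex">

    X-Robots-Tag: noindex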
How big can robots.txt be?
Google reads up to 500 KiB; anything past that limit is ignored. Most healthy robots.txt files are under 5 KB.
Get 7 free articles with Sight AI
Sight AI writes long-form, SEO-optimized articles for you and tracks how AI assistants like ChatGPT and Claude see your brand. Create a free account to claim your 7 starter articles.
7 articles, AI visibility tracking, and our full publishing suite included.
More free SEO tools
Keep optimizing - every tool is free and runs in your browser.
Sitemap Validator
Check your XML sitemap for errors, broken URLs, and bloat.
Canonical Checker
Find and fix duplicate-content canonical issues.
Broken Link Checker
Find broken links on any page in seconds.
Sitemap Generator
Generate a clean XML sitemap from any URL.
URL Slug Generator
Create clean, SEO-friendly URL slugs instantly.
JSON ↔ CSV Converter
Convert JSON to CSV (and back) instantly in your browser.