# AI crawlability checker - Impetora free tool

> A free browser-based tool that generates a tailored 8-point checklist of what AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) need to discover and ingest a given domain. Browser CORS rules prevent us from scanning third-party origins directly, so the tool produces the exact URLs to verify and the third-party validators that confirm each item.

URL: https://impetora.com/tools/ai-crawlability-checker
Type: SoftwareApplication
Pricing: Free, no signup, no email gate

## The 8 checks

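1. Does robots.txt allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended? - a Disallow rule tells that crawler to skip your content, which makes it far less likely to surface in the corresponding AI assistant's answers. Note that Google-Extended is a robots.txt control token, not a separate crawler.
2. Do you ship an llms.txt file at the site root? - a proposed convention: a Markdown-formatted index of your key pages, roughly the AI-assistant counterpart of an XML sitemap.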
3. Do your key pages have Markdown twins? - parallel /md/<slug> routes serving text/markdown so assistants can fetch clean text.
4. Do your pages have JSON-LD structured data? - types such as WebPage, Organization, Article, Product, FAQPage, and BreadcrumbList give AI assistants explicitly labeled facts they do not have to infer from page text.
5. Is sitemap.xml present, valid, and current? - still the canonical machine-readable list of pages on your site.
6. Is your main content visible in the initial HTML, not only after JavaScript? - many AI crawlers do not execute JavaScript at all, or execute it only within a very small budget.
7. Are HTTPS, canonical URLs, and hreflang correct? - to avoid citation fragmentation across duplicates.
8. Are you pinging IndexNow on publish and update? - it notifies participating search engines (such as Bing) as soon as a URL changes, which can shorten pickup by AI assistants that rely on those indexes.
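
Check 1 can be approximated locally once you have copied your robots.txt contents. The following is a minimal sketch, assuming Python's standard `urllib.robotparser` is a reasonable stand-in for how these crawlers interpret rules (real crawlers may match paths and wildcards differently). The helper name `ai_crawler_access` and the sample file are hypothetical, invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in check 1.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, url: str = "/") -> dict[str, bool]:
    """Return, per AI user-agent token, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Hypothetical robots.txt: blocks GPTBot, allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

result = ai_crawler_access(SAMPLE)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the sample above, GPTBot is reported as blocked and the other three tokens as allowed, because they fall through to the `*` group.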

## Real third-party validators

- Google Rich Results Test - https://search.google.com/test/rich-results
- Schema.org validator - https://validator.schema.org/
- Bing Webmaster Tools - https://www.bing.com/webmasters/about
- IndexNow protocol docs - https://www.indexnow.org/
- Google Search Console - https://search.google.com/search-console
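
Before pasting a page into the validators above, it can help to confirm that JSON-LD is present in the raw HTML response at all, which touches both check 4 and check 6. The sketch below uses only the Python standard library and assumes you have saved the page source as a string; `JsonLdExtractor`, `jsonld_types`, and the sample markup are hypothetical names and data. It reads only top-level `@type` values and does not walk `@graph` arrays, so treat it as a pre-check, not a substitute for the validators.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[object] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped here; a validator reports details


def jsonld_types(raw_html: str) -> list[str]:
    """List the top-level @type values found in the raw (pre-JavaScript) HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(raw_html)
    extractor.close()
    types: list[str] = []
    for block in extractor.blocks:
        for item in block if isinstance(block, list) else [block]:
            if isinstance(item, dict) and "@type" in item:
                value = item["@type"]
                types.extend(value if isinstance(value, list) else [value])
    return types


# Hypothetical page source, invented for illustration.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}
</script>
</head><body><div id="root"></div></body></html>"""

types = jsonld_types(SAMPLE_HTML)
print(types)
```

An empty list for a page you expect to carry structured data suggests the markup is either missing or injected by JavaScript after load.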

## What this tool is not

It does not actually fetch your robots.txt, your sitemap, or any of your pages. Scripts running in a browser cannot read responses from another origin unless that site opts in through CORS, and we do not run server-side scans because a real audit requires deeper inspection than a one-shot HTTP GET. It is a guided self-audit and a printable checklist.
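
Since nothing is fetched, the useful output is a set of URLs for you to open and inspect yourself. As a rough illustration of that idea, and not the tool's actual implementation (which runs in the browser), the sketch below builds the root-level URLs for a domain. The function name `urls_to_verify` is hypothetical, and `/sitemap.xml` is only the conventional location; a site may declare its sitemap elsewhere in robots.txt.

```python
from urllib.parse import urlsplit


def urls_to_verify(domain: str) -> dict[str, str]:
    """Build the root-level URLs a self-audit would open by hand for a domain."""
    # Accept either a bare host ("example.com") or a full URL; keep only the host.
    host = urlsplit(domain if "//" in domain else f"//{domain}").netloc
    base = f"https://{host}"
    return {
        "robots.txt": f"{base}/robots.txt",
        "llms.txt": f"{base}/llms.txt",
        "sitemap.xml": f"{base}/sitemap.xml",  # assumed conventional path
    }


checklist = urls_to_verify("example.com")
for name, url in checklist.items():
    print(f"{name}: {url}")
```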

## Related

- llms.txt generator - https://impetora.com/tools/llms-txt-generator
- EU AI Act risk classifier - https://impetora.com/tools/eu-ai-act-classifier

## Need us to actually do the audit?

Submit a project at https://impetora.com/?source=ai-crawlability-checker#discovery-call
