How Cited by LLMs scores AI visibility

Version methodology-2026-06-J. A transparent, explainable score — not yet a certification.

The model: Find, Read, Understand

The free check answers one question: can AI read your site? It measures the technical floor beneath understanding and recommendation, across three plain-language dimensions in the order AI retrieval actually happens:

Find — can AI discover and reach your site? (crawler access, robots.txt, sitemap)
Read — can AI actually read the content it fetches? (HTML content visibility, markdown negotiation, link headers)
Understand — can AI tell who you are? (structured data, title and description)

These three dimensions are the free check layer. The deeper question (will AI recommend you?) is what the paid /audit measures.
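
As an illustration of the Understand dimension, the sketch below pulls the title, meta description, and JSON-LD blocks out of raw HTML using only the Python standard library. It is not the scoring engine's code: the names `IdentityParser` and `extract_identity` are hypothetical, and the assumption that "structured data" means JSON-LD in a `script` tag is ours.

```python
import json
from html.parser import HTMLParser


class IdentityParser(HTMLParser):
    """Collects <title>, meta description, and JSON-LD blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.json_ld = []        # successfully parsed JSON-LD payloads
        self._in_title = False
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.title = self.title.strip()
        elif tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # assumption: malformed JSON-LD is treated as absent


def extract_identity(html):
    """Return the identity signals found in the raw (unrendered) HTML."""
    parser = IdentityParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "description": parser.description,
        "json_ld": parser.json_ld,
    }
```

Because the parser works on the HTML as served, anything injected later by client-side JavaScript is not seen, which matches the way the Read and Understand dimensions build on each other.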

The gate: if AI can't reach or read you, nothing else matters

Two conditions are binary-fatal for AI visibility. If AI can't reach or read your site, the score craters regardless of how strong your structured data or sitemap is. A great title tag does not help if the crawler was never let in.

Crawler access (Find): if the index-feeding AI crawlers are blocked in your robots.txt, you are not in the indexes that ChatGPT, Copilot, Gemini, and Perplexity search answers are built from.
Content visibility (Read): most AI crawlers do not execute JavaScript. A page that ships an empty HTML shell and fills itself client-side is invisible to those crawlers, no matter how much content it eventually renders in a browser.

Fix either of these first. No other optimisation compensates for them.
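
To make the content visibility condition concrete, here is a minimal sketch that counts the words a non-rendering crawler would see in the served HTML. The helper names and the 50-word threshold are hypothetical, chosen for illustration only; the actual check may measure visibility differently.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects body text, ignoring tags whose content is never shown as page text."""

    SKIPPED = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags carry no text and have no end tag, so ignore them.
        pass

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html):
    """Count words present in the HTML without running any JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


MIN_WORDS = 50  # hypothetical threshold, not a value from the methodology


def looks_like_empty_shell(html, min_words=MIN_WORDS):
    """True when the served HTML carries too little text to be readable."""
    return visible_word_count(html) < min_words
```

A typical single-page-app shell (a `div id="root"` plus a script bundle) yields a word count of zero here, even though the same page may render thousands of words in a browser.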

/check vs /audit

/check (this tool, free) answers: can AI read you? It measures the on-page, controllable, technical floor: Find, Read, and Understand. A pass here means AI can reach, parse, and identify your site. It is necessary but not sufficient for AI visibility.

/audit (paid) answers: will AI recommend you? It measures the harder, mostly off-page signals that determine whether AI will actually cite and surface your business: brand-mention footprint, authority, content recency, and real citation presence across ChatGPT, Claude, Gemini, and Perplexity. These cannot be measured in a free instant scan; they are earned over time.

The connection is real: you cannot be cited if you cannot be read. The free check is the honest floor; the audit is the full picture.

Headline weights

Each check carries a weight reflecting how much it matters for AI readability. Weights derive from the engine constants and sum to 100. The score is renormalized over only the checks we could actually verify, so we never penalize what we couldn't determine.

Check	Weight
ai_crawler_access	24
sitemap_declared	11
robots_txt	6
content_visibility	22
markdown_negotiation	5
link_headers	3
structured_data	20
identity_metadata	9
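
The renormalization described above can be sketched as follows. The weights are copied from the table; everything else is an assumption made for illustration: the function name `readability_score`, the convention that each check reports a value between 0 and 1, and the use of `None` for a check that could not be verified. The reach-and-read gate is deliberately not modeled, since this page does not specify how it is applied.

```python
WEIGHTS = {
    "ai_crawler_access": 24,
    "sitemap_declared": 11,
    "robots_txt": 6,
    "content_visibility": 22,
    "markdown_negotiation": 5,
    "link_headers": 3,
    "structured_data": 20,
    "identity_metadata": 9,
}


def readability_score(results):
    """Weighted score out of 100, renormalized over verified checks only.

    `results` maps a check name to a value in [0, 1], or to None when the
    check could not be determined (assumed convention, not the engine's API).
    """
    verified = {
        name: value
        for name, value in results.items()
        if name in WEIGHTS and value is not None
    }
    total_weight = sum(WEIGHTS[name] for name in verified)
    if total_weight == 0:
        return None  # nothing could be verified, so there is no score
    earned = sum(WEIGHTS[name] * value for name, value in verified.items())
    return round(100 * earned / total_weight, 1)
```

For example, if every check passes except structured_data (failed) and markdown_negotiation (undetermined), the score is computed over the remaining 95 points of weight: 75 / 95, or 78.9, rather than 75 / 100.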

AI crawler taxonomy

Access scoring rewards the search and retrieval crawlers that feed AI answers and citations. Blocking a training-only crawler is a legitimate owner choice and never moves your score.

Index bots — blocking these in robots.txt removes you from the AI search indexes

OAI-SearchBot
Claude-SearchBot
Googlebot
PerplexityBot
Bingbot

User-fetch agents — these ignore robots.txt by design, so a robots.txt block has no effect (they fetch on demand anyway)

ChatGPT-User
Claude-User
Perplexity-User

Neutral bots — blocking them is a legitimate choice and never affects your score

GPTBot
ClaudeBot
CCBot
anthropic-ai
Google-Extended
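
The taxonomy above can be applied to a robots.txt file with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions, not the engine's implementation: the function name `blocked_index_bots` and the probe URL `https://example.com/` are hypothetical, and only the site root is tested. Note that the standard-library parser matches user agents by substring, which may differ from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Bot lists copied from the taxonomy above.
INDEX_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "Googlebot", "PerplexityBot", "Bingbot"]
NEUTRAL_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "anthropic-ai", "Google-Extended"]


def blocked_index_bots(robots_txt, url="https://example.com/"):
    """Return the index bots that the given robots.txt text blocks from `url`.

    Neutral (training-only) bots are intentionally not checked, because
    blocking them never affects the score.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in INDEX_BOTS if not parser.can_fetch(bot, url)]


example_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_index_bots(example_robots)
```

In this example the GPTBot rule has no effect on the result, while the PerplexityBot rule is the one that would cost access points.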

What we don't score, and why

We are explicit about what is out of scope, because scoring things that don't matter to AI readability would mislead you.

llms.txt: The 2026 evidence is clear. An Ahrefs study found roughly 97% of llms.txt files received zero crawler requests in a month, Google's own guidance says you don't need it, and John Mueller called it "a dead end." Its practical use today is coding-agent tooling, not answer-engine visibility. Scoring it would give a false signal about AI readability.
Content recency and freshness: A strong signal for getting cited (89.7% of AI-cited pages were updated in 2025), but it is a recommendation signal, not a readability one. Most business sites legitimately do not publish content on a schedule. Recency is evaluated in the /audit.
Brand mentions, authority, and backlinks: Off-page, earned over time, and unmeasurable in a free instant scan. These drive whether AI will recommend you, not whether AI can read you. They belong in the /audit.
Web Bot Auth: A bot-authentication initiative (backed by Cloudflare, drawing broad industry interest) that uses HTTP Message Signatures (RFC 9421) as its signing mechanism. The two are not synonyms: Web Bot Auth is the initiative, RFC 9421 is the mechanism. It is a bot-side signing scheme, not a site-side artifact we can scan. We track it as the emerging network-layer control point rather than score it; see the footnote below.

Footnote: robots.txt is advisory, and enforcement is moving

robots.txt reliably governs the index-feeding AI crawlers (the path ai_crawler_access measures), but it is advisory. User-fetch agents such as ChatGPT-User, Claude-User, and Perplexity-User ignore robots.txt by design and fetch your pages anyway. Third-party retrieval tools often route around it via proxy networks. Studies suggest roughly half of AI crawler traffic never requests robots.txt at all.

Real enforcement is moving to the network layer. Web Bot Auth is a Cloudflare-backed initiative, drawing broad industry interest, that lets servers cryptographically verify whether a crawler is who it claims to be. It uses HTTP Message Signatures (RFC 9421) as its signing mechanism; the initiative and the RFC are not synonyms. Because Web Bot Auth is a bot-side signing scheme rather than a site-side artifact, there is nothing on your site for us to scan today. We track it as the emerging control point and will incorporate it when site-side configuration becomes meaningful to check.