Include llms.txt and llms-full.txt for LLM Discovery¶
Context¶
A significant fraction of this guide's intended readership consists of large-language-model agents — Claude Code, Cursor, GitHub Copilot, local code assistants — that consult the guide on behalf of a human engineer. These agents do not behave like a human reader who browses the sidebar, follows links, and accumulates context across multiple chapter visits. They ingest documentation in single passes, prefer small index files for navigation, and prefer one long concatenated file when grounding a complete answer.
The community convention for this access pattern is llms.txt, a
proposed format documented at https://llmstxt.org/. The format
specifies an H1 title, a > description block, and H2 sections each
containing a list of [name](url): summary lines. A complementary
llms-full.txt carries the concatenated full text of every page in
sidebar order so that an agent that wants the whole guide MAY fetch a
single file rather than crawl N pages.
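A minimal llms.txt following that shape might look like the sketch below. The title, description, section names, and example.org URLs are all illustrative placeholders, not this guide's actual navigation:

```markdown
# The Guide

> One-paragraph description of what the guide covers and who it serves.

## Chapters

- [Surface](https://example.org/surface/): one-line summary of the chapter
- [Depth](https://example.org/depth/): one-line summary of the chapter

## ADRs

- [ADR 0001: Hybrid structure](https://example.org/adr/0001/): why the guide mixes chapters and ADRs
```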
Including both files at the repository root and at the published site
root costs little (a small index plus a generated concatenation) and
pays back in two concrete ways. First, agents that respect the
convention MAY discover the guide cleanly without scraping. Second, the
hierarchical llms.txt index doubles as a stable, machine-readable
table of contents that survives sidebar reorganisations: an external
tool that wants to link to "the Surface chapter" MAY resolve the link
through llms.txt rather than hardcoding a sidebar position.
The alternatives considered include omitting the files entirely (let
agents crawl), shipping only llms.txt without llms-full.txt
(forcing per-page crawls for full grounding), and shipping only
llms-full.txt without llms.txt (no hierarchical entry point). Each
fails on one of the two payback axes above.
Decision Drivers¶
- The guide MUST be ingestible by LLM agents in a single pass without per-page crawling.
- The guide SHOULD provide a hierarchical, machine-readable table of contents that survives sidebar reorganisations.
- The discovery layer SHOULD follow an emerging community convention rather than a bespoke format so that agents and tooling MAY adopt it without per-site customisation.
- The discovery layer MUST be reproducible from the source markdown so that it does not drift from the rendered site.
- The discovery layer MUST live at the repository root and the published site root so that agents that probe well-known paths MAY find it.
Considered Options¶
- Ship both llms.txt (hierarchical index) and llms-full.txt (concatenated full content)
- Ship only llms.txt (index, no concatenation)
- Ship only llms-full.txt (concatenation, no index)
- Ship neither — let LLM agents crawl the rendered site
Decision Outcome¶
Chosen option: "Ship both llms.txt and llms-full.txt", because it
is the only option that serves both discovery patterns: agents that
want a hierarchical entry point MAY fetch llms.txt, and agents that
want the whole guide for grounding MAY fetch llms-full.txt as a
single file. llms.txt MUST follow the format proposed at
https://llmstxt.org/ (H1 title, > description, H2 sections,
[name](url): summary lines). llms-full.txt MUST be generated by a
script that reads the site nav configuration and concatenates the
source markdown of every page in nav order, so the generated file
mirrors the sidebar exactly.
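The generation step can be sketched as follows. This is a minimal sketch, not the guide's actual script: in practice the nav would be read from mkdocs.yml with a YAML parser, whereas here it is assumed to be already parsed into the lists and one-key dicts that mkdocs uses; the `docs` directory name is likewise an assumption.

```python
from pathlib import Path

def flatten_nav(nav):
    """Yield page paths from a parsed mkdocs nav, depth-first, in sidebar order.

    mkdocs nav entries are either bare paths or one-key {title: value} dicts,
    where value is a page path or a nested list (a section).
    """
    for entry in nav:
        if isinstance(entry, str):
            yield entry
        elif isinstance(entry, dict):
            (value,) = entry.values()  # mkdocs nav dicts have exactly one key
            if isinstance(value, str):
                yield value
            else:
                yield from flatten_nav(value)

def generate_llms_full(nav, docs_dir="docs"):
    """Concatenate the source markdown of every nav page, in nav order."""
    parts = [Path(docs_dir, page).read_text(encoding="utf-8")
             for page in flatten_nav(nav)]
    return "\n\n---\n\n".join(parts)
```

Because the same nav drives both the sidebar and the concatenation order, the generated file cannot drift from the rendered site's structure.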
Positive Consequences¶
- LLM agents that respect the convention MAY discover and ground recommendations cleanly.
- The hierarchical index doubles as a stable machine-readable table of contents for external linkers.
- The concatenated file is a single fetch target that survives across agent contexts that cannot crawl multiple pages.
Negative Consequences¶
- The build pipeline MUST regenerate llms-full.txt on every chapter change; a stale file WILL silently mislead agents.
- The convention is still emerging; some agents MAY not respect llms.txt discovery and WILL fall back to crawling.
Consequences¶
After this decision, llms.txt and llms-full.txt MUST exist at the
repository root and MUST be published at the site root. llms.txt MUST
validate against the format proposed at https://llmstxt.org/ with at
minimum: an H1 title; a > description block; H2 sections covering
the chapter set and the ADR set; and [name](url): summary lines for
every chapter and every ADR. llms-full.txt MUST be generated by a
script that reads mkdocs.yml's nav in order and concatenates the
source markdown of every listed page. The CI workflow MUST regenerate
llms-full.txt as a post-build step and MUST fail the build if the
regenerated file differs from the committed one.
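The minimum validation named above (H1 title, > description block, H2 sections, link-summary lines) can be sketched as a small check. This is an assumed, illustrative implementation, not a full validator for the llmstxt.org proposal; the function name and the exact problem strings are hypothetical.

```python
import re

def validate_llms_txt(text):
    """Check the minimal llms.txt shape; return a list of problem strings.

    Checks only the four requirements named in this ADR: an H1 title on the
    first non-blank line, a > description block, at least one H2 section,
    and at least one "- [name](url): summary" link line.
    """
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not re.match(r"^# \S", lines[0]):
        problems.append("missing H1 title on the first line")
    if not any(ln.startswith("> ") for ln in lines):
        problems.append("missing > description block")
    if not any(ln.startswith("## ") for ln in lines):
        problems.append("missing H2 sections")
    link = re.compile(r"^- \[[^\]]+\]\([^)]+\): \S")
    if not any(link.match(ln) for ln in lines):
        problems.append("no [name](url): summary lines")
    return problems
```

Running this as a CI step alongside the regenerate-and-diff check keeps both files honest: the validator catches format drift in llms.txt, and the diff catches staleness in llms-full.txt.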
Pros and Cons of the Options¶
Ship both llms.txt and llms-full.txt¶
A hierarchical index plus a concatenated full-text file.
- Good, because it serves both LLM access patterns (index-then-fetch and grab-the-whole-guide).
- Good, because the index doubles as a stable machine-readable table of contents.
- Good, because the cost is small relative to the readership served.
- Bad, because the concatenated file MUST be regenerated on every chapter change.
Ship only llms.txt¶
A hierarchical index without a concatenated full-text file.
- Good, because the index is cheap to maintain.
- Bad, because agents that want the whole guide MUST crawl per page, which fails in agent contexts that cannot or will not crawl.
Ship only llms-full.txt¶
A concatenated full-text file without a hierarchical index.
- Good, because the full content is a single fetch.
- Bad, because there is no hierarchical entry point — agents that want a section header or a specific chapter MUST string-scan the concatenation.
Ship neither¶
Let LLM agents crawl the rendered site as if it were any other web documentation.
- Good, because there is no extra file to maintain.
- Bad, because agents that respect llms.txt skip sites that lack it, and the guide loses a meaningful slice of its LLM-agent audience.
Links¶
- llms.txt proposal: https://llmstxt.org/
- See also ADR 0001 (hybrid structure), ADR 0002 (site generator).