Include llms.txt and llms-full.txt for LLM Discovery

Context

A significant fraction of this guide's intended readership consists of large-language-model agents — Claude Code, Cursor, GitHub Copilot, local code assistants — that consult the guide on behalf of a human engineer. These agents do not behave like a human reader who browses the sidebar, follows links, and accumulates context across multiple chapter visits. They ingest documentation in single passes, navigate via small index files, and ground a complete answer from one long concatenated file.

The community convention for this access pattern is llms.txt, a proposed format documented at https://llmstxt.org/. The format specifies an H1 title, a > description block, and H2 sections each containing a list of [name](url): summary lines. A complementary llms-full.txt carries the concatenated full text of every page in sidebar order so that an agent that wants the whole guide MAY fetch a single file rather than crawl N pages.
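A minimal file following that shape might look like the fragment below. The chapter names, summaries, and URLs are illustrative placeholders, not this guide's actual contents:

```
# Example Guide

> One-line description of what the guide covers and who it is for.

## Chapters

- [Surface](https://example.org/surface/): illustrative one-line chapter summary
- [Depth](https://example.org/depth/): illustrative one-line chapter summary

## ADRs

- [0003 LLM discovery files](https://example.org/adr/0003/): illustrative one-line ADR summary
```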

Including both files at the repository root and at the published site root costs little (a small index plus a generated concatenation) and pays back in two concrete ways. First, agents that respect the convention MAY discover the guide cleanly without scraping. Second, the hierarchical llms.txt index doubles as a stable, machine-readable table of contents that survives sidebar reorganisations: an external tool that wants to link to "the Surface chapter" MAY resolve the link through llms.txt rather than hardcoding a sidebar position.

The alternatives considered include omitting the files entirely (let agents crawl), shipping only llms.txt without llms-full.txt (forcing per-page crawls for full grounding), and shipping only llms-full.txt without llms.txt (no hierarchical entry point). Each fails on one of the two payback axes above.

Decision Drivers

  • The guide MUST be ingestible by LLM agents in a single pass without per-page crawling.
  • The guide SHOULD provide a hierarchical, machine-readable table of contents that survives sidebar reorganisations.
  • The discovery layer SHOULD follow an emerging community convention rather than a bespoke format so that agents and tooling MAY adopt it without per-site customisation.
  • The discovery layer MUST be reproducible from the source markdown so that it does not drift from the rendered site.
  • The discovery layer MUST live at the repository root and the published site root so that agents that probe well-known paths MAY find it.

Considered Options

  • Ship both llms.txt (hierarchical index) and llms-full.txt (concatenated full content)
  • Ship only llms.txt (index, no concatenation)
  • Ship only llms-full.txt (concatenation, no index)
  • Ship neither — let LLM agents crawl the rendered site

Decision Outcome

Chosen option: "Ship both llms.txt and llms-full.txt", because it is the only option that serves both discovery patterns: agents that want a hierarchical entry point MAY fetch llms.txt, and agents that want the whole guide for grounding MAY fetch llms-full.txt as a single file. llms.txt MUST follow the format proposed at https://llmstxt.org/ (H1 title, > description, H2 sections, [name](url): summary lines). llms-full.txt MUST be generated by a script that reads the site nav configuration and concatenates the source markdown of every page in nav order, so the generated file mirrors the sidebar exactly.
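The generation step the decision mandates can be sketched as follows. The real script would read mkdocs.yml's `nav`; to stay dependency-free, this sketch takes the nav as a flat list of (title, source-path) pairs, which is an assumption about how the nav would be flattened:

```python
# Hedged sketch of the llms-full.txt generator: concatenate the source
# markdown of every nav page, in nav order, so the output mirrors the
# sidebar exactly. `nav` is a pre-flattened list of (title, path) pairs.
from pathlib import Path

def build_llms_full(nav: list[tuple[str, str]], docs_dir: Path) -> str:
    """Return the concatenation of every listed page, in nav order."""
    parts = []
    for _title, rel_path in nav:
        # Read each page's source markdown; trailing whitespace is
        # normalised so pages are separated by exactly one blank line.
        parts.append((docs_dir / rel_path).read_text(encoding="utf-8").rstrip())
    return "\n\n".join(parts) + "\n"
```

Because the function is driven entirely by the nav list, a sidebar reorder changes the output deterministically, which is what lets CI compare a regenerated file against the committed one.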

Positive Consequences

  • LLM agents that respect the convention MAY discover and ground recommendations cleanly.
  • The hierarchical index doubles as a stable machine-readable table of contents for external linkers.
  • The concatenated file is a single fetch target that works even in agent contexts that cannot crawl multiple pages.

Negative Consequences

  • The build pipeline MUST regenerate llms-full.txt on every chapter change; a stale file WILL silently mislead agents.
  • The convention is still emerging; some agents MAY not respect llms.txt discovery and WILL fall back to crawling.

Consequences

After this decision, llms.txt and llms-full.txt MUST exist at the repository root and MUST be published at the site root. llms.txt MUST validate against the format proposed at https://llmstxt.org/ with at minimum: an H1 title; a > description block; H2 sections covering the chapter set and the ADR set; and [name](url): summary lines for every chapter and every ADR. llms-full.txt MUST be generated by a script that reads mkdocs.yml's nav in order and concatenates the source markdown of every listed page. The CI workflow MUST regenerate llms-full.txt as a post-build step and MUST fail the build if the regenerated file differs from the committed one.
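The freshness gate can be sketched as a comparison between the committed file and a freshly regenerated concatenation. The function and path names below are illustrative, not the actual CI step:

```python
# Hedged sketch of the CI freshness gate: after regenerating the
# concatenation, compare it byte-for-byte against the committed
# llms-full.txt and report drift.
from pathlib import Path

def check_llms_full_fresh(committed_path: Path, regenerated: str) -> bool:
    """True when the committed file matches the freshly regenerated text."""
    try:
        committed = committed_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # A missing committed file is also drift: the gate must fail.
        return False
    return committed == regenerated
```

A post-build CI step would call this and exit non-zero on `False`, which is what makes a stale llms-full.txt a build failure rather than a silent hazard.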

Pros and Cons of the Options

Ship both llms.txt and llms-full.txt

A hierarchical index plus a concatenated full-text file.

  • Good, because it serves both LLM access patterns (index-then-fetch and grab-the-whole-guide).
  • Good, because the index doubles as a stable machine-readable table of contents.
  • Good, because the cost is small relative to the readership served.
  • Bad, because the concatenated file MUST be regenerated on every chapter change.

Ship only llms.txt

A hierarchical index without a concatenated full-text file.

  • Good, because the index is cheap to maintain.
  • Bad, because agents that want the whole guide MUST crawl per page, which fails in agent contexts that cannot or will not crawl.

Ship only llms-full.txt

A concatenated full-text file without a hierarchical index.

  • Good, because the full content is a single fetch.
  • Bad, because there is no hierarchical entry point — agents that want a section header or a specific chapter MUST string-scan the concatenation.

Ship neither

Let LLM agents crawl the rendered site as if it were any other web documentation.

  • Good, because there is no extra file to maintain.
  • Bad, because agents that respect llms.txt skip sites that lack it, and the guide loses a meaningful slice of its LLM-agent audience.

Links

  • llms.txt proposal: https://llmstxt.org/
  • See also ADR 0001 (hybrid structure), ADR 0002 (site generator).