Best practices
Ten rules, the mistakes we see most often, and concrete patterns for i18n, security, and CI.
Ten rules
- Curate. A short list of high-signal pages beats a long list of mediocre ones. Aim for 10–30 links in the root file.
- Use absolute URLs. Always https://yourdomain.com/.... Relative URLs are technically allowed but fragile.
- Group by product surface. Sections like Product, Pricing, Developers reflect how a user (and an LLM) thinks. Avoid blog/doc/guide buckets unless they map onto your real navigation.
- Keep the summary factual. The blockquote after the H1 should read like a Wikipedia opener, not like a landing-page hero.
- One sentence per item. The colon-prefixed note is for disambiguation, not for marketing.
- Use the Optional section sparingly. It is the right home for press, brand assets, and archives. Do not dump half your sitemap there.
- Mirror your stable URLs. If a page in llms.txt moves, update or redirect it. Stale URLs poison the file’s reputation.
- Publish llms-full.txt when content is text-heavy. Documentation, tutorials, and reference material benefit. Galleries, interactive tools, and primarily visual content do not.
- Run the validator in CI. A content migration that breaks your file should fail the build.
- Date your file. A short note like “Last reviewed 2026-04-01” in the body is helpful for both humans and crawlers.
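Two of these rules (link count and absolute URLs) are easy to check mechanically. The following is a minimal sketch, not an official validator: it assumes link items look like "- [Title](url): note", and the names extract_links and check_rules are hypothetical helpers introduced only for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed link-item shape: "- [Title](url): optional note"
LINK_RE = re.compile(r"^\s*-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def extract_links(text):
    """Return (title, url, note) tuples for every link item in the file."""
    links = []
    for line in text.splitlines():
        match = LINK_RE.match(line)
        if match:
            links.append((match.group(1), match.group(2), match.group(3) or ""))
    return links


def check_rules(text, min_links=10, max_links=30):
    """Return warnings for the link-count and absolute-URL rules."""
    warnings = []
    links = extract_links(text)
    if not min_links <= len(links) <= max_links:
        warnings.append(f"{len(links)} links; aim for {min_links}-{max_links}")
    for _title, url, _note in links:
        parsed = urlparse(url)
        # An absolute URL needs both a scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            warnings.append(f"relative or non-HTTP URL: {url}")
    return warnings
```

The 10–30 range is exposed as parameters because it is a guideline for the root file, not a hard limit; a per-product variant may reasonably use different bounds.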
Common mistakes
- No H1. The H1 is the only required element. Without it, the file is invalid.
- Multiple H1s. Use H2s for sections. There must be exactly one H1.
- Custom front-matter. No YAML, no JSON header. The spec is strict; clients will not parse extras.
- Pasted markdown tables / images. Stick to a heading + blockquote + lists. Tables and images add no value to an LLM.
- Including auth-gated URLs. If a page requires login, do not list it — the LLM will hit a wall.
- Overlong descriptions. “The world’s most advanced AI-powered platform for next-gen synergistic transformation” helps no one. Keep notes under 15 words.
- Listing 500 URLs. If you need that many, you need llms-full.txt, per-product variants, or both.
- Forgetting to update robots.txt. Make sure /llms.txt is not blocked.
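Most of the mistakes above can be caught by a small script. This is a rough sketch under stated assumptions, not the validator referenced on this page: check_structure and llms_txt_allowed are hypothetical names, the table and image checks are simple heuristics, and the robots.txt check uses Python's standard urllib.robotparser on text you have already fetched.

```python
import re
from urllib.robotparser import RobotFileParser


def check_structure(text, max_note_words=15):
    """Return a list of structural errors found in an llms.txt body."""
    errors = []
    lines = text.splitlines()

    # Front matter: the first non-empty line should be the H1, not YAML or JSON.
    first = next((l.strip() for l in lines if l.strip()), "")
    if first == "---" or first.startswith("{"):
        errors.append("front matter found; the file should start with the H1")

    # Exactly one H1 ("# Title"); "## Section" lines do not match this pattern.
    h1_count = sum(1 for l in lines if re.match(r"^#\s+\S", l))
    if h1_count != 1:
        errors.append(f"expected exactly one H1, found {h1_count}")

    for n, line in enumerate(lines, start=1):
        if re.search(r"!\[[^\]]*\]\(", line):
            errors.append(f"line {n}: image")
        # Heuristic: only catches table rows written with a leading pipe.
        if line.lstrip().startswith("|"):
            errors.append(f"line {n}: table row")
        note = re.match(r"^\s*-\s*\[[^\]]+\]\([^)]+\):\s*(.+)$", line)
        if note and len(note.group(1).split()) >= max_note_words:
            errors.append(f"line {n}: note has {len(note.group(1).split())} words")
    return errors


def llms_txt_allowed(robots_txt, user_agent="*", path="/llms.txt"):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

Returning a list of messages instead of raising on the first problem lets a CI job report every issue in one run.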
Multilingual sites
The spec is silent on internationalization. Two patterns work in practice:
- Single English file at the root. The simplest option. Most LLM clients will translate on the fly. Good enough for most sites.
- Per-locale variants. Serve /llms.txt (default), /fr/llms.txt, /es/llms.txt. Link to them from your root file’s body or under an Optional section.
Whichever pattern you pick, do not duplicate URL sets across locales: each variant should point to the localized version of each page.
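The no-duplication rule can be checked once each variant's URLs have been extracted. The sketch below assumes locales are expressed as a path prefix (for example /fr/), which matches the /fr/llms.txt layout above but may not match your site; unlocalized_links and shared_links are hypothetical helper names.

```python
from urllib.parse import urlparse


def unlocalized_links(urls, locale):
    """Return URLs in a locale variant whose path lacks the /<locale>/ prefix."""
    prefix = f"/{locale}/"
    return [u for u in urls if not urlparse(u).path.startswith(prefix)]


def shared_links(variants):
    """Return URLs that appear in more than one variant.

    variants maps a locale label (e.g. "default", "fr") to its list of URLs.
    """
    seen = {}
    for locale, urls in variants.items():
        for url in set(urls):
            seen.setdefault(url, []).append(locale)
    return {u: sorted(locales) for u, locales in seen.items() if len(locales) > 1}
```

If your localized pages live on subdomains or use a query parameter instead of a path prefix, the prefix test needs to be replaced with the equivalent check for that scheme.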
Security and privacy
- Everything in llms.txt is public. Treat the file as broadcast.
- Never list staging or preview URLs. They will be picked up by anything that downloads the file.
- Do not list URLs with secrets in query strings. This sounds obvious; we have seen it happen.
- If the page exposes user data behind auth, it does not belong here.
- Audit the file at every release. A leaked draft URL is the most common security mistake.
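Part of the release audit can be automated. The sketch below flags staging-like hosts and secret-like query parameters; the host fragments and parameter names are assumptions chosen for illustration and should be adapted to your own naming. It cannot detect auth-gated pages or unannounced drafts, which still need a human review.

```python
from urllib.parse import parse_qs, urlparse

# Assumed naming conventions; adjust to match your environments.
SUSPECT_HOST_PARTS = ("staging", "preview", "localhost")
SECRET_PARAMS = {"token", "key", "api_key", "secret", "signature", "password"}


def audit_url(url):
    """Return a list of findings for one URL; an empty list means no finding."""
    findings = []
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if any(part in host for part in SUSPECT_HOST_PARTS):
        findings.append("staging or preview host")
    params = parse_qs(parsed.query, keep_blank_values=True)
    leaked = sorted(p for p in params if p.lower() in SECRET_PARAMS)
    if leaked:
        findings.append("secret-like query parameter: " + ", ".join(leaked))
    return findings
```

For example, audit_url("https://staging.yourdomain.com/docs") reports a staging host, while a plain production URL returns an empty list.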
Automation in CI
Treat llms.txt like any other artifact: generate, validate, and gate releases on it.
- Generate it from your content source (CMS, MDX collection, database).
- Run the validator in CI; fail the build on any error.
- Diff the file across releases; alert the docs owner on large deletions.
- Smoke-test the production URL after deploy: curl -fsS https://yourdomain.com/llms.txt | head -1.
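The diff and smoke-test steps above could look like the following sketch. It assumes you have the previous and current file contents as strings (for example from two release artifacts); the 20% deletion threshold and the function names are illustrative choices, not part of any spec.

```python
import re

# Matches the URL part of a markdown link: "[Title](https://...)"
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def listed_urls(text):
    """Return the set of absolute URLs linked from an llms.txt body."""
    return set(URL_RE.findall(text))


def removed_urls(previous, current):
    """Return URLs present in the previous release but missing from the current one."""
    return sorted(listed_urls(previous) - listed_urls(current))


def large_deletion(previous, current, max_removed_ratio=0.2):
    """Return True if more than max_removed_ratio of the old URLs disappeared."""
    before = listed_urls(previous)
    if not before:
        return False
    removed = before - listed_urls(current)
    return len(removed) / len(before) > max_removed_ratio


def first_line_is_h1(body):
    """Smoke test: the first line of the deployed file should be the H1."""
    first = body.splitlines()[0] if body else ""
    return first.startswith("# ")
```

In a pipeline, large_deletion would typically trigger a notification to the docs owner and not a hard failure, since a deliberate cleanup can legitimately remove many links; first_line_is_h1 applies the same check as the curl one-liner to a body you have already downloaded.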
Next
- Benefits and limitations — what to expect, what not to.
- Real-world examples — copy what works.
- Validator.