llms.txt vs robots.txt, sitemap.xml, and llms-full.txt
Four files, four jobs. Here is exactly what each one does — and how to use them together.
TL;DR
They are not interchangeable.
robots.txt tells crawlers what they may or may not access.
sitemap.xml tells search engines what exists.
llms.txt tells AI assistants what is worth reading.
llms-full.txt hands them the actual content.
Side-by-side matrix
| Criterion | robots.txt | sitemap.xml | llms.txt | llms-full.txt |
|---|---|---|---|---|
| Primary purpose | Access control for crawlers | Discovery of pages for search engines | Curated map for LLM clients | Inline corpus for LLM ingestion |
| Audience | Web crawlers (Googlebot, Bingbot, GPTBot…) | Search engines | LLM clients and assistants | LLM clients needing full content |
| Format | Plain text, custom REP grammar | XML | Markdown | Markdown (concatenated) |
| Standard? | Yes — IETF RFC 9309 (2022) | Yes — sitemaps.org | Community proposal — llmstxt.org | Community proposal — llmstxt.org |
| Required? | No, but recommended | No, but recommended for large sites | No | No |
| Controls crawling? | Yes (allow / disallow; compliance is voluntary) | No (discovery hint only) | No | No |
| Approach | Exclusion | Discovery (be exhaustive) | Curation (be selective) | Inlining (provide full text) |
| File path | /robots.txt | /sitemap.xml (or any URL declared in robots.txt) | /llms.txt | /llms-full.txt |
llms.txt vs robots.txt
robots.txt is a crawl-control file
standardized as IETF RFC 9309.
It uses the Robots Exclusion Protocol grammar (User-agent,
Disallow, Allow, Sitemap) to tell
crawlers which paths they may fetch. Compliance is voluntary:
well-behaved crawlers honor the rules, but the file itself does not
enforce access.
llms.txt has the opposite intent: a positive
recommendation list. It does not block anyone, it does not grant access,
and it has no effect on whether a crawler fetches anything else on your
site. It just says: if you are an LLM client, here is the high-quality
subset.
Practical implication: continue to use robots.txt for what it
does well (blocking expensive bots, declaring your sitemap location).
Add llms.txt as a complement, not a replacement.
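A minimal sketch of that division of labor, using Python's standard-library urllib.robotparser. The bot name ExampleExpensiveBot and the example.com URLs are made up for illustration. Python's parser applies rules in file order, so the rules here are ordered to give the same answer as the longest-match rule in RFC 9309.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one blocked bot, one private path, one sitemap.
ROBOTS_TXT = """\
User-agent: ExampleExpensiveBot
Disallow: /

User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# The blocked bot may not fetch anything, including llms.txt.
blocked_bot_can_read = rules.can_fetch("ExampleExpensiveBot", "https://example.com/llms.txt")
# Every other crawler may fetch llms.txt but not the private path.
other_bot_can_read = rules.can_fetch("OtherBot", "https://example.com/llms.txt")
other_bot_can_admin = rules.can_fetch("OtherBot", "https://example.com/admin/settings")
# site_maps() returns the declared Sitemap URLs (Python 3.8+).
declared_sitemaps = rules.site_maps()
```

Nothing in this file mentions llms.txt at all: it stays reachable simply because no rule excludes it.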
llms.txt vs sitemap.xml
sitemap.xml aims for completeness: it lists
every URL you want a search engine to know about, plus metadata
(lastmod, priority, alternate languages). It is
XML, machine-only, often auto-generated.
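To make the format concrete, here is a small sketch that reads a sitemap with Python's standard-library xml.etree.ElementTree. The two URLs and their dates are invented sample data; only the namespace URI comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical two-URL sitemap; a real one is usually auto-generated.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <lastmod>2025-01-10</lastmod>
  </url>
</urlset>
"""


def read_sitemap(xml_text: str) -> list[dict]:
    """Return one dict per <url> entry with its loc and optional metadata."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append({
            "loc": url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip(),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS),
            "priority": url.findtext("sm:priority", default=None, namespaces=SITEMAP_NS),
        })
    return entries


sitemap_entries = read_sitemap(SITEMAP_XML)
```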
llms.txt aims for curation: a small Markdown
list of the pages an LLM should read first. It does not replace your
sitemap. It rarely includes more than a few dozen URLs, while a sitemap
on a content-heavy site can list hundreds of thousands.
Think of sitemap.xml as a directory and llms.txt
as a recommended-reading shelf curated by a librarian.
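By contrast, a curated llms.txt can be small enough to write by hand. The sketch below renders one from a short list of links, following the layout described in the llmstxt.org proposal as I understand it (an H1 title, a blockquote summary, then H2 sections of Markdown links). The helper name render_llms_txt, the project name, and the URLs are all hypothetical.

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a small llms.txt: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            # The note after the colon is optional.
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical curated shelf: a handful of pages, not the whole sitemap.
llms_txt = render_llms_txt(
    "Example Project",
    "Short description of what the project does.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md", "Install and first run"),
            ("API reference", "https://example.com/docs/api.md", ""),
        ],
    },
)
```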
llms.txt vs llms-full.txt
Same family, different role:
- llms.txt is the map: a list of titled links.
- llms-full.txt is the territory: the actual content of those (and other) pages, concatenated as Markdown into one file.
The llms-full.txt convention was popularized by Mintlify in
collaboration with Anthropic. It lets a developer paste a single URL into
an AI chat and load an entire documentation corpus as context. Many
documentation platforms now publish both files side by side.
Rule of thumb: publish llms.txt always; add
llms-full.txt if your content is primarily textual and
benefits from being loaded in bulk.
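One way to produce such a file is to concatenate the Markdown source of each page. The sketch below assumes a simple layout (title heading, a Source line, the body, and a horizontal rule between pages); that layout is an illustrative choice, not something the proposal mandates, and the page data is invented.

```python
def build_llms_full(pages: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, markdown_body) pages into one Markdown document."""
    chunks = []
    for title, url, body in pages:
        # Assumed per-page layout: heading, source URL, then the page body.
        chunks.append(f"# {title}\n\nSource: {url}\n\n{body.strip()}\n")
    # Separate pages with a Markdown horizontal rule.
    return "\n---\n\n".join(chunks)


# Hypothetical pages; in practice the bodies would come from your docs source.
llms_full = build_llms_full([
    ("Quickstart", "https://example.com/docs/quickstart.md", "Install the package, then run it.\n"),
    ("API reference", "https://example.com/docs/api.md", "## Functions\n\nDetails go here.\n"),
])
```

Because the whole corpus lands in one response, it is worth checking the resulting size (for example with len(llms_full)) against the context limits of the clients you expect to load it.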
llms.txt vs schema.org / JSON-LD
Schema.org is a vocabulary for marking up the meaning of individual pages in JSON-LD or microdata. Search engines and assistants use it to extract structured facts: a product’s price, a recipe’s ingredients, a FAQ’s questions and answers.
llms.txt operates one level above: it’s a
site-wide map, not a per-page enrichment. The two are
complementary. Schema.org tells an LLM what a page is;
llms.txt tells it which pages to look at first.
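For comparison, per-page markup looks like this. The sketch builds a schema.org FAQPage object and serializes it as JSON-LD with Python's json module; the question and answer text are placeholders, and the helper name faq_json_ld is hypothetical.

```python
import json


def faq_json_ld(questions: list[tuple[str, str]]) -> str:
    """Serialize question/answer pairs as a schema.org FAQPage JSON-LD document."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in questions
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder content for a single FAQ entry.
faq_markup = faq_json_ld([
    ("Does llms.txt block crawlers?", "No. It is a recommendation list, not an access rule."),
])
```

The resulting string would typically be embedded in the page inside a script element of type application/ld+json, which is why it enriches one page at a time rather than describing the site.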
How to combine them
- Publish both robots.txt and sitemap.xml as you already do for SEO.
- Add llms.txt at the root for AI clients.
- Optionally add llms-full.txt if your site is documentation- or knowledge-heavy.
- In robots.txt, leave /llms.txt and /llms-full.txt accessible (do not Disallow them).
- Keep schema.org markup on individual pages where it makes sense (Product, FAQ, Article…).
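The robots.txt item in the checklist above is easy to get wrong with a broad Disallow prefix. A rough check, again with the standard-library urllib.robotparser, might look like this. The function name blocked_llm_files, the ExampleBot agent, and the example.com origin are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

LLM_FILES = ("/llms.txt", "/llms-full.txt")


def blocked_llm_files(
    robots_text: str,
    user_agent: str = "ExampleBot",
    origin: str = "https://example.com",
) -> list[str]:
    """Return the llms files that this robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [path for path in LLM_FILES if not parser.can_fetch(user_agent, origin + path)]


# A narrow rule leaves both files reachable.
ok_robots = "User-agent: *\nDisallow: /admin/\n"
# A broad prefix rule accidentally blocks both files.
bad_robots = "User-agent: *\nDisallow: /llms\n"

ok_result = blocked_llm_files(ok_robots)
bad_result = blocked_llm_files(bad_robots)
```

An empty list means both files are fetchable for that user agent; anything else names the paths to unblock.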
Next
- How llms.txt works — the spec in detail.
- Best practices.
- FAQ.