
llms.txt vs robots.txt, sitemap.xml, and llms-full.txt

Four files, four jobs. Here is what each one does and how to use them together.


TL;DR

They are not interchangeable. robots.txt tells crawlers what they may or may not access. sitemap.xml tells search engines what exists. llms.txt tells AI assistants what is worth reading. llms-full.txt hands them the actual content.

Side-by-side matrix

Quick reference. See sections below for the nuances.
| Criterion | robots.txt | sitemap.xml | llms.txt | llms-full.txt |
| --- | --- | --- | --- | --- |
| Primary purpose | Access control for crawlers | Discovery of pages for search engines | Curated map for LLM clients | Inline corpus for LLM ingestion |
| Audience | Web crawlers (Googlebot, Bingbot, GPTBot…) | Search engines | LLM clients and assistants | LLM clients needing full content |
| Format | Plain text, custom REP grammar | XML | Markdown | Markdown (concatenated) |
| Standard? | Yes (IETF RFC 9309, 2022) | Yes (sitemaps.org) | Community proposal (llmstxt.org) | Community proposal (llmstxt.org) |
| Required? | No, but recommended | No, but recommended for large sites | No | No |
| Controls indexing? | Only indirectly: it controls crawling (allow / disallow), and a disallowed URL can still be indexed if other pages link to it | No (just a discovery hint) | No | No |
| Approach | Exclusion | Discovery (be exhaustive) | Curation (be selective) | Inlining (provide full text) |
| File path | /robots.txt | /sitemap.xml (or any URL declared in robots.txt) | /llms.txt | /llms-full.txt |

llms.txt vs robots.txt

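robots.txt is an access-control file standardized as IETF RFC 9309. It uses the Robots Exclusion Protocol grammar (User-agent, Disallow, Allow, plus the widely supported Sitemap line) to tell crawlers which paths they are allowed to fetch. It is advisory: well-behaved crawlers honor it, but it does not technically prevent a request.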

llms.txt has the opposite intent: it is a positive recommendation list. It does not block anyone, it does not grant access, and it has no effect on whether a crawler fetches anything else on your site. It just says: if you are an LLM client, here is the high-quality subset.

Practical implication: continue to use robots.txt for what it does well (blocking expensive bots, declaring your sitemap location). Add llms.txt as a complement, not a replacement.
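As a rough illustration of what robots.txt governs, the sketch below parses a hypothetical robots.txt with Python's standard-library urllib.robotparser. The bot name ExpensiveBot, the paths, and example.com are assumptions made up for the example. Python's parser applies the first matching rule, whereas RFC 9309 specifies the longest match, so treat the result as an approximation of what a real crawler would decide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExpensiveBot
Disallow: /

User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named bot is excluded from the whole site.
blocked_bot = parser.can_fetch("ExpensiveBot", "https://example.com/docs/")

# Any other crawler falls back to the "*" group: /docs/ is fine, /admin/ is not.
other_bot_docs = parser.can_fetch("SomeOtherBot", "https://example.com/docs/")
other_bot_admin = parser.can_fetch("SomeOtherBot", "https://example.com/admin/users")

# Sitemap locations declared in the file (available in Python 3.8+).
sitemaps = parser.site_maps()

print(blocked_bot, other_bot_docs, other_bot_admin, sitemaps)
```

Nothing in this file mentions llms.txt, which is the point: access rules and reading recommendations live in separate files.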

llms.txt vs sitemap.xml

sitemap.xml aims for completeness: it lists every URL you want a search engine to know about, plus metadata (lastmod, priority, alternate languages). It is XML, machine-only, often auto-generated.

llms.txt aims for curation: a small Markdown list of the pages an LLM should read first. It does not replace your sitemap. It rarely includes more than a few dozen URLs, while a sitemap on a content-heavy site can list hundreds of thousands.

Think of sitemap.xml as a directory and llms.txt as a recommended-reading shelf curated by a librarian.
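The contrast between exhaustive and selective can be sketched by generating both files from the same page inventory. This is a minimal sketch, not a reference generator: the PAGES list, the featured flag, the site name, and the single "Docs" section are all assumptions, and the llms.txt layout (H1 title, blockquote summary, H2 section with a link list) follows the shape suggested by the llmstxt.org proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory; URLs, titles, and the "featured" flag are illustrative.
PAGES = [
    {"url": "https://example.com/docs/quickstart", "title": "Quickstart",
     "note": "Install and first request", "lastmod": "2025-01-10", "featured": True},
    {"url": "https://example.com/docs/api", "title": "API reference",
     "note": "Endpoints and parameters", "lastmod": "2025-01-12", "featured": True},
    {"url": "https://example.com/blog/2019-team-offsite", "title": "Team offsite",
     "note": "Company news", "lastmod": "2019-06-01", "featured": False},
]

def build_sitemap(pages):
    """Exhaustive: one <url> entry per page, with lastmod metadata."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")

def build_llms_txt(site_name, summary, pages):
    """Selective: only featured pages, as a Markdown list of titled links."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Docs", ""]
    for page in pages:
        if page["featured"]:
            lines.append(f"- [{page['title']}]({page['url']}): {page['note']}")
    return "\n".join(lines) + "\n"

sitemap_xml = build_sitemap(PAGES)
llms_txt = build_llms_txt("Example Docs", "Developer documentation for Example.", PAGES)
print(llms_txt)
```

The sitemap carries all three URLs; the llms.txt output carries only the two a reader would actually want first.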

llms.txt vs llms-full.txt

Same family, different role:

  • llms.txt is the map: a list of titled links.
  • llms-full.txt is the territory: the actual content of those (and other) pages, concatenated as Markdown into one file.

The llms-full.txt convention was popularized by Mintlify in collaboration with Anthropic. It lets a developer paste a single URL into an AI chat and load an entire documentation corpus as context. Many documentation platforms now publish both files side by side.

Rule of thumb: publish llms.txt always; add llms-full.txt if your content is primarily textual and benefits from being loaded in bulk.
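To make the map-versus-territory distinction concrete, here is a minimal sketch of how an llms-full.txt could be assembled by concatenating page bodies. The page records and the per-page layout (an H2 title, a Source line, then the body) are assumptions chosen for readability, not a prescribed format.

```python
# Hypothetical page records; in practice the Markdown would come from your docs source.
PAGES = [
    {"title": "Quickstart", "url": "https://example.com/docs/quickstart",
     "markdown": "Install the client, then send a first request.\n"},
    {"title": "Authentication", "url": "https://example.com/docs/auth",
     "markdown": "Pass your API key in the Authorization header.\n"},
]

def build_llms_full(site_name, pages):
    """Concatenate page bodies into a single Markdown document.

    The section layout used here is an illustrative assumption.
    """
    sections = [f"# {site_name}"]
    for page in pages:
        body = page["markdown"].strip()
        sections.append(f"## {page['title']}\n\nSource: {page['url']}\n\n{body}")
    return "\n\n".join(sections) + "\n"

llms_full = build_llms_full("Example Docs", PAGES)
print(llms_full)
```

Because the whole corpus ends up in one file, its size grows with the site; that is why the rule of thumb above reserves it for text-heavy content that benefits from bulk loading.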

llms.txt vs schema.org / JSON-LD

Schema.org is a vocabulary for marking up the meaning of individual pages in JSON-LD or microdata. Search engines and assistants use it to extract structured facts: a product’s price, a recipe’s ingredients, a FAQ’s questions and answers.

llms.txt operates one level above: it’s a site-wide map, not a per-page enrichment. The two are complementary. Schema.org tells an LLM what a page is; llms.txt tells it which pages to look at first.
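For comparison, per-page markup looks quite different from a site-wide link list. The sketch below builds a small schema.org FAQPage object and wraps it in the script tag used to embed JSON-LD in a page. The question and answer text are invented for the example.

```python
import json

# Illustrative FAQ entry; the wording is made up for this example.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Does llms.txt block crawlers?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "No. It is a recommendation list, not an access rule.",
            },
        }
    ],
}

# JSON-LD is embedded in the page's HTML inside a script element.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(faq_jsonld, indent=2)
    + "\n</script>"
)
print(script_tag)
```

This block describes one page; an llms.txt entry would simply link to that page without describing its contents in structured form.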

How to combine them

  1. Publish both robots.txt and sitemap.xml as you already do for SEO.
  2. Add llms.txt at the root for AI clients.
  3. Optionally add llms-full.txt if your site is documentation- or knowledge-heavy.
  4. In robots.txt, leave /llms.txt and /llms-full.txt accessible (do not Disallow them).
  5. Keep schema.org markup on individual pages where it makes sense (Product, FAQ, Article…).
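Step 4 in the list above is easy to get wrong when a broad Disallow rule is already in place. A small check along these lines can catch it; this is a sketch using Python's standard-library urllib.robotparser, and the function name, the sample robots.txt contents, and the user-agent list are assumptions. Python's parser uses first-match semantics rather than the longest match specified by RFC 9309, so results are approximate for files with overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

LLM_FILES = ("/llms.txt", "/llms-full.txt")

def find_blocked_llm_files(robots_txt, user_agents, base_url="https://example.com"):
    """Return (agent, path) pairs for LLM files that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = []
    for agent in user_agents:
        for path in LLM_FILES:
            if not parser.can_fetch(agent, base_url + path):
                blocked.append((agent, path))
    return blocked

# Hypothetical robots.txt that leaves both files reachable.
GOOD = """\
User-agent: *
Disallow: /admin/
"""

# Hypothetical robots.txt that blocks them, once per group.
BAD = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /llms-full.txt
"""

agents = ["GPTBot", "SomeOtherBot"]
good_result = find_blocked_llm_files(GOOD, agents)
bad_result = find_blocked_llm_files(BAD, agents)
print(good_result)
print(bad_result)
```

An empty result means the listed agents can reach both files; any pair returned points at the agent and path that need an exception in robots.txt.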
