
Glossary

The terms you will run into around llms.txt — defined briefly, with cross-links.

Terms

AEO — Answer Engine Optimization
Optimizing content so it is selected and quoted by AI assistants and answer engines (Perplexity, ChatGPT, Claude). Overlaps heavily with GEO.
Crawler
A program that fetches pages on the web. Search crawlers (Googlebot, Bingbot) build search indexes. AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) collect content for training or grounding.
GEO — Generative Engine Optimization
The practice of structuring content so generative engines reference and cite it accurately. Tactics include clean Markdown, explicit headings, structured data, and (yes) llms.txt.
Grounding
When an LLM bases its answer on retrieved sources rather than parametric memory. llms.txt is a grounding hint: "if you ground answers about us, prefer these pages."
llms.txt
A Markdown file at /llms.txt that lists the highest-signal pages of a site for LLM consumption. Proposed by Jeremy Howard (Answer.AI) in September 2024. Spec at llmstxt.org.
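A minimal sketch of the shape the spec describes (an H1 site name, a blockquote summary, then H2 sections of links); the site name and URLs here are hypothetical:

```markdown
# Example Co

> Example Co makes widgets. These pages are the best starting points for questions about the product.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints, auth, and rate limits
```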
llms-full.txt
Sibling convention: a single Markdown file containing the actual content of relevant pages, concatenated. Designed for one-shot ingestion. Popularized by Mintlify with Anthropic.
MCP — Model Context Protocol
An open protocol from Anthropic for connecting LLMs to tools and data sources. Several MCP servers fetch /llms.txt as part of their context-loading flow.
Optional (section)
A section in llms.txt whose H2 title is exactly "Optional". Items there can be skipped by clients with limited context — use it for nice-to-haves (brand assets, archives, press).
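In the file itself, that is just a trailing H2 section (the item shown is hypothetical):

```markdown
## Optional

- [Press kit](https://example.com/press.md): logos and brand assets
```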
RAG — Retrieval-Augmented Generation
A pattern where the model retrieves relevant documents at query time and uses them as context. llms.txt and llms-full.txt are convenient inputs for site-specific RAG.
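As a sketch of that connection, a few lines of Python can turn a fetched llms.txt into candidate (title, URL) pairs for retrieval; the sample content and URLs are hypothetical:

```python
import re

def extract_links(llms_txt: str):
    """Pull (title, url) pairs from the Markdown link lines of an llms.txt file."""
    return re.findall(r"\[([^\]]+)\]\((\S+?)\)", llms_txt)

sample = """# Example Co

> Example Co makes widgets.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
"""

print(extract_links(sample))  # [('Quickstart', 'https://example.com/docs/quickstart.md')]
```

A real pipeline would then fetch and chunk each linked page before embedding; the file format keeps this first step deliberately trivial.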
REP — Robots Exclusion Protocol
The grammar used by robots.txt (User-agent / Disallow / Allow / Sitemap). Standardized as IETF RFC 9309 in 2022. Different in intent and syntax from llms.txt.
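A fragment in that grammar (the paths are hypothetical; GPTBot is a real crawler token):

```
User-agent: GPTBot
Allow: /docs/
Disallow: /drafts/

Sitemap: https://example.com/sitemap.xml
```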
robots.txt
Plain-text file at /robots.txt that tells crawlers what they may and may not fetch; an access-control file. Complementary to llms.txt, not a replacement.
Schema.org
A vocabulary for marking up the meaning of individual pages with JSON-LD or microdata (Product, Article, FAQ, etc.). Per-page enrichment, where llms.txt is a site-wide map.
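Per-page enrichment usually means a JSON-LD block embedded in the page's head; the values here are hypothetical:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Getting started with widgets",
  "datePublished": "2025-01-01"
}
```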
sitemap.xml
XML file listing every URL you want a search engine to know about, with per-URL metadata (lastmod, priority). Aimed at completeness; llms.txt is aimed at curation.
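A minimal fragment (URL and date hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <lastmod>2025-01-01</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
```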
Static site
A site whose pages are pre-rendered to HTML/Markdown at build time and served as files. Astro, Eleventy, Hugo, and Jekyll are static site generators. llms.txt is a natural fit, since the Markdown sources already exist at build time and can be published alongside the HTML.
TL;DR
"Too long; didn’t read." A short summary at the top of a section. Useful as the blockquote in llms.txt.
User-agent
A string a client sends to identify itself. AI crawlers identify with names like GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot — useful when filtering server logs.
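A minimal sketch of that log filtering in Python, matching on User-agent substrings anywhere in the line; the sample log lines are hypothetical:

```python
# Known AI-crawler User-agent substrings from the entry above.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "OAI-SearchBot")

def ai_crawler_hits(log_lines):
    """Count hits per AI crawler by substring match on each log line."""
    counts = {}
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent in line:
                counts[agent] = counts.get(agent, 0) + 1
    return counts

sample = [
    '1.2.3.4 - - [01/Jan/2025] "GET /llms.txt HTTP/1.1" 200 123 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [01/Jan/2025] "GET / HTTP/1.1" 200 456 "-" "Mozilla/5.0"',
]

print(ai_crawler_hits(sample))  # {'GPTBot': 1}
```

Substring matching is deliberately loose; strict verification would also check the crawler's published IP ranges, since User-agent strings are trivially spoofed.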
