Glossary
The terms you will run into around llms.txt — defined briefly, with cross-links.
Terms
- AEO — Answer Engine Optimization
- Optimizing content so it is selected and quoted by AI assistants and answer engines (Perplexity, ChatGPT, Claude). Overlaps heavily with GEO.
- Crawler
- A program that fetches pages on the web. Search crawlers (Googlebot, Bingbot) build search indexes. AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) collect content for training or grounding.
- GEO — Generative Engine Optimization
- The practice of structuring content so generative engines reference and cite it accurately. Tactics include clean Markdown, explicit headings, structured data, and (yes) llms.txt.
- Grounding
- When an LLM bases its answer on retrieved sources rather than parametric memory. llms.txt is a grounding hint: "if you ground answers about us, prefer these pages."
- llms.txt
- A Markdown file at /llms.txt that lists the highest-signal pages of a site for LLM consumption. Proposed by Jeremy Howard (Answer.AI), September 2024. Spec at llmstxt.org.
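A minimal llms.txt following the llmstxt.org structure (H1 name, blockquote summary, H2 link sections) might look like this; the site name and URLs are placeholders:

```markdown
# Acme Widgets

> Acme makes embeddable widget components. The pages below are served as clean Markdown.

## Docs

- [Quickstart](https://acme.example/docs/quickstart.md): install steps and a first widget
- [API reference](https://acme.example/docs/api.md): endpoints and parameters

## Optional

- [Press kit](https://acme.example/press.md): logos and boilerplate copy
```

The "Optional" section is the spec's escape hatch: clients with limited context can stop before it.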
- llms-full.txt
- Sibling convention: a single Markdown file containing the actual content of relevant pages, concatenated. Designed for one-shot ingestion. Popularized by Mintlify with Anthropic.
- MCP — Model Context Protocol
- An open protocol from Anthropic for connecting LLMs to tools and data sources. Several MCP servers fetch /llms.txt as part of their context-loading flow.
- Optional (section)
- A section in llms.txt whose H2 title is exactly "Optional". Items there can be skipped by clients with limited context — use it for nice-to-haves (brand assets, archives, press).
- RAG — Retrieval-Augmented Generation
- A pattern where the model retrieves relevant documents at query time and uses them as context. llms.txt and llms-full.txt are convenient inputs for site-specific RAG.
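As a sketch of that input step, the link list in an llms.txt file can be parsed into a small retrieval corpus. The regex below assumes the spec's `- [title](url): notes` link format; fetching and chunking the linked pages is left out:

```python
import re

# One llms.txt link-list entry: "- [title](url)" with an optional ": notes" tail.
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")

def parse_llms_txt(text: str) -> list[tuple[str, str, str]]:
    """Extract (title, url, note) triples from an llms.txt document."""
    docs = []
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        if m:
            docs.append((m["title"], m["url"], m["note"] or ""))
    return docs
```

Each triple can then be fetched and embedded like any other RAG document; the notes field often makes a useful pre-computed summary.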
- REP — Robots Exclusion Protocol
- The grammar used by robots.txt (User-agent / Disallow / Allow / Sitemap). Standardized as IETF RFC 9309 in 2022. Different in intent and syntax from llms.txt.
- robots.txt
- Plain-text file at /robots.txt that tells crawlers what they may or may not fetch. An access-control file, complementary to llms.txt rather than a replacement.
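For contrast with llms.txt, here is a small robots.txt in REP syntax; the hostname is a placeholder and the GPTBot rule is just an illustration of per-agent control:

```text
# Access control: which crawlers may fetch which paths
User-agent: GPTBot
Disallow: /drafts/

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Note the difference in intent: robots.txt says what may be fetched, while llms.txt says what is worth reading.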
- Schema.org
- A vocabulary for marking up the meaning of individual pages with JSON-LD or microdata (Product, Article, FAQ, etc.). Per-page enrichment, where llms.txt is a site-wide map.
- sitemap.xml
- XML file listing every URL you want a search engine to know about, with metadata (lastmod, priority). Aimed at completeness; llms.txt is aimed at curation.
- Static site
- A site whose pages are pre-rendered to HTML/Markdown at build time and served as files. Astro, Eleventy, Hugo, and Jekyll are static site generators. llms.txt is a natural fit.
- TL;DR
- "Too long; didn’t read." A short summary at the top of a section. Useful as the blockquote in llms.txt.
- User-agent
- A string a client sends to identify itself. AI crawlers identify with names like GPTBot, ClaudeBot, PerplexityBot, Google-Extended, OAI-SearchBot — useful when filtering server logs.
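A quick way to act on those names is to scan access logs for them. A minimal Python sketch, assuming each crawler's name appears verbatim in the log line's user-agent field:

```python
# AI crawler names as they appear in user-agent strings.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "OAI-SearchBot")

def ai_crawler_hits(log_lines):
    """Yield (agent, line) for access-log lines that name a known AI crawler."""
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent in line:
                yield agent, line
                break  # one match per line is enough
```

Counting hits per agent on /llms.txt is a cheap way to see which assistants are actually reading the file.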