Building for Both: How to Make Your Website Work for Humans and AI Agents
The story of llms.txt and how it's helping AI agents navigate the human web.
"When it comes to websites, what is good for human readers but bad for LLMs?", asked Ben.
The question hung in the air as I set down my fork, the Austin skyline stretching out beyond the windows. I was having lunch with Ben, an engineering leader at a major tech company and a friend who reads this newsletter, at his company's cafeteria. His team was knee-deep in an initiative to expand their API ecosystem, allowing more partners to integrate with their product.
Ben was looking ahead. They had been pouring resources into revamping their APIs, crafting beautiful documentation with rich examples, interactive widgets, and carefully designed navigation. All of it meticulously created for human developers. But now, he was starting to wonder if they were optimizing for the wrong audience.
With AI coding tools becoming increasingly sophisticated, he could see a shift coming. In a year or two, the primary consumers of his team's APIs might not be humans carefully reading documentation, but AI agents rapidly synthesizing how to use them. All those carefully crafted UI elements and interactive examples - the very things that make documentation accessible to humans - might become obstacles for his new customers: AI coding agents.
Looking down at the city below, I thought about how the web has always been a tale of two audiences. Ben's challenge is just the latest chapter in a story that's been unfolding since the early days of the internet. We've been here before, haven't we? In 1994, when web crawlers first started roaming the web, we created robots.txt - a simple way to tell search engines which parts of our sites they could access. Now, thirty years later, we're facing a similar challenge with AI agents trying to understand our websites. But this time feels different, because these machines aren't just reading - they're acting on what they read.
The Web's Dual Nature
The web has always worn two faces: one designed for human eyes, another crafted for machine understanding. While humans enjoyed increasingly sophisticated interfaces, behind the scenes we were building a parallel infrastructure for machine consumption.
This duality has shaped every major evolution of the web. When search engines emerged, we didn't just craft beautiful homepage layouts - we embedded meta tags and structured data to help crawlers understand our content. As social platforms grew, we added OpenGraph tags so machines could generate rich previews. When APIs became central to modern development, we maintained human-readable documentation alongside machine-readable OpenAPI specifications.
Each wave reinforced this split personality: build something beautiful for humans, then create a clean, structured version for machines.
Consider these parallel tracks we've built:
Meta tags tell search engines what our pages are about
OpenGraph tags help social platforms create preview cards
Schema.org markup helps search engines understand content structure
Favicon files give browsers the right icons to display
RSS feeds provide clean, structured content for readers
API specifications help developers integrate services
We've gotten good at this dance of duality. But all these machine-readable formats shared one trait: they were designed for machines that only needed to read and remember. The machines weren't trying to actually use our websites - they were just trying to categorize them, playing a supporting role with the human in charge.
AI Agents: A New Face at the Table
Now, with the rise of AI agents that need to actively use our digital services, we're facing a new challenge: creating a web that allows our digital assistants to navigate and take action on our behalf. This isn't just about reading content anymore - it's about understanding and using it.
The answer to our opening question - "What is good for human readers but bad for LLMs?" - lies in understanding how differently these two audiences consume our content. The very elements we've crafted to make websites engaging for humans - interactive widgets, fancy layouts, progressive disclosure of information, and visual hierarchies - become obstacles for AI agents. This isn't about good or bad design - it's about recognizing that AI agents and humans have fundamentally different needs when reading the web. Let's look at those differences:
Different Constraints
AI agents process information through context windows - limited spaces where they can hold and analyze text. While a human might benefit from information being broken into digestible chunks, an AI processes everything within its context window at once. Extra HTML markup, scripts, and styling don't just add noise - they actively consume this precious space. Those beautifully designed UI elements that help humans navigate complex documentation become costly overhead for AI agents.
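To make that cost concrete, here's a rough sketch that counts tokens for the same snippet of content in HTML and in plain markdown. It uses tiktoken's cl100k_base tokenizer as a stand-in for whatever tokenizer a given model actually uses, and the snippet itself is invented for illustration - the exact numbers will vary, but the gap is the point.

# A rough illustration only: tiktoken's cl100k_base tokenizer stands in for
# whatever tokenizer a particular model actually uses.
import tiktoken

html_page = (
    '<div class="doc"><nav class="sidebar">...</nav>'
    '<h2>Create a payment</h2>'
    '<p>Send a <code>POST</code> request to <code>/v1/payments</code>.</p></div>'
)
markdown_page = "## Create a payment\nSend a `POST` request to `/v1/payments`."

enc = tiktoken.get_encoding("cl100k_base")
print("HTML tokens:    ", len(enc.encode(html_page)))      # markup, classes, and wrappers all cost tokens
print("Markdown tokens:", len(enc.encode(markdown_page)))  # the same information in far fewer tokens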
Different Strengths
Yet these AI customers also bring unique capabilities. They don't get fatigued reading dense technical documentation. They can process and synthesize information from multiple sources simultaneously. They don't need interactive examples to understand an API - they can parse and understand complex specifications directly. What might overwhelm a human reader is perfect for an AI agent.
Today, when an LLM "reads" a web page, it receives an HTML file filled with layout information, scripts, and styling - all the elements that make websites accessible to humans but create unnecessary complexity for machines. What we need is a way to serve both audiences effectively, each according to their strengths.
This is where the llms.txt proposal comes in. Introduced in September 2024 by Jeremy Howard of fast.ai and Answer.AI, it suggests a standardized way for websites to provide AI-friendly versions of their content. The idea is simple: a website can provide a markdown version of any page by appending .md to the URL. Think of it like the printer-friendly versions of web pages we used to create, but for AI consumption. A page at docs.example.com/api/overview.html would have its AI-friendly version at docs.example.com/api/overview.html.md.
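What might that look like in practice? Here's a minimal sketch of the idea using Flask, assuming the markdown versions are generated ahead of time and stored next to the HTML files. The framework, routes, and directory layout are illustrative choices, not part of the proposal itself.

# A minimal sketch of the .md convention using Flask, assuming the markdown
# versions are pre-generated and stored next to the HTML files. The framework
# and directory layout here are illustrative, not part of the proposal.
from pathlib import Path
from flask import Flask, Response, abort

app = Flask(__name__)
CONTENT = Path("content")  # e.g. content/api/overview.html and content/api/overview.md

def read_or_404(path: Path) -> str:
    if not path.is_file():
        abort(404)
    return path.read_text()

@app.route("/<path:page>.html")
def human_version(page):
    # the full HTML page, with layout, scripts, and styling for human readers
    return Response(read_or_404(CONTENT / f"{page}.html"), mimetype="text/html")

@app.route("/<path:page>.html.md")
def agent_version(page):
    # the same content as plain markdown, served when .md is appended to the URL
    return Response(read_or_404(CONTENT / f"{page}.md"), mimetype="text/markdown")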
To understand why this matters, let's return to Ben's API documentation challenge. His team's beautifully crafted documentation might look like this in HTML:
<h1>Welcome to My Website</h1>
<p>This is a <strong>paragraph</strong> with some <em>formatted</em> text.</p>
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
</ul>
When an AI coding assistant tries to read this, it has to wade through HTML tags and styling information that consume precious context window space. The llms.txt proposal suggests providing a clean markdown version instead:
# Welcome to My Website
This is a **paragraph** with some *formatted* text.
- Item 1
- Item 2
The markdown version isn't just shorter - it's clearer in its intent. There's no confusion between content and presentation. This is why services like Firecrawl and other web scrapers are increasingly converting HTML content to markdown for use with LLMs.
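If a site already has HTML and wants to derive the markdown version rather than author it separately, the conversion itself is nearly a one-liner with an off-the-shelf library. Here's a small sketch using markdownify, one of several HTML-to-markdown converters; services like Firecrawl do a more robust version of the same thing.

# A small sketch of stripping a page down to markdown with the markdownify
# library; hosted services like Firecrawl do a more robust version of this.
from markdownify import markdownify

html = """
<h1>Welcome to My Website</h1>
<p>This is a <strong>paragraph</strong> with some <em>formatted</em> text.</p>
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
</ul>
"""

markdown = markdownify(html, heading_style="ATX")  # ATX gives "#"-style headings
print(markdown)
# # Welcome to My Website
#
# This is a **paragraph** with some *formatted* text.
#
# * Item 1
# * Item 2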
But the proposal goes further. Just as robots.txt helps crawlers understand where to look, llms.txt helps AI agents understand what to read. Located at the root of a website (e.g., example.com/llms.txt), this markdown file serves as a curated guide for AI assistants. It can point to the most relevant documentation, explain key concepts, and help AI tools understand the context they need to assist users effectively.
This brings us back to Ben's question. While he could build a highly interactive website with a great user experience, if he wants to increase usage of his API, he needs to make it easier for people to build with tools like Cursor and other AI-powered development assistants. By implementing llms.txt and providing markdown versions of his documentation, he could serve both audiences effectively: rich, interactive docs for human developers, and clean, structured content for AI assistants.
A Growing Standard
The proposal gained significant momentum when Mintlify added support on November 14th, 2024, instantly making thousands of dev tools' documentation LLM-friendly. While nascent, it's already seeing adoption from major AI companies and tech platforms. Anthropic, with their Claude AI assistant, maintains one of the largest llms.txt files, including comprehensive multi-language support.
A typical llms.txt file might look like this:
# FastHTML
> FastHTML is a python library which brings together Starlette, Uvicorn, HTMX, and fastcore's `FT` "FastTags" into a library for creating server-rendered hypermedia applications.
Important notes:
- Although parts of its API are inspired by FastAPI, it is *not* compatible with FastAPI syntax and is not targeted at creating API services
- FastHTML is compatible with JS-native web components and any vanilla JS library, but not with React, Vue, or Svelte.
## Docs
- [FastHTML quick start](https://docs.fastht.ml/path/quickstart.html.md): A brief overview of many FastHTML features
- [HTMX reference](https://raw.githubusercontent.com/path/reference.md): Brief description of all HTMX attributes, CSS classes, headers, events, extensions, js lib methods, and config options
## Examples
- [Todo list application](https://raw.githubusercontent.com/path/adv_app.py): Detailed walk-thru of a complete CRUD app in FastHTML showing idiomatic use of FastHTML and HTMX patterns.
## Optional
- [Starlette full documentation](https://gist.githubusercontent.com/path/starlette-sml.md): A subset of the Starlette documentation useful for FastHTML development.
The "Optional" section is particularly clever - it helps AI agents manage their limited context windows. Content above this section is considered essential and should always be processed, while content below can be skipped if the agent is running low on context space. This prioritization ensures that even with tight context constraints, AI agents can still grasp the most important aspects of a service.
The beauty of this approach lies in its simplicity and compatibility with existing web architecture. Just as robots.txt helped crawlers navigate the early web, llms.txt provides a clean, backwards-compatible way for AI agents to better understand web content. If a site provides an llms.txt file, AI agents can work more efficiently. If not, they'll continue processing HTML content as they do today.
This organic, decentralized approach is already spreading. Community tools are emerging rapidly: directory services like llmstxt.cloud index LLM-friendly documentation, while open-source generators from companies like Firecrawl help websites create their own llms.txt files. With tools like WordPress plugins on the horizon, this standard could potentially extend to billions of websites, making the web more accessible to our new AI collaborators while preserving rich experiences for human users.
There's even an innovative approach called "Roaming RAG" that uses llms.txt files like an index in a book to help AI agents find information. Unlike traditional RAG systems that require complex infrastructure - vector databases, chunking pipelines, and embedding models - Roaming RAG lets AI agents navigate documentation naturally, much like how humans use a table of contents. The AI simply browses through the document hierarchy, expanding relevant sections as needed. This organic approach not only eliminates the need for complex retrieval infrastructure but also preserves the contextual relationships between different parts of the documentation, helping AI agents build more informed responses.
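A minimal sketch of that roaming pattern: parse the llms.txt link list into an index, show the model only the section names and link titles, and fetch a linked document only when the model decides to expand it. The agent loop that makes that decision is left out; choose_entry() below is a hypothetical stand-in for an LLM call.

# A sketch of the "Roaming RAG" idea: treat llms.txt as a table of contents and
# expand entries on demand instead of pre-building a vector index.
import re
import urllib.request

LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\):?\s*(?P<desc>.*)$")

def build_index(llms_txt: str) -> list[dict]:
    # collect each "- [title](url): description" entry along with its section
    index, section = [], None
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
        elif (m := LINK.match(line)):
            index.append({"section": section, **m.groupdict()})
    return index

def expand(entry: dict) -> str:
    # fetch the linked markdown only when the agent decides it is relevant
    with urllib.request.urlopen(entry["url"]) as resp:
        return resp.read().decode("utf-8")

# index = build_index(fetch_llms_txt("https://docs.example.com"))  # from the earlier sketch
# relevant = choose_entry(index, question)  # hypothetical: an LLM call picks an entry
# context = expand(relevant)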
The Next Chapter
But we shouldn't assume llms.txt is the final word in this story. AI models are growing more sophisticated by the day. Through its "Computer Use" feature, now in beta, Anthropic's Claude can perceive and interact with computer interfaces directly - moving the cursor, clicking buttons, and typing text - navigating websites and interpreting visual elements without needing any special format. Companies like Scrapybara, backed by Y Combinator, are building infrastructure to provision remote desktops for AI agents, letting them interact with the web just as we do.
Looking back at that lunch conversation with Ben, one thing becomes clear: we're no longer building the web just for humans. Whether through specialized formats like llms.txt or through increasingly human-like browsing capabilities, AI agents are becoming active participants in our digital world. Their ability to help us - to augment human capabilities - will depend on how we adapt our digital spaces to accommodate these new inhabitants of the web.