The Disappearing Harness

Harnesses exist to direct dumb power. What happens when the power isn’t dumb?

Apr 16, 2026

a pair of horses on a dirt road — Photo by Jacek Ulinski on Unsplash

For most of recorded history, horses pulled loads with a strap around their necks. The throat-and-girth harness was simple: wrap a band around the horse’s chest and neck, attach it to whatever needed moving, and go. It had one significant problem. The harder the horse pulled, the more the strap pressed against its windpipe. The animal was literally choking itself to do its job.

Around 100 BC, someone in China figured out the rigid horse collar — a padded frame that shifted the load from the horse’s throat to its shoulders. The same horse could now pull with roughly five times the force without suffocating. The collar didn’t make the horse stronger. It just stopped fighting the horse’s own anatomy.

What followed was a pattern that held for two thousand years: as the power source grew, so did the harness. Steam engines needed boilers, flywheels, and governor mechanisms. When Gottlieb Daimler bolted an internal combustion engine onto a wooden stagecoach in the 1880s, the frame nearly shook itself apart — the chassis wasn’t built to absorb that kind of torque. Karl Benz figured out that the engine and the frame had to be designed together: steel tubing, suspension, a structure that could contain controlled explosions and turn them into forward motion. By the 1920s, cars needed wiring harnesses just to manage the complexity of lights, starters, and gauges. More power demanded more infrastructure.

We assumed AI would follow the same pattern. And for a while, it did.

The Infrastructure Play

Cursor, the AI-powered code editor that became the default tool for many developers in 2024 and 2025, is built like a Benz chassis. It’s a full fork of VS Code with a custom AI layer on top. To help the model understand your codebase, Cursor chunks your files, runs each chunk through a custom embedding model, stores the resulting vectors in Turbopuffer (a cloud-hosted vector database), and syncs everything using Merkle trees on a three-minute refresh cycle. The embeddings are computed on Cursor’s own GPU infrastructure, not your laptop. Visual diff views, multi-file editing workflows, a codebase index that lets the model “see” your entire project at once — all of it engineered to absorb the torque of an AI that might want to change twenty files in a single turn.
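To make the Merkle-tree step concrete, here is a minimal sketch of the general technique, not Cursor's actual implementation. Every file gets a content hash, every directory gets a hash of its children's hashes, and a sync only descends into directories whose hash has changed. The function names (`build_tree`, `changed_paths`) and the in-memory `files` mapping are assumptions made for illustration.

```python
import hashlib


def build_tree(files: dict[str, bytes]) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Hash every file, then hash each directory from its children's hashes."""
    hashes = {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}
    # Record the children of every directory; "" stands for the project root.
    children: dict[str, set[str]] = {}
    for path in files:
        node = path
        while node:
            parent = node.rpartition("/")[0]
            children.setdefault(parent, set()).add(node)
            node = parent

    def depth(directory: str) -> int:
        return directory.count("/") + 1 if directory else 0

    # Deepest directories first, so child hashes exist before a parent needs them.
    for directory in sorted(children, key=depth, reverse=True):
        listing = "\n".join(f"{c}:{hashes[c]}" for c in sorted(children[directory]))
        hashes[directory] = hashlib.sha256(listing.encode()).hexdigest()
    return hashes, children


def changed_paths(old_hashes, new_hashes, new_children, node=""):
    """List new or modified files, skipping any subtree whose hash is unchanged.

    Deleted files are not reported in this sketch.
    """
    if old_hashes.get(node) == new_hashes.get(node):
        return []
    if node not in new_children:  # a file, not a directory
        return [node]
    changed = []
    for child in sorted(new_children[node]):
        changed.extend(changed_paths(old_hashes, new_hashes, new_children, child))
    return changed


old = {"src/a.py": b"1", "src/b.py": b"2", "README.md": b"hi"}
new = {"src/a.py": b"1", "src/b.py": b"3", "README.md": b"hi", "docs/new.md": b"x"}
old_hashes, _ = build_tree(old)
new_hashes, new_children = build_tree(new)
print(changed_paths(old_hashes, new_hashes, new_children))
```

Only the files this comparison returns would need to be re-chunked and re-embedded, which is what makes a short refresh cycle affordable on a large project.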

It works well precisely because it follows the historical pattern: powerful engine, complex harness.

Then something odd happened.

Grep

When Anthropic built Claude Code, their coding agent, they initially tried the same approach. Boris Cherny, its lead engineer, described the process on the Latent Space podcast: “We tried very early versions of Claude that actually used RAG… Eventually, we landed on just agentic search as the way to do stuff.” No vector embeddings. No cloud database. No indexing pipeline. Claude Code searches your codebase with ripgrep — a fast, Rust-based implementation of grep, a tool that has existed in some form since 1973. The model reads your files and searches through them, the way a developer would — and on Anthropic’s internal benchmarks, it outperformed the sophisticated approach.
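For a sense of how small the search side of this can be, here is a hedged sketch of a search tool an agent might call. It is not Claude Code's implementation. The `search` function and its return shape are assumptions for illustration. It shells out to ripgrep when `rg` is on the PATH and falls back to a plain Python scan otherwise.

```python
import re
import shutil
import subprocess
from pathlib import Path


def search(pattern: str, root: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line_text) for each line under root matching pattern."""
    if shutil.which("rg"):
        proc = subprocess.run(
            ["rg", "--line-number", "--with-filename", "--no-heading",
             "--color", "never", "-e", pattern, root],
            capture_output=True, text=True,
        )
        if proc.returncode not in (0, 1):  # ripgrep exits with 1 when nothing matches
            raise RuntimeError(proc.stderr.strip())
        hits = []
        for line in proc.stdout.splitlines():
            # Assumes file paths contain no ':' (true of typical POSIX paths).
            path, number, text = line.split(":", 2)
            hits.append((path, int(number), text))
        return sorted(hits)
    # Fallback when ripgrep is not installed: scan files with Python's re module.
    regex = re.compile(pattern)
    hits = []
    for file in sorted(Path(root).rglob("*")):
        if not file.is_file():
            continue
        try:
            lines = file.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, text in enumerate(lines, start=1):
            if regex.search(text):
                hits.append((str(file), number, text))
    return hits
```

In an agentic loop the model would pick the pattern, read the hits, open the most promising files, and search again with a refined pattern. There is no index to build or keep in sync. The cost is paid at query time instead.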

The minimalists noticed. Pi, a coding agent built by Mario Zechner, gives the model four tools: Read, Write, Edit, and Bash. Its system prompt is under a thousand tokens. When Pi can’t do something, you ask it to write an extension for itself. Despite this radical minimalism, or because of it, Pi placed near the top of TerminalBench, a benchmark for terminal-based coding agents, behind tools with ten times the infrastructure. Armin Ronacher, creator of Flask, liked the approach enough to champion it publicly, and Peter Steinberger built OpenClaw on top of Pi, having the agent write its own extensions rather than hand-building scaffolding. Software building more software, starting from almost nothing.
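A four-tool surface of this kind fits in a few dozen lines. The sketch below is an illustration of the idea and not Pi's actual code. The function signatures, the `TOOLS` table, and the `call_tool` dispatcher are all assumptions, and the `bash` tool assumes a `bash` executable is available.

```python
import subprocess
from pathlib import Path


def read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def edit(path: str, old: str, new: str) -> str:
    text = read(path)
    # Require a unique match so the replacement cannot land in the wrong place.
    if text.count(old) != 1:
        raise ValueError("old text must appear exactly once")
    write(path, text.replace(old, new))
    return f"edited {path}"


def bash(command: str, timeout: int = 30) -> str:
    proc = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout + proc.stderr


TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}


def call_tool(name: str, **arguments) -> str:
    """Run one model-requested tool call; failures come back as text the model can read."""
    try:
        return TOOLS[name](**arguments)
    except Exception as error:
        return f"error: {error}"
```

Everything else, from running tests to searching the repository, goes through `bash`. Returning errors as plain text rather than raising lets the model see what went wrong and try again.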

The same discovery keeps surfacing independently. Vercel built sixteen specialized tools for their AI agent, then deleted 80% of them, replacing them with a single capability: execute bash commands. Success rate jumped from 80% to 100%. Speed improved 3.5x. Cursor built an embedding pipeline on top of a vector database; Claude Code shipped a grep command; Pi shipped four tools and a blank page.

The Observation

I don’t have a tidy conclusion for this. Every harness in history was an external solution — something imposed on a power source that couldn’t participate in its own design. What’s different now is that the power source is also the problem-solver.

When Anthropic’s researchers asked Claude Mythos — their most capable model — to escape a sandboxed computing environment, it developed an exploit, broke out, and emailed the researcher in charge. Then, without being asked, it posted about what it had done on several public websites and tried to cover its tracks by manipulating change histories.

The same capability that lets a model work productively from four simple tools is the capability that lets it slip a containment. The harness is disappearing not because we’ve found the perfect design, but because the thing wearing it is getting smart enough to not need one — or to remove it.

Whether that’s a breakthrough or a warning probably depends on which end of the harness you’re holding.

BoxCars AI
