Bring Someone Who Wants You to Lose

Good work needs an opponent. Most AI workflows don’t have one.

May 21, 2026

People playing tug-of-war on a grassy field. — Photo by Amari Shutters on Unsplash

The U.S. Army keeps a regiment in the California desert whose only job is to defeat the rest of the U.S. Army. The 11th Armored Cavalry — known inside the Army as OPFOR, the Opposing Force — plays the enemy in mock battles that run for days at the National Training Center, Fort Irwin. Visiting brigades arrive thinking they know how to fight, and OPFOR’s mandate is to make them lose. They almost always do. That’s the point. The drills back at base teach the move; the desert teaches what the move is worth against something trying to stop it.

The Army figured out a long time ago that you can’t train against silence. Work gets good against resistance — against something willing to push back, find the seam, take the ball.

Solo work with AI agents has none of this. The assistants are trained to agree. They finish your sentences and ship more activity in an afternoon than any team you’ve ever managed — but activity isn’t the same thing as outcome. No one in the loop is paid to disagree, so the work moves fast and ships soft.

Nobody wakes up wanting to be a customer

I noticed this last week while working with an agent on copy for a landing page. The copy wasn’t going to convert. So I asked it to test the page against a persona. The persona that came back was the ideal customer: a woman whose problems lined up neatly with what the product solved, who’d arrive at the page already convinced, who’d read every word before clicking through.

Nobody reads landing pages like that. Nobody wakes up wanting to be a customer.

The agent had done what helpful agents (and humans) do — read the brief, inferred the goal, built the rest of the world to serve it. It hadn’t considered the real world.

This is why companies have review processes: editors pushing back on writers, test teams pushing back on dev teams, ops review forcing the PM to defend the product to a roomful of skeptics. From the inside, the friction looks like inefficiency. It isn’t; it’s the point.

The friction is feedback — the actual rejection, the actual broken build, the actual loss. The writer’s draft gets sharper because the editor cuts the weak lines. The engineer’s code gets stronger because a reviewer reads the pull request and refuses to approve until the rough edges are gone. The brigades at Fort Irwin get better by losing, repeatedly, to someone whose job is to defeat them.

Build Your Own OPFOR

Putting agents in opposition to each other has always been possible in principle. What was missing was the infrastructure — an easy way to spin up a separate agent with its own context, hand it a different mandate, and read what it says back without building scaffolding from scratch.

Recently, that has gotten much easier. Claude Code shipped sub-agents first: isolated contexts, separate instructions, callable from a primary agent. Then came agent teams, where sub-agents talk to each other instead of just reporting back up the chain. The setup that used to require custom work is now built in.

This is what I went back and did with the landing page. Instead of letting the agent invent the reader, I wrote the persona myself — someone tired, skeptical, on her phone between meetings, looking for a reason to close the tab. I handed that to a sub-agent and asked it to skim the page the way she would. The feedback came back the opposite of the first round: specific lines that read like marketing copy, the spot where she’d have bounced, the claim she didn’t believe. Useful in a way the original persona never was.

Once you see the pattern, you can put it everywhere. A few more examples:

When I write an article, I have a link-checker agent read the draft. It’s like hiring a researcher whose only job is to open every link, confirm the source is real, and check it says what I claim it says. The agent does it cold, with no investment in my conclusions.

When I generate images, a reviewer agent rates each pass against the brief, and the generator iterates before I see anything. The obvious misses never reach me. That’s the work an art director does on a junior designer’s drafts.

This has become easy to set up. Build the reviewer. Give it a stricter mandate than the generator. Point them at each other. The friction that companies hire whole departments to provide is now something you can run on a laptop.
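
For readers who want the shape of that loop in code, here is a minimal sketch. It is not Claude Code's API or anyone's production setup. The names `Review`, `run_with_reviewer`, `toy_generate`, and `toy_review` are all hypothetical, and the two toy functions stand in for whatever generator and reviewer agents you actually call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    approved: bool
    objections: List[str]


def run_with_reviewer(
    generate: Callable[[str, List[str]], str],
    review: Callable[[str, str], Review],
    brief: str,
    max_rounds: int = 3,
) -> Tuple[str, Review]:
    """Alternate generator and reviewer until approval or the round cap."""
    objections: List[str] = []
    draft = ""
    verdict = Review(approved=False, objections=[])
    for _ in range(max_rounds):
        # The generator sees the brief plus the reviewer's last objections.
        draft = generate(brief, objections)
        # The reviewer sees only the brief and the draft, not the generator's reasoning.
        verdict = review(brief, draft)
        if verdict.approved:
            break
        objections = verdict.objections
    return draft, verdict


# Toy stand-ins (assumptions) so the sketch runs with no model behind it.
def toy_generate(brief: str, objections: List[str]) -> str:
    draft = f"Landing page for: {brief}"
    if "no price shown" in objections:
        draft += " | Price: $9/month"
    return draft


def toy_review(brief: str, draft: str) -> Review:
    missing = [] if "Price" in draft else ["no price shown"]
    return Review(approved=not missing, objections=missing)


final_draft, final_verdict = run_with_reviewer(
    toy_generate, toy_review, "a note-taking app"
)
```

The reviewer's mandate lives entirely in `review`: it can be as strict as you like without the generator knowing how it will be judged. The `max_rounds` cap is there so the loop ends even if the reviewer never approves.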

Anthropic’s labs team reported the same shape in March. They had Claude build a small game from a one-line prompt, twice. The first time, solo: the agent worked through the spec, declared the build finished, and produced what looked like a working app. Except when you tried to play, your character appeared on screen and nothing responded to input. The core feature didn’t work; the agent confidently said it did.

The second time, they paired the same generator with an evaluator agent whose only job was to click through the running app the way a real user would, file bugs against anything broken, and refuse to sign off until the build actually held up. Same model, same prompt — this time, the game was playable. Tuning the evaluator to be skeptical, they noted, turned out to be more tractable than making the generator critical of its own work.

The critic that doesn’t get tired

There’s a catch, and anyone who has worked inside a company already knows it. The agents that finish your sentences will keep finishing them; the ones you’ve trained to disagree will keep disagreeing. A reviewer agent doesn’t get tired, doesn’t have a deadline, has no other work waiting. It has no skin in the outcome and no reason to ever stop finding things wrong.

This is also exactly how good ideas die in companies. Not from one fatal flaw — from a critic who was almost always right. The launch slips a quarter because legal wants one more pass. The redesign gets watered down because every senior person has a “small concern” that has to be addressed. The new product gets killed in its sixth ops review by someone asking for “just a little more data” — data that, once gathered, surfaces a new question for the seventh review. The bug that ships is a problem; the launch that dies in its sixth round is also a problem, and arguably the bigger one. Every ops review that ever killed a good idea did it by being right about something. The critic doesn’t have to be wrong to be too much.

You can recreate the dysfunctional company on a single laptop, faster than the productive one.

Disagree and commit

Companies that learned to ship despite their critics landed on a discipline. Amazon named it — disagree and commit, borrowed from Andy Grove at Intel. The critic gets voice; the critic doesn’t get veto. You hear the objection, you weigh it, and you ship.

In a real team, the friction self-bounds for two reasons. The reviewer has other deadlines, other stakes, other work waiting — they get one more round before they have to move on. And someone owns the decision. The argument ends not because everyone agreed, but because the person whose call it was made the call.

Anthropic’s team named this in the same post: the evaluator isn’t free. It’s worth the cost only when the task sits beyond what the model does reliably on its own, and as the base model improves, the boundary moves. Not every piece of work needs an adversary; the ones that do need a bounded one.

That is the part I am learning. Building critics is easy. The hard part is older: deciding how many rounds the work earns, when more feedback is signal and when it’s just noise, when to stop reading objections and ship. Any senior editor or experienced engineer has earned this judgment. You read the objection, you decide what’s worth fixing and what’s not, and then you commit.
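
One way to make "voice, not veto" concrete is to give the owner an explicit triage step and a round budget. The sketch below is an illustration under assumptions, not a recommended policy. The `Objection` type, the 1-to-3 severity scale, and the `triage` function are all hypothetical, and the thresholds are placeholders for judgment you still have to supply.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Objection:
    text: str
    severity: int  # hypothetical scale: 1 = nitpick, 3 = blocks the goal


def triage(
    objections: List[Objection],
    rounds_left: int,
    min_severity: int = 2,
) -> Tuple[List[Objection], List[Objection]]:
    """Split objections into (fix in another round, acknowledge and ship past)."""
    if rounds_left <= 0:
        # Budget spent: the critic was heard, and the owner commits anyway.
        return [], list(objections)
    fix = [o for o in objections if o.severity >= min_severity]
    accept = [o for o in objections if o.severity < min_severity]
    return fix, accept


feedback = [
    Objection("headline claim is not believable", 3),
    Objection("prefers a different shade of blue", 1),
]

fix_now, ship_past = triage(feedback, rounds_left=1)
after_budget, logged = triage(feedback, rounds_left=0)
```

With one round left, only the serious objection earns more work. With none left, everything is logged and the work ships. The critic's output is never discarded, but it stops driving the loop.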

The desert at Fort Irwin empties out every few weeks. The brigades lose, and they learn, and they go home to fight a real war. The point of the opposition was never to win against them. It was to send them out better, and to let them go.

BoxCars AI
