The Poorly Lit Factory
What a tent in a Tesla parking lot tells us about the future of AI-generated code
At the foot of Mount Fuji, robots build other robots in total darkness. No lights — the machines don’t need to see. No heating, no air conditioning either. “Not only is it lights-out,” a FANUC executive once explained, “we turn off the air conditioning and heat too.” The yellow arms move with precision through pitch black, producing roughly fifty new robots every twenty-four hours, sometimes running unsupervised for weeks at a stretch.
The industry calls this a “dark factory” — not metaphorically. It’s literally dark because humans aren’t there, and robots don’t care.
In January 2026, entrepreneur Dan Shapiro published a framework mapping AI-assisted software development onto five levels — borrowing the concept directly from FANUC’s lightless factory floor. At one end, AI is “spicy autocomplete.” At the other, it’s a dark factory: specs go in, working software comes out, no human writes or reviews a line of code.
At least one team claims to already be there.
Level Five
What does a dark factory look like when the factory floor is an IDE?
Right now, most teams using AI to write software still keep the lights on. A developer prompts an AI to generate a function, reads what comes back, adjusts the prompt, tries again. Or an AI agent writes a pull request and a human reviews it — scanning the diff, checking the logic, clicking approve. The AI does the typing, but a human is always in the room, either steering or inspecting. In Shapiro’s framework, this is Level 2 or 3. The lights are dimmed, maybe, but someone is always watching.
Last July, StrongDM’s CTO Justin McCarthy formed a three-person team and turned the lights off. Their charter had two rules. The first: code must not be written by humans. The second: code must not be reviewed by humans. Not “humans can review if they want to.” Must not. The rules aren’t constraints born from laziness — they’re a forcing function. If no human is allowed to touch the code, you have to solve every quality problem some other way.
By the time Simon Willison visited their lab in October, three people had produced thirty-two thousand lines of production software without a single line written or reviewed by a human hand. Willison watched a demo where a developer had a complex application running on localhost — something that looked like Google Sheets but with real backend logic, a working frontend, actual functionality. The entire thing had been produced by AI agents operating against human-defined specifications. Willison called it “the most ambitious form of AI-assisted software development I’ve seen yet.”
But the ambition raises an obvious question. If no human writes the code and no human reviews it, how do you know it works?
All Models Are Wrong
McCarthy’s team answered that question by building a world.
Their software talks to Slack, Jira, Okta, and Google Docs in production — so they built replicas of all of them. Not mocks that return canned responses, but stateful behavioral clones. Delete a Slack channel in the replica, and it stays deleted; the next time your code tries to post there, it gets the same “channel not found” error it would get from the real thing. Change a user’s role in the Okta replica, and five minutes later the Jira replica blocks their access, just like the real services would.
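The difference between a canned mock and a stateful clone is easy to show in miniature. Here is a sketch of the idea — every name in it (`FakeSlack`, `ChannelNotFound`) is hypothetical, invented for illustration, not StrongDM’s actual code:

```python
class ChannelNotFound(Exception):
    """Mirrors the real API's 'channel_not_found' error."""

class FakeSlack:
    """A stateful behavioral clone: actions persist, and later calls observe them."""

    def __init__(self):
        self.channels = {"general": []}

    def delete_channel(self, name):
        if name not in self.channels:
            raise ChannelNotFound(name)
        del self.channels[name]  # the deletion is remembered

    def post_message(self, channel, text):
        if channel not in self.channels:
            # same failure mode the real service would return
            raise ChannelNotFound(channel)
        self.channels[channel].append(text)

# A canned mock would happily "post" to a deleted channel.
# The replica reproduces the real service's behavior instead.
slack = FakeSlack()
slack.post_message("general", "hello")
slack.delete_channel("general")
try:
    slack.post_message("general", "still there?")
except ChannelNotFound:
    print("channel_not_found")  # the state change propagated
```

A canned mock answers every call the same way regardless of history; the clone carries state forward, so code exercised against it hits the same edge cases it would hit in production.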
The AI agents write code, deploy it into this digital universe, and see what happens. They run thousands of scenarios — end-to-end user stories that the team keeps hidden from the coding agents, like holdout sets in machine learning. The agents don’t know what the tests are. They have to bumble through the simulated world, interact with the replicas, and iterate until the software actually works against conditions it wasn’t specifically told to expect. When something breaks, the agents adjust and try again. The feedback comes from the universe itself.
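The holdout idea can be sketched in a few lines. Nothing below is StrongDM’s harness — the scenario format and runner are invented for this example — but it shows the shape: hidden scenarios mutate the simulated world, exercise the agents’ software, and report only pass/fail, never the scenarios themselves:

```python
import random

def run_holdout(scenarios, system, sample_size=3, seed=None):
    """Run a random sample of hidden scenarios against the system under test.

    Each scenario is a (setup, action, check) triple: setup seeds the
    simulated world, action exercises the agents' software against it,
    check inspects the outcome. The coding agents only ever see the
    failure count, never the scenarios.
    """
    rng = random.Random(seed)
    failures = 0
    for setup, action, check in rng.sample(scenarios, sample_size):
        world = setup()                  # e.g. seed the replica services
        result = action(world, system)   # drive the software under test
        if not check(world, result):
            failures += 1
    return failures

# Toy "system under test": software that should refuse posts to deleted channels.
def system(world, channel):
    return "posted" if channel in world else "refused"

scenarios = [
    (lambda: {"general"}, lambda w, s: s(w, "general"), lambda w, r: r == "posted"),
    (lambda: set(),       lambda w, s: s(w, "general"), lambda w, r: r == "refused"),
    (lambda: {"ops"},     lambda w, s: s(w, "ops"),     lambda w, r: r == "posted"),
]

print(run_holdout(scenarios, system, sample_size=3, seed=0))  # prints 0: no failures
```

The point of keeping the scenarios hidden is the same as a holdout set in machine learning: software that merely memorizes its tests will fail here, while software that actually handles the simulated world’s behavior will pass.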
Willison saw why this mattered. Building a high-fidelity clone of a complex SaaS application, he wrote, “was always possible, but never economically feasible” — until LLMs made it cheap enough to build at scale. But strip away the branding and what McCarthy’s team has built is integration testing. Thorough, high-fidelity, running at a scale that wasn’t economical before — but integration testing all the same.
And integration testing has a problem that no amount of scale solves. The tests are only as good as the model they run against. The statistician George Box put it simply in 1987: “All models are wrong, but some are useful.” McCarthy’s digital universe is a model — a good one, good enough to ship thirty-two thousand lines of production code on. But the map is not the territory. When the real Slack changes how it handles rate limits, or Okta tweaks an authentication flow, the model doesn’t know until something updates it. And in that gap between the model and reality, the dark factory is running full speed, producing code that works perfectly in a world that no longer quite exists.
The dream of full automation has crashed into the gap between model and reality before — and the most famous case left a billionaire building cars by hand in a tent.
The Tent
In 2016, Elon Musk described his vision for the Tesla Model 3 production line as “the machine that builds the machine.” The internal codename was the Alien Dreadnought — a factory so automated it would look inhuman. Robots would handle everything. The line would move so fast that air resistance on the parts would become a design constraint. Humans would only slow it down.
By 2018, the Alien Dreadnought was in ruins. Musk had automated tasks that didn’t need automating — complex conveyor systems for jobs a person could do in seconds, robotic arms trying to handle parts that varied just enough to jam the line. The Model 3 was supposed to roll off the line at five thousand cars a week. It was producing a fraction of that, and the company was burning through cash at an unsustainable rate.
So Musk did something that would have been unthinkable two years earlier. In three weeks, his team erected a massive temporary structure — a tent, officially called GA4 — in the Fremont factory parking lot. Inside, humans assembled cars by hand. Not robots. People, working around the clock, doing the thing the perfect automated factory couldn’t do.
“Yes, excessive automation at Tesla was a mistake,” Musk tweeted on April 13, 2018. “To be precise, my mistake. Humans are underrated.”
It’s the kind of story that gets told as a cautionary tale — the hubris of full automation, the return of the human. But that’s not where the story ends. Tesla didn’t abandon the dream. They rebuilt the line incrementally, automating where it made sense, keeping humans where it didn’t. The Fremont factory today produces roughly half a million vehicles a year, far more automated than the tent but far less than the Alien Dreadnought imagined. Musk didn’t skip to lights-out in one leap. He got there — or closer to there — by working his way along the curve.
The tent was a point on the curve.
The Poorly Lit Factory
Everybody’s building toward the dark factory now. Cursor says thirty-five percent of its own merged pull requests come from autonomous agents; Cognition claims their AI developer Devin merges sixty-seven percent of its PRs. But companies selling AI tools have an interest in claiming the factory is going dark — and independent testing puts Devin’s success rate on open-ended tasks closer to fifteen percent. That gap between vendor stats and outside measurement tells you something about where we actually are. Then again, companies with no AI product to sell are seeing real results too: Stripe’s internal system produces over a thousand agent-written PRs a week — though every one of them is still reviewed by a human before it merges.
Nobody’s fully dark yet. Most teams are somewhere between Shapiro’s Level 2 and Level 4 — the lights dimmed but not off, humans still in the loop at some point in the chain. McCarthy’s team at StrongDM may be the furthest along, and even they depend on a model of the world that’s only as current as its last update.
Maybe that’s fine. Maybe the dark factory was always an asymptote — a limit we approach but never quite reach. FANUC still has humans maintaining the lines at Mount Fuji. Tesla rebuilt toward automation incrementally after the tent, and Fremont today is neither the Alien Dreadnought nor the parking lot.
The thing that matters is a factory that works. A dark factory that fails when the parts come in slightly different than the simulation predicted isn’t useful — no matter how impressive the automation looks in a demo. And if keeping a few lights on is what makes it run, then a poorly lit factory is the better factory.
There will always be people trying to turn that last light off. That’s fine — that’s how the asymptote moves. But the work worth watching is what’s getting built while the lights are still on.

