Why AI Made Expert Radiologists Worse
The collaboration skill that separates amateurs who win from experts who fail
Tony Stark’s relationship with JARVIS looks effortless. The AI handles targeting, diagnostics, and tactical analysis while Stark supplies creativity and courage. Together, they’re unstoppable. It’s the vision every organization chases when deploying AI: human judgment enhanced by machine precision, creating something better than either alone.
We see this playing out across industries. Apps ship with built-in copilots, AI assistants ready to help humans do more. Analysts build forecasts with AI predictions. HR teams screen candidates with AI scoring resumes. The assumption is simple: add an AI assistant, give the human more information, and better decisions follow.
But what if simply adding AI doesn’t make us better? Perhaps it even makes us worse?
The MIT Study: Adding AI Can Backfire
Researchers at MIT and Harvard ran an experiment with 180 professional radiologists reviewing 324 chest X-rays. They used CheXpert, an AI model trained on over 200,000 cases that outputs probability scores for various pathologies. The radiologists were randomly assigned to different conditions—some saw only the X-ray, some got AI predictions (copilots), some received clinical history, and some got everything.
For each case, radiologists followed a simple workflow:
First: Look at the chest X-ray
Second: See the AI prediction (if in that group)—something like “Probability of pneumonia = 0.82” (high confidence) or “Probability of pneumonia = 0.47” (uncertain)
Third: Provide their own probability estimate for whether the pathology was present
Fourth: Make a diagnostic recommendation
Fifth: Move to the next case
The results told an unexpected story:
AI alone: More accurate than 2/3 of radiologists
AI + radiologists: No improvement on average
The pattern: AI confidence matters
Confident AI (predicted probability below 20% or above 80%): Sometimes helped
Uncertain AI (predicted probability between 40% and 60%): Made radiologists worse
The hesitation showed up in the data: radiologists took 4% longer per case with AI. That might sound trivial—a few extra seconds per diagnosis—but in medicine, hesitation is a red flag. It signals cognitive drag, the mental weight of second-guessing. The radiologists weren't breezing through cases with a helpful tool. They were deliberating, re-evaluating, doubting their initial instinct. The AI's uncertainty infected their decision-making process, making them slower AND less accurate. The worst of both worlds.
The Traps That Make Us Fail
Two cognitive traps explain the failure.
Trap #1: Correlation Neglect
Both the human and the AI are looking at the same chest X-ray. When they reach similar conclusions, it feels like confirmation—two independent assessments arriving at the same answer. But it’s not independent at all.
Think of it like two witnesses at a crime scene. If they both saw the same event from the same angle, their matching testimony doesn’t double your confidence. They’re working from the same evidence, vulnerable to the same visual tricks, the same misleading details. Real independence would mean one witness saw the crime while the other reviewed security footage, or interviewed the suspect, or examined forensic evidence.
The radiologists and the AI model were both looking at the same chest X-ray. When they agreed, that agreement didn’t provide independent validation—it meant they were both potentially making the same mistake for the same reason. But radiologists treated AI agreement as confirmation anyway. The illusion of independence made them more confident when they should have been more cautious.
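A toy simulation, not from the study, makes this concrete. Suppose each reader misreads a case 20% of the time. If two readers work from the same noisy evidence, their agreement adds nothing: the consensus is still right about 80% of the time. If their errors are independent, agreement is genuinely informative: the consensus is right roughly 94% of the time. A minimal Python sketch, with all numbers purely illustrative:

```python
# Toy simulation (not from the study): how much does agreement between two
# readers tell you? It depends on whether their errors are independent.

import random

def agreement_accuracy(shared_evidence: bool, error_rate: float = 0.2,
                       trials: int = 100_000) -> float:
    """Accuracy of the consensus answer, measured only on cases where
    the two readers agree."""
    agree_and_correct = agree = 0
    for _ in range(trials):
        truth = random.random() < 0.5
        if shared_evidence:
            # Both readers look at the same noisy image: one shared error.
            seen = truth if random.random() > error_rate else not truth
            a = b = seen
        else:
            # Independent evidence: each reader makes their own error.
            a = truth if random.random() > error_rate else not truth
            b = truth if random.random() > error_rate else not truth
        if a == b:
            agree += 1
            agree_and_correct += a == truth
    return agree_and_correct / agree


if __name__ == "__main__":
    print("same evidence:       ", round(agreement_accuracy(True), 3))   # ~0.80
    print("independent evidence:", round(agreement_accuracy(False), 3))  # ~0.94
```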
Trap #2: Uncertainty Confusion
When the AI showed a probability around 0.47—truly uncertain—radiologists didn’t just note it and move on. They absorbed that uncertainty. “Even the AI isn’t sure, maybe I’m wrong too.” Their own judgment became shakier. They slowed down to deliberate more, and often arrived at worse conclusions than their initial instinct.
The dynamic looks like this. Working alone, a radiologist sees an image, forms a judgment, makes a call, and moves on. Working with AI, they see the image, check the AI prediction, and if it's uncertain, they start doubting themselves. They re-examine. They deliberate. They take longer and sometimes land on a worse answer than their first instinct. The AI's uncertainty infects their decision-making process.
The MIT researchers suggested a solution: confidence-based routing. Let AI handle cases where it’s confident. Give uncertain cases to humans without showing them the AI’s wavering prediction. Remove the source of infection.
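The paper frames this as a triage policy rather than a piece of software, but the logic is simple enough to sketch. Assuming the model emits a probability score per case, a minimal version might look like this (the thresholds and function names are illustrative, not taken from the study):

```python
# A minimal sketch of confidence-based routing, assuming the AI emits a
# probability score per case. Thresholds and names are illustrative only.

def route_case(ai_probability: float,
               confident_low: float = 0.20,
               confident_high: float = 0.80) -> str:
    """Decide who handles a case based on how confident the AI is."""
    if ai_probability <= confident_low or ai_probability >= confident_high:
        # The model is confident: act on its prediction.
        return "ai"
    # The model is uncertain: send the case to a human reviewer,
    # and crucially, do NOT show them the wavering prediction.
    return "human_without_ai_prediction"


if __name__ == "__main__":
    for p in (0.05, 0.47, 0.82):
        print(f"AI probability {p:.2f} -> {route_case(p)}")
```

The important design choice is the second branch: the human handles the hard cases, and the model's wavering estimate never reaches them.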
But is that enough? Can we go further? The chess world already answered this question.
Centaur Chess
In 1997, IBM’s Deep Blue defeated world champion Garry Kasparov. The chess world mourned the loss of human supremacy. But Kasparov wondered: what if humans and AI worked together instead of competing? In 1998, he organized the first “Centaur Chess” tournament—teams of humans and AI engines competing against one another.
For years, the pattern held as expected. Top chess players with powerful AI engines dominated. Makes sense: grandmaster expertise plus computational power should win.
Then in 2005, something unexpected happened at a Freestyle Chess tournament.
A team called “Zacks” entered: Steven Crampton and Zackary Stephen—two amateurs—plus three computers. Dark horse. Long odds. But they won, defeating teams with grandmasters and superior engines.
The chess world paid attention. These weren’t master players. They were master collaborators.
Zacks knew when to trust which engine. When to override predictions. How to route decisions based on confidence. When to synthesize multiple perspectives versus when to delegate to the strongest prediction. They had mastered the meta-game of human-AI collaboration.
This is exactly the confidence-based routing the MIT researchers suggested: let the AI handle cases where it's confident, and keep its wavering predictions away from the humans handling the rest. But Zacks went further—they developed an intuition for collaboration itself.
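Exactly how Zacks made those calls isn't documented in detail, but the synthesize-versus-delegate idea can be sketched in the same spirit. The sketch below is purely hypothetical: the advisor names, the decisiveness measure, and the delegation margin are all invented for illustration.

```python
# Hypothetical sketch of "synthesize vs. delegate" across several advisors.
# None of this comes from the Zacks team; names and thresholds are made up.

from statistics import mean

def combine(predictions: dict[str, float],
            delegate_margin: float = 0.25) -> tuple[str, float]:
    """predictions maps advisor name -> probability that an option is right.

    If one advisor is far more decisive than the rest, delegate to it.
    Otherwise, synthesize by averaging the group's estimates.
    """
    # Decisiveness: distance from a coin flip (0.5).
    decisiveness = {name: abs(p - 0.5) for name, p in predictions.items()}
    ranked = sorted(decisiveness, key=decisiveness.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]

    if decisiveness[best] - decisiveness[runner_up] >= delegate_margin:
        return f"delegate to {best}", predictions[best]
    return "synthesize (average)", mean(predictions.values())


if __name__ == "__main__":
    engines = {"engine_a": 0.93, "engine_b": 0.55, "engine_c": 0.60}
    decision, estimate = combine(engines)
    print(decision, round(estimate, 2))
```

When one advisor is far more decisive than the others, the sketch delegates to it; when they are all lukewarm, it averages them instead.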
The key insight: it’s not about having AI. It’s about knowing how to work with it.
The Real Lesson: Collaboration Is a Skill
The competitive edge isn't AI access—that's becoming universal. It's collaboration mastery. And as Zacks demonstrated, this is a learnable skill. Amateurs with collaboration expertise can outperform experts without it.
Most organizations deploy AI assuming human-in-the-loop is always better. But the data says otherwise. Sometimes the best collaboration is no collaboration at all. Sometimes it's deliberately selective collaboration: routing by confidence thresholds and knowing when to synthesize versus when to delegate.
The radiologists failed because they hadn’t learned this skill yet. The Zacks team succeeded because they had.
Tony Stark doesn’t debate with JARVIS over every targeting calculation. He doesn’t second-guess routine diagnostics or let JARVIS’s uncertainty shake his confidence in high-stakes moments. He’s developed an intuition for when to trust, when to override, when to collaborate deeply, and when to work independently. That intuition—knowing when to engage and when to delegate—isn’t magic. It’s not because he’s a genius. It’s a learnable pattern, refined through experience.
The same skill separates teams that succeed with AI from those that struggle. It’s not about having better models or more data. It’s about developing the instinct for collaboration itself.