Beyond Operating Systems: LLMs as a New Interface Paradigm
The tech industry's mental models for AI are evolving—from CPUs to operating systems to something entirely new
Large Language Models upended the tech industry's familiar metaphors in 2023. While we initially reached for CPU comparisons to make sense of this new technology, that analogy quickly proved insufficient.
The tech industry's first instinct was natural enough. When OpenAI released their language model API, they seemed to follow the path of infrastructure providers like Twilio. Companies could embed these models into their applications as computational building blocks, and analysts charted model sizes against Moore's Law, drawing neat parallels between parameter counts and transistor density.
But this simple comparison missed something fundamental. In an article I wrote last year, I explored whether LLMs might be more like operating systems than processors. OpenAI's launch of the ChatGPT plugin store pointed toward this new understanding. ChatGPT wasn't just providing computational power—it was reshaping how users accessed applications, much like iOS transformed our relationship with mobile software.
Now, watching Claude navigate web interfaces like a human user, seeing Gemini understand real-time desktop streams, and observing the latest developments from OpenAI, I'm beginning to think even the operating system analogy falls short. We're witnessing the emergence of something new: not just a processor to compute with, or an operating system to manage apps, but a new way of interfacing with technology itself.
This is the story of how our mental models for LLMs have evolved, and why the next chapter might be a new interface paradigm.
The CPU Era: When LLMs Were Computational Building Blocks
Early 2023 saw the entire AI industry organize itself around the CPU metaphor. OpenAI, Anthropic, and Cohere positioned themselves as infrastructure providers, competing on metrics like parameter count and inference speed. Startups flocked to build on these AI "processors," mirroring how software companies in the 1980s built their applications on Intel's x86 architecture.
Nathan Baschez at Every captured this zeitgeist in his March 2023 article "LLMs are the new CPUs":
"Today we are seeing the emergence of a new kind of 'central processor' that defines performance for a wide variety of applications. But instead of performing simple, deterministic operations on 1's and 0's, these new processors take natural language (and now images) as their input and perform intelligent probabilistic reasoning, returning text as an output"
The Operating System Shift: When Apps Met AI
OpenAI's vision extended beyond providing computational power: they saw themselves as building a new platform that would mediate how users interact with applications. The launch of ChatGPT plugins in early 2023, which later evolved into the GPT Store, marked a pivotal moment in this evolution. Much as Apple's App Store transformed the iPhone from a device into a platform, OpenAI envisioned a future where users would discover and interact with AI-powered applications through their GPT Store.
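What made the platform framing feel literal was how an "app" announced itself to ChatGPT: a small manifest file, hosted by the developer, that pointed the platform at an OpenAPI description of the plugin's endpoints. Roughly, and reconstructed from memory of the original plugin documentation (shown as Python for consistency, with placeholder names and URLs):

```python
# Rough reconstruction of a ChatGPT plugin manifest (the ai-plugin.json
# file a developer hosted so ChatGPT could discover and call their app).
# Field names are recalled from the original plugin docs and should be
# treated as illustrative; all URLs are placeholders.
import json

plugin_manifest = {
    "schema_version": "v1",
    "name_for_human": "Todo List",
    "name_for_model": "todo",
    "description_for_human": "Manage your todo list from ChatGPT.",
    "description_for_model": "Add, list, and delete items in the user's todo list.",
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": "https://example.com/legal",
}

print(json.dumps(plugin_manifest, indent=2))
```

The details have since changed, with plugins giving way to GPTs, but the shape of the idea carried over: developers describe a capability, and the platform decides when to surface it to the user.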
The operating system analogy felt compelling. Just as iOS and Android created new ecosystems for mobile apps, OpenAI positioned itself to be the platform through which users would access AI capabilities. The GPT Store would be their App Store, complete with reviews, rankings, and a marketplace for developers to distribute their creations. Google followed suit with their own version in Gemini, introducing "Gems" as their take on AI-powered applications.
When I wrote about this shift last year, it seemed like we were watching the birth of a new platform paradigm. The parallels were striking: just as mobile operating systems mediated our relationship with apps, these AI platforms would mediate our interaction with digital services.
But the reality proved more complex. While the GPT Store hasn't quite captured user imagination in the way the App Store did, something more interesting has emerged. This December, we've seen a flurry of updates from major AI companies that hint at something beyond both the CPU and operating system models.
Signs of Something New: When AI Learned to See and Act
Google launched Gemini in December 2023, showcasing their first truly multimodal model. The initial demo video showed seamless voice conversations as a presenter drew sketches, held up objects, and interacted naturally with the model. The demo sparked imaginations but also controversy: Bloomberg later reported that the fluid voice interactions were dubbed in afterward and that responses were generated from still images and text prompts. Google's subtle YouTube footnote admitted: "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."
A year later, those ambitious demos have become reality. Google's Gemini 2.0 Flash now processes real-time video streams through Project Astra, enabling natural conversations while watching your screen or camera feed. OpenAI matched this capability, rolling out real-time video features for ChatGPT Plus, Team, and Pro users. Their Advanced Voice Mode with vision lets users point their phones at objects or share screens, receiving near-instantaneous responses.
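Strip away the real-time streaming and the underlying building block is easy to sketch: capture what the user is looking at, hand it to a vision-capable model, and ask for help. The sketch below is turn-based and single-frame, a deliberate simplification of what Project Astra and Advanced Voice Mode actually do; the OpenAI vision-message format is real, while the mss screenshot library, the model name, and the prompt are my own illustrative choices.

```python
# Simplified, turn-based sketch of "show the AI your screen": grab one
# frame, send it to a vision-capable model, ask a question about it.
# The shipping products stream audio and video continuously; this is
# only the single-frame building block underneath that experience.
# Assumes `pip install openai mss pillow` and an OPENAI_API_KEY.
import base64
import io

import mss
from openai import OpenAI
from PIL import Image

client = OpenAI()

def capture_screen_as_base64() -> str:
    """Grab the primary monitor and return it as a base64-encoded JPEG."""
    with mss.mss() as screen:
        shot = screen.grab(screen.monitors[1])  # monitors[1] is the primary display
        image = Image.frombytes("RGB", shot.size, shot.rgb)
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG")
    return base64.b64encode(buffer.getvalue()).decode()

def ask_about_screen(question: str) -> str:
    """Send the current screen plus a question to a vision-capable model."""
    frame = capture_screen_as_base64()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame}"}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask_about_screen("What am I working on, and what would you suggest I do next?"))
```

The gap between this sketch and the December releases is latency and continuity: instead of one frame per question, the model watches an ongoing stream and keeps a running context of what it has seen.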
These developments signal more than technical progress. They mark a shift in how AI systems engage with our world. These aren't just query-response tools or apps we access through an AI operating system. They're becoming collaborative agents that observe our context, understand our goals, and help us achieve them in real-time.
Emerging New Interface Paradigms
The von Neumann architecture that has dominated computing for decades established clear boundaries: computers receive input through specific channels (keyboard, mouse, camera) and provide output through others (screen, speakers). This architecture shaped not just how computers work, but how we think about human-computer interaction.
We've never had multimodal input and output capability like this before. Previous attempts at ambient computing, like Google Glass in 2013, failed in part because they could only handle specific, predefined tasks like navigation. Today's AI assistants can understand context and intent across domains.
Meta CTO Andrew Bosworth captured this shift in his end-of-year blog post:
"We're right at the beginning of the S-curve for this entire product category, and there are endless opportunities ahead. One of the things I'm most excited about for 2025 is the evolution of AI assistants into tools that don't just respond to a prompt when you ask for help but can become a proactive helper as you go about your day"
The race to build this new interface is accelerating. Apple launched its $3,499 Vision Pro. OpenAI partnered with ex-Apple design chief Jony Ive. Countless startups are exploring their own approaches. But the core innovation isn't just the hardware; it's the AI's ability to understand and respond to human context naturally.
Voice interfaces have already transformed how I interact with AI. I find myself speaking to my devices more naturally, and the responses feel increasingly fluid and contextual. Screen sharing and real-time vision could push these interactions even further. Instead of carefully crafting text prompts, we might simply show AI what we're trying to accomplish. These new interfaces have the potential to understand our intent and context in ways that feel more natural and intuitive than ever before.
This isn't just a new operating system or a new way to distribute apps. It's a reimagining of how humans and computers interact. As I wrap up my last article of 2024, I'm looking ahead to what promises to be a year of experimentation in 2025. The era of explicit commands—click here, type this, swipe that—could give way to something more intuitive, as AI systems learn to perceive and understand our world in increasingly sophisticated ways. There will be plenty to watch, analyze, and write about as this new interface paradigm takes shape.