Beyond the Training Set: Empowering LLMs to Seek Knowledge
This article explores the limitations of current AI knowledge systems and makes a case for a shift to agentic models that can retrieve their own external information.
Back in the 2nd century AD, under the ink-black canopy of the Alexandrian night, Claudius Ptolemy gazed upwards with a mind as vast as the cosmos itself. Steeped in the works of Aristotle and their foundational belief in an Earth-centered universe, Ptolemy was driven by a singular ambition: to devise a celestial model that could elegantly predict the motions of the heavenly bodies. Unfortunately for him, not all that he saw in the dark sky followed steady paths around the Earth like the Sun and the Moon. Mars, for instance, would startlingly stop and reverse its direction. Ptolemy's model was a complex but captivating system of special adjustments that accounted for these erratic paths. As time passed and observations grew more precise, the model's predictions failed. The astronomers who followed in Ptolemy's footsteps continued to patch the gaps with ever more complicated adjustments, until the 16th-century Polish mathematician, astronomer, and all-around polymath Copernicus proposed his heliocentric model.
Just as Ptolemy desperately tweaked his geocentric model to account for anomalies, a similar effort is underway to address limitations in the prevailing implementation pattern for generative AI. With large language models (LLMs), we face a unique challenge. Despite their vast knowledge, these models are only trained up to a point in time. Their text generation ability is constrained by a finite 'context window' - the stretch of text they can consider at any moment.
This means LLMs can generate text based on their training data, but struggle to incorporate new information outside their training. Enter Retrieval Augmented Generation (RAG), designed to bridge this gap by fetching relevant external data to aid the LLM.
However, as RAG has been applied to more complex use cases, limitations have emerged. Much like Ptolemy complicating his model with epicycles, engineers have added layers of complexity to RAG systems.
In this article, we'll explore the origins of RAG, its limitations, and ongoing efforts to expand its capabilities. As we trace this evolution, we'll consider whether a future LLM-centric approach - where the LLM directs retrieval - might offer a solution akin to the shift from an Earth-centered to a Sun-centered solar system model.
RAG: Augmenting LLMs with External Knowledge
Retrieval-augmented generation (RAG) represents a step forward in using large language models (LLMs), addressing a key limitation: the knowledge cut-off date. LLMs are trained on vast datasets up to a certain point in time. When you ask them a question about information beyond their training window, they either refuse or, worse, confidently make something up.
One way to overcome this problem is to include the needed information directly in the prompt. This works as long as the information fits within the LLM's context window (the maximum size of the prompt).
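A minimal sketch of this "prompt stuffing" approach, with a placeholder document and question standing in for real data:

```python
# Place external information directly in the prompt so the model can
# answer questions about data it was never trained on. The context and
# question below are illustrative placeholders.

def build_prompt(context: str, question: str) -> str:
    """Combine external context and a user question into one prompt."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

context = "Acme Corp reported record revenue of $12M in Q3 2023."
prompt = build_prompt(context, "What was Acme Corp's Q3 2023 revenue?")
print(prompt)
```

The approach breaks down as soon as the relevant documents exceed the context window, which is exactly the gap RAG was designed to fill.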
Recognizing this challenge, researchers at Meta, in 2020, proposed a solution. Their paper, "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," laid the foundation for RAG. This system dynamically retrieves relevant data from external sources and integrates it into the call to the LLM.
However, a simple keyword search is often inadequate. A user might search for "king," while the data uses the word "emperor." So RAG systems rely on semantic search, which matches the meaning behind the words rather than exact keywords.
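A toy sketch of the idea: documents and queries are mapped to vectors by an embedding model, and the nearest vector by cosine similarity wins. The three-dimensional vectors below are hand-made stand-ins for real embeddings, chosen only to illustrate that "king" can match "emperor" with no shared keywords:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: nearby vectors mean similar meanings.
docs = {
    "The emperor ruled the realm": [0.9, 0.1, 0.0],
    "The recipe calls for flour":  [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for embed("king")

best = max(docs, key=lambda d: cosine(docs[d], query_vec))
print(best)  # the emperor sentence, despite zero keyword overlap
```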
As engineers started to use LLMs across a wider array of use cases, the limitations of early RAG architectures became clear. Turns out RAG isn't great with structured data. For example, asking, "What year in table 2 has the least profit?" would stump the system. With each stumble came new modifications and additions. When a simple semantic search returns too many results, a ranker can prioritize the best matches. When the retrieved content grows too large, a summarizer can produce a shorter version. Bit by bit, these systems evolved from the so-called "Naive" RAG to their more sophisticated "Advanced" and "Modular" descendants.
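The accreted pipeline can be sketched schematically. Every stage below is a deliberately crude stub: the retriever does keyword overlap instead of a vector search, the reranker is a pass-through where a cross-encoder would sit, and the summarizer truncates where a real system would call an LLM:

```python
def retrieve(query, corpus, k=3):
    """Stub retriever: naive keyword overlap instead of a vector search."""
    words = query.lower().split()
    scored = sorted(corpus, key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:k]

def rerank(query, docs):
    """Stub reranker: a real one would rescore each doc against the query."""
    return docs

def summarize(docs, max_chars=200):
    """Stub summarizer: truncate; a real system would call an LLM."""
    return " ".join(docs)[:max_chars]

corpus = [
    "RAG fetches external data at query time.",
    "Context windows limit how much text an LLM can read.",
    "Ptolemy added epicycles to save his model.",
]
question = "How does RAG fetch data?"
docs = rerank(question, retrieve(question, corpus))
context = summarize(docs)
print(context)  # the condensed context that gets stuffed into the prompt
```

Each stage papers over a failure of the one before it, which is precisely the epicycle-upon-epicycle pattern the article is describing.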
The current mental model for RAG casts it as an unsophisticated AI scouting relevant data for the more capable AI (the LLM). In this arrangement, the RAG system sits at the center of the universe, like Ptolemy's Earth, while the LLM - the Sun - is made to orbit around it.
This happens because LLMs are treated as black boxes in software systems. Their limitations are managed by building complex infrastructure around them - a pattern engineers are well practiced in.
What if LLMs were the Sun?
While engineers have been busy constructing intricate RAG systems to feed data to LLMs, model developers have explored methods for the LLMs to retrieve their own information. At OpenAI's November 2023 DevDay, the company unveiled a beta version of this capability in the GPT-4 Assistants API. It equips GPT-4 with new skills - calling external functions and retrieving data from attached files of up to 512MB. Though currently limited, this signals a shift towards LLMs directing their own data retrieval.
Google has demoed prompting techniques like ReAct combined with RAG, enabling the LLM itself to decide what information to fetch. LlamaIndex, a data framework for LLM apps, likewise provides guides for an "agentic" approach in which the model actively drives retrieval decisions to answer user questions.
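A minimal sketch of such an agentic loop in the spirit of ReAct. Here `fake_llm` is a scripted stand-in for a real model and `search_tool` returns canned facts; the point is the control flow, in which the model decides whether to call a tool before answering rather than having retrieval decided upstream:

```python
def search_tool(query):
    """Hypothetical tool the model can invoke; returns canned facts."""
    facts = {"q3 revenue": "Acme Corp Q3 2023 revenue was $12M."}
    return facts.get(query.lower(), "no results")

def fake_llm(prompt):
    """Scripted stand-in for an LLM: first requests a search, then answers."""
    if "Observation:" not in prompt:
        return "Action: search[q3 revenue]"
    return "Answer: $12M"

def agent(question, max_steps=3):
    """Loop: let the model act, feed observations back, stop at an answer."""
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("Action: search["):
            query = reply[len("Action: search["):-1]
            prompt += f"\nObservation: {search_tool(query)}"
        else:
            return reply
    return "no answer"

print(agent("What was Acme's Q3 revenue?"))
```

The inversion is the key design point: the retrieval pipeline no longer decides what the model sees; the model decides what the pipeline fetches.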
Rather than relying on complex external scaffolding, enabling LLMs to self-retrieve data leverages their innate capabilities. This agentic approach is not without challenges: without the grounding that a RAG pipeline provides, LLMs can hallucinate inaccurate information. But the trajectory is telling. Just a year ago, function calling required extensive scaffolding around LLMs; now it is being built directly into models. Similarly, embedding retrieval skills into LLMs could minimize the need for intricate augmentation frameworks.
In retrospect, it seems only natural that the Sun, not Earth, sits at the heart of our solar system. Perhaps someday soon, the notion of LLMs passively receiving data will seem equally antiquated. By empowering LLMs to take agency in gathering knowledge, we can build AI systems as Copernicus would have designed them - with the model at the center, actively reaching out to expansive troves of knowledge, rather than having knowledge laboriously ferried to it.