Beyond Autocomplete: Guiding LLMs to Deeper Reasoning
LLMs are like autocomplete on steroids—here's how to make them complete smarter.
"All GPT-3 has to do is to complete this. All it really does is predict the next word, it's autocomplete on steroids." - Gary Marcus, 2021
Gary Marcus knows what he's talking about. As a professor of psychology and neural science at New York University, founder of AI companies Geometric Intelligence (acquired by Uber) and Robust.AI, and author of influential books like "Rebooting AI" and "The Algebraic Mind," Marcus has been a prominent voice in the AI discourse for decades. While his assessment of GPT-3 might seem to point to a limitation, it actually offers an insight into the incredible power of Large Language Models (LLMs) – if we can get them to autocomplete the right problems. This apparent simplicity – predicting the next token in a sequence – belies a complex system capable of feats of language understanding and generation. But to truly harness this power, we need to understand how to guide these models, to make them think before they speak. In this article, we'll explore how techniques such as priming, providing examples, and chain-of-thought prompting can improve LLM outputs, transforming these 'autocomplete on steroids' systems into powerful reasoning engines capable of tackling problems across a wide range of domains.
Autocomplete - The Foundation of Large Language Models
At their heart, Large Language Models (LLMs) are built on a simple principle: predicting the next token in a sequence. This concept, familiar to anyone who's used a smartphone's text prediction feature, has been dramatically enhanced by modern neural network architectures, particularly the transformer and its attention mechanism.
To understand this, let's consider a basic example:
Imagine you're typing the sentence: "The cat sat on the..."
A simple autocomplete system might suggest "mat" or "chair" based on common phrases. However, an LLM does something more sophisticated. It doesn't just look at the immediate preceding words; it considers the entire context provided.
The attention mechanism at the heart of the transformer architecture allows the LLM to weigh the importance of each word in relation to all others. It can "pay attention" to relevant parts of the input, no matter how far back they appear (a sketch of the underlying computation follows this list). In our example, it might consider:
The subject ("cat")
The action ("sat")
The preposition ("on")
And even broader context if available (Is this a story about a luxurious home? A barnyard tale?)
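To make "weighing importance" concrete, here is a minimal sketch of scaled dot-product attention, the computation at the core of a transformer. The tiny random vectors stand in for the learned word embeddings a real model would use:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends over every key; the output is a weighted mix of values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of every word to every other word
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

# Toy 4-dimensional "embeddings" for: The, cat, sat, on, the
np.random.seed(0)
x = np.random.randn(5, 4)
_, weights = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(np.round(weights, 2))  # row i shows how much word i attends to each word
```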
This contextual understanding is what elevates LLMs from simple word prediction to seeming comprehension of language. The model isn't just completing a sentence; it's attempting to continue the narrative or discussion.
What's remarkable is that this next-token prediction mechanism, when scaled up to billions of parameters trained on vast amounts of text, results in models that can engage in complex tasks like answering questions, writing essays, or even coding.
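You can watch this prediction process directly. Here is a minimal sketch using the small open-source GPT-2 model via the Hugging Face transformers library (the model choice is purely illustrative); it prints the five most likely continuations of our example sentence:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # a score for every vocabulary token at each position

probs = torch.softmax(logits[0, -1], dim=-1)  # probability distribution over the *next* token
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

With GPT-2 you would expect household nouns like "mat" or "floor" near the top of the list, exactly the behavior our autocomplete example describes.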
Understanding this mechanism is important because it informs how we can best interact with these models. By providing clear context, asking for step-by-step reasoning, and framing our prompts effectively, we can guide the model's prediction process, leading to more accurate and useful outputs.
Priming the Model - Setting the Stage for Intelligent Responses
Interacting with Large Language Models effectively often comes down to how well we prime them. To understand this concept, consider a simple analogy: visiting a library.
When you ask a librarian for a "book about Texas," they'll likely respond with clarifying questions:
"Are you planning a trip to Texas?"
"Are you researching Texas history?"
"Are you interested in Texas cuisine?"
Each question helps the librarian narrow down the vast array of "Texas books" to find the one that best suits your needs. This interaction mirrors how we can effectively communicate with LLMs.
In the world of LLMs, priming often takes the form of specific instructions or role assignments at the beginning of our prompts:
"Act as a marketing expert analyzing a new product launch."
"You are a professional journalist writing a balanced news article."
"Assume the role of a financial advisor providing investment strategies."
These priming statements set the tone, context, and expected expertise for the model's responses. They're akin to telling our librarian, "I'm a history professor researching the Texas Revolution," which immediately orients them towards academic historical texts.
Priming can also set the style, format, or ethical framework:
"Provide your answer in the style of a Shakespearean sonnet."
"Structure your response as a formal business proposal."
"Approach this question from an environmentally conscious perspective."
By priming effectively, we're not just asking for information - we're shaping how that information is processed and presented. This technique allows us to harness the vast knowledge embedded in LLMs and channel it in specific, useful directions.
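In API terms, priming typically lives in the system message. Here is a minimal sketch using the OpenAI Python SDK; the model name and prompt wording are illustrative assumptions, not requirements:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any chat-capable model works
    messages=[
        # The system message primes the model before it ever sees the user's request.
        {
            "role": "system",
            "content": (
                "You are a professional journalist writing a balanced news article. "
                "Present both supporting and opposing viewpoints."
            ),
        },
        {"role": "user", "content": "Write a short piece on AI coding assistants."},
    ],
)
print(response.choices[0].message.content)
```

The same request without the system message tends to return a generic answer; with it, the response adopts the tone, balance, and structure of news reporting.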
Providing Examples - Showing the Model What You Want
When working with Large Language Models, sometimes the best way to get precise outputs is to show rather than tell. Providing clear examples of the desired output format can improve the model's ability to deliver exactly what you need. This technique is particularly useful for tasks that require specific formatting or information extraction.
In this section, we'll use a travel information extraction task as our case study. The following prompt demonstrates how to combine role assignment, specific instructions, and a detailed example to guide the model's output:
Act as my note taker. I will give you text from travel emails I receive (hotel, car, airplane reservations and confirmations). I need your help extracting the important information and formatting it in a markdown format so I can add it to my notebook.
Here is an example of what I'd like
"""
### 🏨 Nov 23-24, 2023: Malaga - Líbere Málaga Teatro Romano
- Duplex studio in Málaga Teatro Romano
- **Address:** Calle Granada, 76, 29015, Málaga
- **Contact:** [Leave blank if not available]
- **Arrival:** 3 PM Thu 2023-11-23
- **Departure:** 11 AM Fri 2023-11-24
- **Cost:** €180.31
- **Notes:** [Include cancellation instructions, late arrival information, etc]
"""
- I want the dates, name of city etc easy to scan
- I want the confirmation code, website, email, phone number easy to find.
Here is the text:
By providing such a detailed example, we give the model a clear template to follow, ensuring that the extracted information is consistently formatted and easy to read.
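If you handle travel emails regularly, a prompt like this is worth wrapping in a small script so every email gets identical treatment. Here is one possible sketch; the condensed template and model name are assumptions for illustration:

```python
from openai import OpenAI

client = OpenAI()

# Condensed version of the note-taker prompt shown above, example included.
NOTE_TAKER_PROMPT = """Act as my note taker. I will give you text from travel emails I receive \
(hotel, car, airplane reservations and confirmations). Extract the important information and \
format it in markdown, with dates and city names easy to scan and confirmation codes, websites, \
emails, and phone numbers easy to find. Here is an example of what I'd like:

### 🏨 Nov 23-24, 2023: Malaga - Líbere Málaga Teatro Romano
- **Address:** Calle Granada, 76, 29015, Málaga
- **Arrival:** 3 PM Thu 2023-11-23
- **Departure:** 11 AM Fri 2023-11-24
- **Cost:** €180.31

Here is the text:
"""

def extract_travel_notes(email_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice
        messages=[{"role": "user", "content": NOTE_TAKER_PROMPT + email_text}],
    )
    return response.choices[0].message.content
```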
Asking It to Think First - The Power of Chain-of-Thought Prompting
When interacting with Large Language Models (LLMs), the order in which we present our prompts can significantly impact the quality of the responses we receive. A key insight in this area comes from the concept of "Chain-of-Thought" prompting, a technique that has been shown to markedly improve the reasoning capabilities of LLMs.
The principle is simple yet powerful: instead of asking the model to immediately provide an answer, we first encourage it to explain its reasoning. This approach is encapsulated in the difference between these two prompt structures:
"State the answer and then explain your reasoning."
"Explain your reasoning and then state the answer."
The second approach consistently produces better results. Why? Because LLMs generate text one token at a time, and every token they emit becomes part of the context for the tokens that follow. By writing out its reasoning first, the model conditions its final answer on that reasoning, allowing it to refine its thinking and arrive at more accurate conclusions.
This technique was formally introduced by researchers at Google Brain in their paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022).
The simplest manifestation of this technique is to add an instruction like "first think step by step" to your prompts. This small addition can lead to significant improvements in the model's problem-solving abilities.
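Here is a minimal sketch of the two prompt orderings side by side, again using the OpenAI SDK; the puzzle and model name are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

def ask(instruction: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        messages=[{"role": "user", "content": f"{QUESTION}\n\n{instruction}"}],
    )
    return response.choices[0].message.content

# Answer-first: the model commits to an answer before any reasoning exists in context.
print(ask("State the answer and then explain your reasoning."))

# Reasoning-first: every reasoning token becomes context for the final answer.
print(ask("First think step by step, then state the answer."))
```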
The power of this approach has not gone unnoticed by major AI research labs. OpenAI, for instance, has incorporated this concept into the design of their new o1 model series. According to their announcement, the o1 models are specifically designed to "spend more time thinking through problems before they respond, much like a person would."
The results are striking. In a qualifying exam for the International Mathematics Olympiad (IMO), while GPT-4o correctly solved only 13% of problems, the new reasoning model scored an impressive 83%. This dramatic improvement underscores the potential of chain-of-thought approaches in enhancing the problem-solving capabilities of AI models.
Leveraging Chain-of-Thought in Our Prompts
The concept of chain-of-thought prompting can be a powerful tool in our interactions with AI models. By encouraging the model to think explicitly before answering, we can often elicit more thoughtful, comprehensive, and accurate responses. Let's explore how we can apply this concept to improve our prompts.
The Traditional Approach
Traditionally, when asking an AI to summarize an article, we might use a straightforward prompt like this:
Summarize this article: https://blog.boxcars.ai/p/the-great-and-powerful-ai-unveiling
While this approach can produce a serviceable summary, it often results in a somewhat bland and superficial overview of the content. Here's an example of what such a summary might look like:
Thank you for providing the article. I'll now summarize the key points of "The Great and Powerful AI: Unveiling the Wizard Behind the Curtain" by Tabrez Syed.
Summary:
1. Introduction:
- The article draws a parallel between the Wizard of Oz and modern AI systems, suggesting that both present a façade of omniscience but have limitations behind the curtain.
- It traces the evolution of AI from simple next-token predictors to more sophisticated chatbots, highlighting the challenges in creating truly intelligent AI teammates.
2. The Humble Beginnings: GPT as a Next Token Predictor
- Explains the core function of transformers in Large Language Models (LLMs) as predicting the next token in a sequence.
- Outlines the progression from GPT-1 (117 million parameters) to GPT-3 (175 billion parameters).
- Early interactions required careful prompt engineering and were essentially "single-turn" and stateless.
... continues...
The Chain-of-Thought Approach
To leverage the power of chain-of-thought prompting, we can restructure our request to encourage the model to think more deeply about the content before summarizing. Here's an example of how we might do this:
1.) Analyze the article and generate 5 essential questions that, when answered, capture the main points and core meaning of the text.
2.) When formulating your questions:
a. Address the central theme or argument
b. Identify key supporting ideas
c. Highlight important facts or evidence
d. Reveal the author's purpose or perspective
e. Explore any significant implications or conclusions.
3.) Answer all of your generated questions one-by-one in detail.
This prompt structure encourages the AI to think more deeply about the content and provide a more comprehensive analysis. Here's an example of what the output might look like:
Certainly! I'll generate 5 essential questions based on the article and then answer them in detail.
1. How does the article compare modern AI systems to the Wizard of Oz, and what does this analogy reveal about the current state of AI?
2. What is the fundamental mechanism behind GPT models, and how has this evolved from early versions to ChatGPT?
3. What are the key challenges in creating AI agents capable of handling complex, multi-turn interactions in real-world scenarios?
4. How does the Berkeley Function-Calling Leaderboard (BFCL) V3 benchmark assess AI models, and what do the results reveal about current AI capabilities?
5. What does the author suggest is the next crucial step in AI development, and why is this important?
Now, I'll answer these questions in detail:
[snip..]
3. What are the key challenges in creating AI agents capable of handling complex, multi-turn interactions in real-world scenarios?
The article highlights several key challenges in creating AI agents for complex, real-world scenarios:
1. Maintaining context: While models like ChatGPT appear to maintain context across a conversation, they're still fundamentally stateless. Each response is generated based on the entire conversation history provided, which can lead to inconsistencies in longer interactions.
2. Task management: AI models lack a true state machine or control structure, making it difficult for them to reliably keep track of complex tasks with multiple steps or requirements.
[snip..]
The Results
When using this chain-of-thought approach, the resulting summary tends to be more insightful, detailed, and valuable. It not only captures the main points of the article but also provides a deeper analysis of its significance and implications. The question-and-answer format naturally leads to a more coherent and logically structured summary, offering a richer understanding of the content.
This principle can be applied to various types of prompts, not just summarization tasks. Whether you're asking for analysis, problem-solving, or creative writing, encouraging the AI to "think first" can lead to more thoughtful and comprehensive responses, ultimately helping us harness the full potential of AI models for deeper insights and more nuanced understanding of complex topics.
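One way to operationalize this is a reusable "think first" template that can wrap any text or task. The structure below is a sketch of the pattern, not a canonical recipe:

```python
THINK_FIRST_TEMPLATE = """\
1.) Analyze the text below and generate {n} essential questions that capture its core meaning.
2.) Answer each of your questions one-by-one in detail.
3.) Finish with a concise summary built from your answers.

Text:
{text}
"""

def think_first_prompt(text: str, n: int = 5) -> str:
    """Wrap any text in a question-then-answer structure before summarizing."""
    return THINK_FIRST_TEMPLATE.format(n=n, text=text)

# Send think_first_prompt(article_text) to your model of choice instead of
# "Summarize this article: ..." to get a more structured, reasoned summary.
```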
The Power of Guided Autocomplete
Gary Marcus is right. Large Language Models are, at their core, just sophisticated autocomplete systems. And in the same vein, all computer software is ultimately just 1s and 0s. But just as those two simple digits have revolutionized our world through complex software systems, LLMs have shown us the incredible potential of "autocomplete on steroids" when guided effectively.
By priming the model, providing clear examples, and using chain-of-thought prompting, we can transform simple word prediction into a powerful tool for problem-solving, analysis, and creativity.
It's incredible what you can do with autocomplete if you know what to put first.