Augmented Language Models: a Survey
An important survey on augmented language models (ALMs), co-authored by Yann LeCun
This paper — Augmented Language Models: a Survey — studies how to extend language models beyond pure text prediction into systems that think and interact with the world.
Paper: https://arxiv.org/abs/2302.07842
The authors propose that modern LLM systems are evolving toward a new paradigm:
Not just predicting text, but solving tasks.
They call these systems Augmented Language Models (ALMs) — language models enhanced with reasoning abilities and external tool usage.
The survey organizes almost all modern LLM agent research into two core axes:
- Reasoning (thinking before answering)
- Acting / Tools (interacting with the environment)
Why Augmentation Is Needed
Traditional LLMs optimize next-token prediction:
P(x_t | x_{<t})
This makes them powerful at language, but weak at:
- multi-step planning
- long horizon tasks
- factual grounding
- interacting with real systems
ALMs overcome this by integrating non-parametric modules (search, calculators, APIs, interpreters) while keeping the language-modeling objective unchanged.
In other words:
The model remains a language model — but operates as a cognitive system.
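The routing idea above can be sketched in a few lines. This is a toy illustration, not the paper's method: `toy_lm` stands in for a real language model, and the `<calc>…</calc>` tool-call syntax is an invented convention. The point is that the model still just emits text, while an external, non-parametric module supplies the exact computation.

```python
import re

def calculator(expression):
    """A non-parametric module: exact arithmetic the LM need not memorize."""
    # eval() is restricted to bare arithmetic here; fine for a toy example.
    return str(eval(expression, {"__builtins__": {}}, {}))

def toy_lm(prompt):
    """Stand-in for a real LM: emits text containing a tool call."""
    return "The answer is <calc>17*24</calc>."

def alm_generate(prompt):
    """Replace each tool call in the LM's output with the module's result."""
    text = toy_lm(prompt)
    return re.sub(r"<calc>(.*?)</calc>", lambda m: calculator(m.group(1)), text)

print(alm_generate("What is 17*24?"))  # The answer is 408.
```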
Reasoning
Definition
Giving the model more computation steps before it yields an answer to a prompt.
More precisely:
Decompose complex tasks into simpler subtasks with a hierarchical structure.
The Key Idea: Thinking Improves Accuracy
Instead of answering directly:
Standard LM
Q: 17×24?
A: 408 (sometimes wrong)
Reasoning LM
17×24
= 17×(20+4)
= 340 + 68
= 408
By generating intermediate steps, the model converts a single difficult prediction into many easy predictions.
This dramatically improves performance on math, logic, and planning tasks.
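The worked multiplication above can be mirrored in code: each line of the chain-of-thought trace becomes one easy sub-computation. The function below is only an illustration of the decomposition, not anything from the survey.

```python
def multiply_stepwise(a, b):
    """Multiply by splitting b into tens and ones, mirroring the trace:
    17*24 = 17*(20+4) = 340 + 68 = 408."""
    tens, ones = divmod(b, 10)
    partial_tens = a * tens * 10  # 17 * 20 = 340
    partial_ones = a * ones       # 17 * 4  = 68
    return partial_tens + partial_ones

print(multiply_stepwise(17, 24))  # 408
```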
Methods Covered in the Survey
Chain-of-Thought Prompting
Models explicitly generate intermediate reasoning steps before the answer.
This helps logical tasks because the model externalizes hidden computation.
Recursive Decomposition
Break a problem into smaller subproblems and solve sequentially.
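A minimal sketch of this pattern, with toy stand-ins for the model: `decompose` splits a task into subtasks (here, the terms of a sum), leaves are answered directly, and the results are combined. In a real system each of these functions would be an LM call.

```python
def decompose(task):
    """Toy decomposition rule: split a sum like '10+20' into its terms."""
    parts = task.split("+")
    return parts if len(parts) > 1 else []

def solve_leaf(task):
    """A subproblem simple enough to answer directly."""
    return int(task)

def solve(task):
    """Recursively decompose, solve subtasks, and combine the results."""
    subtasks = decompose(task)
    if not subtasks:
        return solve_leaf(task)
    return sum(solve(t) for t in subtasks)

print(solve("10+20+30"))  # 60
```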
Self-Improvement / Reflection
Models critique or revise earlier steps.
Key Insight
Reasoning changes the role of inference:
Inference becomes computation.
Instead of scaling model size, we scale thinking time.
Tool & Acting
Definition
Gathering external information, or affecting the virtual or physical world, in a way the model can observe afterward.
The model interacts with modules such as:
- search engines
- code interpreters
- calculators
- databases
- APIs
This is not just retrieval — it is closed-loop interaction.
The survey defines tool use as calling external modules to extend capability beyond the training data.
The Key Idea: Models Should Not Memorize the World
A pure LLM must store knowledge internally.
An ALM instead queries reality:
User: What's the weather in Tokyo?
LM → API → receives data → answers
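The exchange above can be sketched with the external API stubbed out. Both `fake_weather_api` and the hard-coded routing are placeholders: a real ALM would let the model itself decide when to call a live service.

```python
def fake_weather_api(city):
    """Stub for an external weather service."""
    data = {"Tokyo": "18°C, clear"}
    return data.get(city, "unknown")

def answer(question):
    """Route weather questions through the tool; answer with the observation."""
    if "weather" in question.lower():
        city = question.rstrip("?").split()[-1]
        observation = fake_weather_api(city)               # LM -> API -> receives data
        return f"The weather in {city} is {observation}."  # -> answers
    return "I don't know."

print(answer("What's the weather in Tokyo?"))
```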
This turns the model from:
knowledge container → knowledge interface
Acting: LLM as an Agent
When tools affect the environment, the model becomes an agent.
Example loop:
Observe → Think → Act → Observe → Think → Act
This paradigm appears in systems like ReAct, where reasoning traces guide action planning and environment interaction.
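The loop can be written out as a minimal sketch in the spirit of ReAct. Here the "policy" and environment are toy stubs; in an actual agent, the thinking step is an LM producing a reasoning trace and an action.

```python
def policy(observation):
    """Think: choose an action from the current observation."""
    return "increment" if observation < 10 else "stop"

def environment(state, action):
    """Act: the action changes the (virtual) world."""
    return state + 1 if action == "increment" else state

state = 7
trace = []
while True:
    action = policy(state)  # Observe -> Think
    trace.append((state, action))
    if action == "stop":
        break
    state = environment(state, action)  # Act -> new Observation

print(trace)  # [(7, 'increment'), (8, 'increment'), (9, 'increment'), (10, 'stop')]
```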
Combining Reasoning and Acting
The most powerful systems combine both.
| Capability | Reasoning | Acting |
|---|---|---|
| Math | ✓ | |
| Planning | ✓ | ✓ |
| Web navigation | ✓ | ✓ |
| Autonomous agents | ✓ | ✓ |
Reasoning decides what to do
Acting executes how to do it
The New View of Language Models
The survey suggests a shift in perspective:
Old paradigm:
LLM = text generator
New paradigm:
LLM = controller of computation and interaction
Augmented Language Models remain trained with next-token prediction, but learn to reason, call tools, and act through prompting or demonstrations.
Why This Paper Matters (Looking Forward)
This survey predicted nearly all modern agent research:
- tool-calling LLMs
- RAG systems
- code interpreter models
- planning agents
- web agents
- reasoning models
Today’s AI assistants are essentially ALMs.
Final Takeaway
The paper’s core message:
Intelligence does not come from the model alone —
it comes from the model orchestrating computation.
Reasoning gives models internal cognition.
Acting gives models external embodiment.
Together they turn a language model into a general problem-solving system.
Reference
Mialon et al., Augmented Language Models: a Survey, 2023
https://arxiv.org/abs/2302.07842