Augmented Language Models: a Survey

An important survey of ALMs from Mialon et al. at Meta AI (co-authored by Yann LeCun)

This paper — Augmented Language Models: a Survey — studies how to extend language models beyond pure text prediction into systems that think and interact with the world.

Paper: https://arxiv.org/abs/2302.07842

The authors propose that modern LLM systems are evolving toward a new paradigm:

Not just predicting text, but solving tasks.

They call these systems Augmented Language Models (ALMs) — language models enhanced with reasoning abilities and external tool usage.

The survey organizes the modern LLM agent literature along two core axes:

  1. Reasoning (thinking before answering)
  2. Acting / Tools (interacting with the environment)

Why Augmentation Is Needed

Traditional LLMs optimize next-token prediction:

P(x_t \mid x_{<t})
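By the chain rule, this per-token objective factorizes the probability of a whole sequence:

```latex
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{<t})
```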

This makes them powerful at language, but weak at:

  • multi-step planning
  • long horizon tasks
  • factual grounding
  • interacting with real systems

ALMs overcome this by integrating non-parametric modules (search, calculators, APIs, interpreters) while still keeping the language modeling objective unchanged.

In other words:

The model remains a language model — but operates as a cognitive system.

Reasoning

Definition

Giving the model more computation steps before it yields the answer to a prompt.

More precisely:

Decompose complex tasks into simpler subtasks with a hierarchical structure.

The Key Idea: Thinking Improves Accuracy

Instead of answering directly:

Standard LM

Q: 17×24?
A: 408 (sometimes wrong)

Reasoning LM

17×24
= 17×(20+4)
= 340 + 68
= 408

By generating intermediate steps, the model converts a single difficult prediction into many easy predictions.

This dramatically improves performance on math, logic, and planning tasks.
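The decomposition above can be mimicked in code; a minimal sketch (the helper and step strings are illustrative, not from the paper) of how one hard multiplication becomes several easy steps:

```python
def multiply_with_steps(a: int, b: int) -> tuple[list[str], int]:
    """Decompose a*b into easier sub-steps, the way a chain of thought would."""
    tens, ones = divmod(b, 10)
    part1 = a * tens * 10      # e.g. 17 * 20
    part2 = a * ones           # e.g. 17 * 4
    steps = [
        f"{a}*{b} = {a}*({tens * 10}+{ones})",
        f"= {part1} + {part2}",
        f"= {part1 + part2}",
    ]
    return steps, part1 + part2

steps, answer = multiply_with_steps(17, 24)
print("\n".join(steps))
print(answer)  # 408
```

Each intermediate line is an easy prediction; only their composition is hard.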


Methods Covered in the Survey

Chain-of-Thought Prompting

Models explicitly generate intermediate reasoning steps before the answer.

This helps logical tasks because the model externalizes hidden computation.

Recursive Decomposition

Break a problem into smaller subproblems and solve sequentially.
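In code, this pattern is ordinary divide-and-conquer recursion; a hypothetical sketch where `is_simple`, `decompose`, and `combine` stand in for judgments the model itself would make:

```python
def solve(task):
    """Recursively decompose a task until each piece is simple enough."""
    if is_simple(task):
        return solve_directly(task)
    return combine(solve(t) for t in decompose(task))

# Toy instantiation: summing a deeply nested list of numbers.
def is_simple(task):
    return isinstance(task, (int, float))

def solve_directly(task):
    return task

def decompose(task):
    return list(task)

def combine(results):
    return sum(results)

print(solve([1, [2, 3], [[4], 5]]))  # 15
```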

Self-Improvement / Reflection

Models critique or revise earlier steps.
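A minimal generate–critique–revise loop; the `generate` and `critique` stubs below are hypothetical stand-ins for model calls, not the survey's method:

```python
def refine(prompt, generate, critique, max_rounds=3):
    """Iteratively revise an answer until the critic finds no issue."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(answer)
        if feedback is None:          # critic is satisfied
            return answer
        answer = generate(f"{prompt}\nPrevious answer: {answer}\nFix: {feedback}")
    return answer

# Toy instantiation: a "model" that forgets the unit until told to fix it.
def generate(prompt):
    return "408 km" if "Fix:" in prompt else "408"

def critique(answer):
    return None if "km" in answer else "missing unit"

print(refine("distance?", generate, critique))  # 408 km
```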


Key Insight

Reasoning changes the role of inference:

Inference becomes computation.

Instead of scaling model size, we scale thinking time.

Tools & Acting

Definition

Gathering external information, or affecting the virtual/physical world, with the result observable by the model afterward.

The model interacts with modules such as:

  • search engines
  • code interpreters
  • calculators
  • databases
  • APIs

This is not just retrieval — it is closed-loop interaction.

The survey defines tool use as calling external modules to extend capability beyond the training data.
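A minimal tool-dispatch sketch, assuming the model emits calls as `name(args)` strings; the call format and the two tools here are illustrative, not a real API:

```python
import re

# Registry of external modules the LM can call.
TOOLS = {
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy calculator
    "lookup": lambda key: {"capital of Japan": "Tokyo"}.get(key, "unknown"),
}

def run_tool(call: str) -> str:
    """Parse a model-emitted call like 'calc(17*24)' and execute it."""
    m = re.fullmatch(r"(\w+)\((.*)\)", call.strip())
    if not m or m.group(1) not in TOOLS:
        return "error: unknown tool"
    return TOOLS[m.group(1)](m.group(2))

print(run_tool("calc(17*24)"))               # 408
print(run_tool("lookup(capital of Japan)"))  # Tokyo
```

The tool's output is fed back into the model's context, which is what makes the loop closed rather than one-shot retrieval.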


The Key Idea: Models Should Not Memorize the World

A pure LLM must store knowledge internally.

An ALM instead queries reality:

User: What's the weather in Tokyo?
LM → API → receives data → answers

This turns the model from:

knowledge container → knowledge interface

Acting: LLM as an Agent

When tools affect the environment, the model becomes an agent.

Example loop:

Observe → Think → Act → Observe → Think → Act

This paradigm appears in systems like ReAct, where reasoning traces guide action planning and environment interaction.
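The Observe → Think → Act loop can be sketched as a simple controller; `think` and `act` below are hypothetical stand-ins for the LM and the environment, in the spirit of ReAct rather than its actual implementation:

```python
def react_loop(goal, think, act, max_steps=5):
    """Alternate reasoning and acting until the model decides to finish."""
    observation = goal
    trace = []
    for _ in range(max_steps):
        thought, action = think(observation)   # Think
        trace.append((thought, action))
        if action == "finish":
            return observation, trace
        observation = act(action)              # Act -> new observation

    return observation, trace

# Toy environment: one API call, then finish.
def think(obs):
    if obs == "weather in Tokyo?":
        return "need live data", "call_weather_api"
    return "have the answer", "finish"

def act(action):
    return "Tokyo: 18C, clear"  # stubbed API response

answer, trace = react_loop("weather in Tokyo?", think, act)
print(answer)  # Tokyo: 18C, clear
```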


Combining Reasoning and Acting

The most powerful systems combine both.

| Capability        | Reasoning | Acting |
|-------------------|-----------|--------|
| Math              | ✓         |        |
| Planning          | ✓         |        |
| Web navigation    | ✓         | ✓      |
| Autonomous agents | ✓         | ✓      |

Reasoning decides what to do
Acting executes how to do it

The New View of Language Models

The survey suggests a shift in perspective:

Old paradigm:

LLM = text generator

New paradigm:

LLM = controller of computation and interaction

Augmented Language Models remain trained with next-token prediction, but learn to reason, call tools, and act through prompting or demonstrations.


Why This Paper Matters (Looking Forward)

This survey predicted nearly all modern agent research:

  • tool-calling LLMs
  • RAG systems
  • code interpreter models
  • planning agents
  • web agents
  • reasoning models

Today’s AI assistants are essentially ALMs.


Final Takeaway

The paper’s core message:

Intelligence does not come from the model alone —
it comes from the model orchestrating computation.

Reasoning gives models internal cognition.
Acting gives models external embodiment.

Together they turn a language model into a general problem-solving system.


Reference

Mialon et al., Augmented Language Models: a Survey, 2023
https://arxiv.org/abs/2302.07842