
SIGIR-Style Background: Retrieval-Augmented & Augmented Language Models

Paper summary from SIGIR 2023

Orlando Ding

14 Feb 2026 • 1 min read

Although neither of the following two papers was published at SIGIR 2023, together they form the conceptual foundation behind a research trend visible at SIGIR 2023:

Search systems are shifting from document retrieval toward reasoning systems powered by retrieval-augmented language models.

We briefly summarize them and explain their connection.

Paper 1 — Augmented Language Models: a Survey (2023)

Paper link: https://arxiv.org/abs/2302.07842

Core Idea

The paper defines Augmented Language Models (ALMs):

Language models enhanced with external computation and interaction modules.

Two capability axes:

Axis	Meaning
Reasoning	internal multi-step thinking
Acting	interacting with tools / environment

Instead of storing all knowledge in parameters, the model becomes a controller of computation.

Architectural View

\[
\text{ALM} = \text{LM} + \text{Memory} + \text{Tools} + \text{Planning}
\]
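To make the decomposition concrete, here is a minimal sketch in which a stub function plays the role of the LM. The class `ToyALM`, its rule-based planner, and the `calculator` tool are hypothetical illustrations, not an interface from the survey; in a real ALM the language model itself would write the plan and read the tool outputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def calculator(expression: str) -> str:
    # Hypothetical tool: only handles "a+b" inputs, so no eval() is needed.
    left, right = expression.split("+")
    return str(int(left) + int(right))


def stub_lm(prompt: str) -> str:
    # Stand-in for the LM: "reasons" by echoing the latest observation, if any.
    observations = prompt.split("observations: ")[-1]
    return observations.split("; ")[-1] if observations else "I don't know"


@dataclass
class ToyALM:
    lm: Callable[[str], str]                                     # LM
    tools: Dict[str, Callable[[str], str]]                       # Tools
    memory: List[Tuple[str, str]] = field(default_factory=list)  # Memory

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Planning: a hard-coded rule picks the tool calls; a real ALM
        # would ask the language model to write this plan.
        if "+" in task:
            return [("calculator", task.split()[-1])]
        return []

    def run(self, task: str) -> str:
        for tool_name, argument in self.plan(task):
            observation = self.tools[tool_name](argument)  # acting
            self.memory.append((tool_name, observation))   # remember the result
        context = "; ".join(obs for _, obs in self.memory)
        return self.lm(f"{task} | observations: {context}")  # reasoning


alm = ToyALM(lm=stub_lm, tools={"calculator": calculator})
print(alm.run("compute 17+25"))  # prints 42
```

The arithmetic is done by the tool, not by the "model": the LM only decides what to do with the stored observation, which is the sense in which the model becomes a controller of computation.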

This shifts the role of LLMs from:

predicting text → solving tasks

Implication for Retrieval

Traditional IR pipeline:

query → retrieve → rank → return document
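The classic pipeline can be sketched in a few lines. Term-overlap scoring is an assumed stand-in for a real ranking function such as BM25, and the three-document corpus is invented for illustration.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def score(query: str, doc: str) -> int:
    # Toy relevance: number of shared term occurrences (stand-in for BM25 etc.)
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def classic_ir(query: str, corpus: List[str], k: int = 2) -> List[str]:
    # retrieve: keep documents sharing at least one term with the query
    candidates = [doc for doc in corpus if score(query, doc) > 0]
    # rank: sort candidates by score, highest first
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    # return: the top-k documents are the final product
    return ranked[:k]


corpus = [
    "REALM retrieves documents during pretraining",
    "BM25 is a classic ranking function",
    "retrieval augmented language models retrieve documents",
]
print(classic_ir("retrieve documents", corpus))
```

The output of this pipeline is the ranked list itself; nothing downstream consumes or reasons over it.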

ALM pipeline:

query → plan → retrieve → reason → act → refine → answer
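By contrast, the ALM pipeline wraps retrieval in a loop. The sketch below is hypothetical throughout: `retrieve` is a dictionary lookup, `reason` simply accepts any retrieved passage, and `refine` rewrites the query with an invented rewrite table. In a real system, the language model or an external tool would perform each of these steps.

```python
from typing import Dict, Optional

# Hypothetical knowledge store and query-rewrite table (illustration only).
KNOWLEDGE: Dict[str, str] = {"realm venue": "REALM was published at ICML 2020."}
REWRITES: Dict[str, str] = {"where was realm published": "realm venue"}


def plan(query: str) -> str:
    # plan: normalize the user query into a search key
    return query.lower().strip(" ?")


def retrieve(search_key: str) -> Optional[str]:
    # retrieve: exact-match lookup stands in for a real retriever
    return KNOWLEDGE.get(search_key)


def reason(evidence: Optional[str]) -> Optional[str]:
    # reason: a real LM would judge whether the evidence answers the query;
    # here any retrieved passage is accepted as the answer
    return evidence


def refine(search_key: str) -> str:
    # refine: rewrite the search key after a failed retrieval
    return REWRITES.get(search_key, search_key)


def alm_answer(query: str, max_steps: int = 3) -> str:
    search_key = plan(query)
    for _ in range(max_steps):
        answer = reason(retrieve(search_key))
        if answer is not None:
            return answer                # act: stop searching and answer
        search_key = refine(search_key)  # act: search again with a new key
    return "no answer found"


print(alm_answer("Where was REALM published?"))
```

A single lookup with the original wording fails here. The loop succeeds on its second search because the refine step rewrites the query, which is the difference between retrieval as a final product and retrieval as one step among several.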

Retrieval is no longer the final product; it becomes one step inside the model's reasoning process.

Paper 2 — REALM: Retrieval-Augmented Language Model Pre-Training (ICML 2020)

Paper link: https://dl.acm.org/doi/pdf/10.5555/3524938.3525306

Core Idea

REALM integrates retrieval directly into pretraining:

\[
P(y \mid x) = \sum_{d \in \text{documents}} P(y \mid x, d)\, P(d \mid x)
\]

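Here P(d|x) is the knowledge retriever and P(y|x,d) is the knowledge-augmented encoder. The retrieved document d is treated as a latent variable and summed out, so instead of memorizing every fact in its parameters, the model learns to retrieve evidence during training.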
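The marginalization can be checked numerically. Following the REALM paper, the sketch scores each document by an inner product between query and document embeddings and turns the scores into P(d|x) with a softmax. The embeddings and reader probabilities P(y|x,d) are made-up numbers. REALM approximates the sum using only the top-k retrieved documents; renormalizing P(d|x) over those k documents is an assumption of this sketch.

```python
from typing import Optional

import numpy as np


def retriever_probs(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    # P(d|x): softmax over inner-product scores f(x, d) = Embed(x) . Embed(d)
    scores = doc_embs @ query_emb
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    return weights / weights.sum()


def marginal_likelihood(p_y_given_xd: np.ndarray, p_d_given_x: np.ndarray,
                        top_k: Optional[int] = None) -> float:
    # P(y|x) = sum_d P(y|x,d) P(d|x)
    if top_k is None:
        return float(p_y_given_xd @ p_d_given_x)
    # Top-k approximation; renormalizing over the kept documents is an
    # assumption of this sketch.
    top = np.argsort(p_d_given_x)[::-1][:top_k]
    p_top = p_d_given_x[top] / p_d_given_x[top].sum()
    return float(p_y_given_xd[top] @ p_top)


query_emb = np.array([1.0, 0.0])                           # illustrative embeddings
doc_embs = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p_y_given_xd = np.array([0.9, 0.1, 0.5])                   # illustrative reader outputs

p_d = retriever_probs(query_emb, doc_embs)
print(p_d, marginal_likelihood(p_y_given_xd, p_d),
      marginal_likelihood(p_y_given_xd, p_d, top_k=2))
```

Because P(y|x) is a weighted average of the reader probabilities, it always lies between the smallest and largest P(y|x,d).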

Key Contribution

REALM introduced:

Differentiable retrieval during pretraining.

The model jointly learns:

language understanding
search behavior
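Why the retriever can learn from the language-modeling objective becomes clear from the gradient. The REALM paper shows that the gradient of log P(y|x) with respect to a document's retrieval score f(x,d) is [P(y|x,d)/P(y|x) - 1] · P(d|x). Documents that make the correct output more likely than average get their scores pushed up, and the others get pushed down. The sketch below checks that formula against a finite-difference estimate; the scores and reader probabilities are illustrative values, and P(y|x,d) is held fixed.

```python
import numpy as np


def softmax(s: np.ndarray) -> np.ndarray:
    e = np.exp(s - s.max())
    return e / e.sum()


def log_marginal(scores: np.ndarray, p_y_given_xd: np.ndarray) -> float:
    # log P(y|x) = log sum_d P(y|x,d) * softmax(scores)_d
    return float(np.log(p_y_given_xd @ softmax(scores)))


def analytic_grad(scores: np.ndarray, p_y_given_xd: np.ndarray) -> np.ndarray:
    # d log P(y|x) / d f(x,d) = [P(y|x,d) / P(y|x) - 1] * P(d|x)
    p_d = softmax(scores)
    p_y = p_y_given_xd @ p_d
    return (p_y_given_xd / p_y - 1.0) * p_d


def numeric_grad(scores: np.ndarray, p_y_given_xd: np.ndarray,
                 eps: float = 1e-6) -> np.ndarray:
    # central finite differences, one score at a time
    grad = np.zeros_like(scores)
    for i in range(len(scores)):
        bump = np.zeros_like(scores)
        bump[i] = eps
        grad[i] = (log_marginal(scores + bump, p_y_given_xd)
                   - log_marginal(scores - bump, p_y_given_xd)) / (2 * eps)
    return grad


scores = np.array([2.0, 0.0, 1.0])          # illustrative f(x, d) values
p_y_given_xd = np.array([0.9, 0.1, 0.5])    # illustrative reader probabilities
print(analytic_grad(scores, p_y_given_xd))
print(numeric_grad(scores, p_y_given_xd))
```

The gradient entries sum to zero: raising one document's share of P(d|x) necessarily lowers the others, so the retriever learns a relative preference among documents.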

Impact on Information Retrieval

Before REALM:

Retrieval supports models.

After REALM:

Models learn retrieval as a skill.

This idea is a direct precursor of modern retrieval-augmented generation (RAG) systems.

Conceptual Connection

Stage	Retrieval role
Classic IR	final output
REALM	latent memory
ALM	reasoning tool

Evolution:

\[
\text{Search Engine} \rightarrow \text{Neural Retriever} \rightarrow \text{Cognitive Module}
\]

Why This Became a SIGIR-Era Direction

Traditional SIGIR evaluation:

\[
\text{NDCG},\ \text{MRR},\ \text{Recall}
\]
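For reference, the three classic metrics can be computed from a ranked list and relevance judgments. The sketch assumes binary relevance for Recall@k and MRR, and graded gains with the standard log2 rank discount for NDCG@k; the document IDs are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    # fraction of relevant documents that appear in the top-k
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    # 1 / rank of the first relevant document, 0 if none is retrieved
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[List[str], Set[str]]]) -> float:
    # MRR: average reciprocal rank over queries
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranking: List[str], gains: Dict[str, float], k: int) -> float:
    # DCG with gain / log2(rank + 1), normalized by the ideal ordering's DCG
    dcg = sum(gains.get(doc, 0.0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranking[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranking, relevant, k=2))  # 0.5
print(reciprocal_rank(ranking, relevant))   # 0.5
print(ndcg_at_k(ranking, {"d1": 1.0, "d2": 1.0}, k=4))
```

All three functions look only at the ranked list and the relevance labels; none of them asks whether the retrieved text was actually useful for a downstream task.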

LLM-era evaluation:

\[
\text{Task Success}
\]

Retrieval quality is no longer measured only by relevance, but also by whether it helps downstream reasoning succeed.
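The shift can be illustrated with a toy comparison. Two hypothetical retrievers are scored both by Recall@1 against a relevance label and by the exact-match success of a downstream reader that answers from the top passage. The passages, the relevance label, and the year-extracting reader are invented for illustration; the point is only that the two measures can disagree.

```python
import re
from typing import List, Optional, Set


def reader(passage: str) -> Optional[str]:
    # Hypothetical reader: answers a "when" question with the first 4-digit year it sees.
    match = re.search(r"\b(19|20)\d{2}\b", passage)
    return match.group(0) if match else None


def recall_at_1(ranking: List[str], judged_relevant: Set[str]) -> float:
    return 1.0 if ranking and ranking[0] in judged_relevant else 0.0


def task_success(ranking: List[str], gold_answer: str) -> float:
    # exact match between the reader's answer on the top passage and the gold answer
    return 1.0 if ranking and reader(ranking[0]) == gold_answer else 0.0


p1 = "REALM is a retrieval-augmented pretraining method."
p2 = "The REALM paper appeared at ICML 2020."
judged_relevant = {p1}  # hypothetical label: on topic, but lacks the answer
gold_answer = "2020"    # for the question "When was REALM published?"

for name, ranking in {"retriever_a": [p1, p2], "retriever_b": [p2, p1]}.items():
    print(name, recall_at_1(ranking, judged_relevant), task_success(ranking, gold_answer))
```

Retriever A wins on the relevance metric while Retriever B is the one that lets the reader succeed, which is the gap that task-level evaluation is meant to expose.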

Key Insight

REALM teaches models where knowledge is.
ALM teaches models when to use knowledge.

Together they establish the modern paradigm:

Retrieval is not answering questions.
Retrieval is enabling reasoning.

One-Sentence Takeaway

REALM turns retrieval into learnable memory, while ALM turns retrieval into a decision-making action; modern SIGIR research sits between them.

