RLMs are not just subagents

I have been trying to understand the distinction between Recursive Language Models and ordinary subagents. At first glance, the difference sounds merely semantic. Both systems let one language model call another language model. Both can split work into smaller pieces. Both can use tools.

But that is not the important distinction.

The core difference is not whether the model can call other models. The core difference is where the long context lives, and whether the model can compute over it symbolically.

The normal subagent pattern

A typical subagent setup looks like this:

root model sees prompt and context
root model writes a few natural-language tasks
subagents answer those tasks
root model reads their summaries
root model writes the final answer

That can be useful. It is basically delegation. The root model can say:

Ask one agent to inspect the auth code.
Ask another agent to inspect the billing code.
Ask a third agent to summarize the migration history.

The problem is that the delegation itself is still mostly verbal and sparse. The root model has to write out the subcalls as text. It can only describe so many of them before hitting output limits, latency limits, or context-management problems.

This works well when the task naturally decomposes into a small number of subtasks. It is a weaker fit when the task requires dense processing over a huge input: every line, every document, every pair of records, or every chunk of a long conversation.
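The delegation loop above can be sketched in a few lines. This is illustrative only: `call_agent` is a hypothetical stub standing in for whatever API actually runs a helper model, and the point is the shape of the control flow, not any real interface.

```python
def call_agent(task: str) -> str:
    """Hypothetical stub: run one subagent on a natural-language task."""
    return f"summary of: {task}"

# The root model verbalizes each delegation as text -- a small, fixed set.
tasks = [
    "Inspect the auth code and report anything suspicious.",
    "Inspect the billing code and report anything suspicious.",
    "Summarize the migration history.",
]

# Each subagent answers; the root reads the summaries back.
reports = [call_agent(t) for t in tasks]

# The root composes the final answer from those summaries.
final_answer = "\n".join(reports)
```

Note that the number of subcalls is bounded by how many tasks the root can write out as text, which is exactly the limitation described above.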

The RLM pattern

An RLM looks more like this:

the user prompt is stored outside the model as P
the root model sees metadata about P
the root model writes code against P
that code slices, searches, parses, and transforms P
that code can call an LLM on programmatically selected pieces of P
intermediate results are stored as variables
the final answer is assembled from those variables

In other words, the prompt is not just text stuffed into the model’s context window. It becomes an object in an environment.

The model gets a symbolic handle:

P

Then it can write code like:

chunks = split_into_sections(P)

facts = []
for chunk in chunks:
    facts.append(rlm(f"Extract evidence about the question from this chunk:\n{chunk}"))

Final = rlm(f"Use these extracted facts to answer the original question:\n{facts}")

That is a very different control structure from “ask three subagents for help.”

The root model did not verbalize every subtask. It wrote a program that generated the subtasks.

What symbolic recursion means

“Symbolic recursion” sounds abstract, but in practice it means something specific:

  1. The model has symbolic references to external state, like P, chunks, summaries, candidate_pairs, and Final.
  2. The model writes executable code that manipulates those references.
  3. That code can call the same model or scaffold on smaller derived prompts.
  4. The results are stored back into symbolic state.
  5. The final answer is composed from that state.

A simple version:

P = huge_user_prompt
chunks = split(P, chunk_size=2000)

summaries = []
for chunk in chunks:
    summaries.append(rlm(f"Summarize the relevant details:\n{chunk}"))

Final = rlm(f"Synthesize the answer from these summaries:\n{summaries}")

A denser version:

records = parse_records(P)

labels = {}
for record in records:
    labels[record.id] = rlm(f"Classify this record:\n{record.text}")

pairs = []
for a in records:
    for b in records:
        if a.id < b.id and condition(labels[a.id], labels[b.id]):
            pairs.append((a.id, b.id))

Final = render_pairs(pairs)

The recursion is that the root RLM calls another model or RLM on smaller inputs. The symbolic part is that the control flow happens through code, variables, loops, arrays, filters, joins, and stored intermediate values.

The model is not just thinking token by token. It is writing a little program that uses language models as semantic operators.
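One way to make "semantic operators" concrete: an ordinary higher-order function whose inner judgment is delegated to a model call. In this sketch `rlm` is a deterministic stub so the control flow is runnable; in a real system it would be a model call, and the helper name `llm_filter` is my own, not an actual RLM API.

```python
def rlm(prompt: str) -> str:
    # Stub: pretend the model answers "yes" for prompts mentioning "error".
    return "yes" if "error" in prompt.lower() else "no"

def llm_filter(items, question):
    """Keep the items the model answers 'yes' about -- a semantic filter."""
    return [x for x in items if rlm(f"{question}\n{x}").startswith("yes")]

lines = ["INFO boot ok", "ERROR disk full", "INFO shutdown"]
flagged = llm_filter(lines, "Does this log line describe a failure? yes/no")
# flagged == ["ERROR disk full"] with the stub above
```

The same pattern gives semantic map, reduce, and join: plain code supplies the loop structure, and the model supplies the judgment inside it.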

Why this is not just a bigger subagent system

The distinction becomes clearer when the number of needed subcalls grows.

With subagents, a root model might call five helpers. Maybe twenty. But each call is usually a specific natural-language delegation:

Read this section and report back.
Search for mentions of this concept.
Compare these two files.

With an RLM, one short program can produce thousands of model calls:

for line in P.splitlines():
labels = []
    label = rlm(f"Classify this line:\n{line}")
    labels.append(label)

Or even quadratic work:

comparisons = []

for a in items:
    for b in items:
        if should_compare(a, b):
            comparisons.append(rlm(f"Compare these two items:\n{a}\n{b}"))

That sounds expensive until you remember that attention is already expensive. The RLM claim is not “always call a million models.” The claim is that the system should be able to spend semantic compute in the shape the task requires, instead of pretending the whole task must fit into one attention window.
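As a sketch of what "spending compute in the shape the task requires" can mean, the quadratic comparison above can often be reshaped: one linear pass of model calls labels every item, and a cheap symbolic pre-filter then decides which pairs deserve an expensive comparison at all. The `rlm` stub and label values here are illustrative, not a real model.

```python
def rlm(prompt: str) -> str:
    # Stub: a real RLM would call a model here.
    return "payments" if "payment" in prompt else "other"

items = ["payment retry bug", "login timeout", "payment double charge"]

# Linear semantic pass: one model call per item.
labels = {i: rlm(f"Classify this report: {t}") for i, t in enumerate(items)}

# Symbolic pre-filter: only pair up items that share a label.
pairs = [(a, b)
         for a in range(len(items))
         for b in range(a + 1, len(items))
         if labels[a] == labels[b]]
# 1 candidate pair survives instead of all 3 possible pairs
```

The loops and the filter are ordinary code; only the judgment calls cost model inference, so the budget bends to the task rather than to the window.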

The file-handle analogy

The simplest analogy is a file handle.

A normal long-context model is like pasting an entire file into a chat window and asking the model to reason over it.

A subagent system is like asking a few people to read parts of the file and report back.

An RLM is like giving the model a file handle and saying:

Write a program that reads whatever parts of this file are needed.
Call readers recursively when semantic judgment is needed.
Store the results.
Compute the answer.

That file-handle distinction matters. If the full prompt must be placed into the root model’s context, the system still inherits context limits and context rot. If the prompt lives outside the model as symbolic state, the root can operate on references to data rather than only on tokens currently in its window.
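The file-handle idea can be made concrete with a tiny wrapper: the prompt lives outside the model as an object, and the root only ever sees metadata or the slices its code asks for. The class and method names below are illustrative assumptions, not an actual RLM interface.

```python
class PromptHandle:
    """Holds the full prompt externally; exposes metadata and slices."""

    def __init__(self, text: str):
        self._text = text  # the full prompt, never shown wholesale

    def metadata(self) -> dict:
        return {"chars": len(self._text),
                "lines": self._text.count("\n") + 1}

    def grep(self, needle: str) -> list:
        return [ln for ln in self._text.splitlines() if needle in ln]

    def slice(self, start: int, end: int) -> str:
        return self._text[start:end]

P = PromptHandle("line one\nerror: disk full\nline three")
P.metadata()     # {'chars': 36, 'lines': 3}
P.grep("error")  # ['error: disk full']
```

The root model's code operates on `P` through these handles, so the full text never has to occupy the root's attention window.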

The practical definition

So the clean definition is:

Subagents are a delegation mechanism.
RLMs are a computation model for recursively operating over prompt state.

An RLM may use subcalls. But subcalls alone do not make something an RLM.

To be an RLM, the system needs the unusual part: the model's own prompt and intermediate results are externalized into symbolic state, and the model is made to understand that state by writing code that recursively applies language-model reasoning where needed.

That is why the phrase “just subagents” misses the point. The novelty is not helper agents. The novelty is making long-context reasoning look less like attention over a giant string and more like a program recursively computing over an external object.