HRM-Text: A 1B Reasoning Model Trained for $1,500 — And Why It Matters for Local AI

This one isn't just another "small model wins" story.

Sapient Intelligence's HRM-Text — a roughly 1B parameter language model trained from scratch for about $1,500 — has put a new name at the center of next-generation reasoning architecture discussions: HRM (Hierarchical Reasoning Model).

Hugging Face co-founder and CEO Clem Delangue personally reshared it. Turing Award laureate Yoshua Bengio is a co-author on a related new paper, GRAM (Generative Recursive Reasoning), that takes the same latent recursive reasoning road.

And HRM-Text isn't distillation, isn't fine-tuning, and isn't a wrapper on top of an existing large model. It's a from-scratch pre-trained model that anyone can download, inspect, and run.

If you only look at the parameter count, it sounds like a familiar story. But the interesting thing about HRM-Text is not that it's small or cheap. It's that behind it, the HRM architecture is asking a deeper question:

Does a model need to memorize the world — or does it need to learn how to think, how to look things up, how to verify, and how to act?

For the past few years, the default answer in the LLM industry has been simple: more parameters, more data, longer training, longer context.

HRM takes a different road.

Instead of turning the model into a larger and larger knowledge warehouse, it tries to turn the model into a stronger reasoning core. A standard LLM is like a student carrying a whole library on their back. HRM is more like a person who knows how to solve problems, look things up, review their work, and take action.

The Numbers That Made the Industry Pay Attention

A roughly 1B-parameter model hits:

Benchmark	HRM-Text
MATH	56.2
GSM8K	84.5
ARC-Challenge	81.9
DROP	82.2

Training cost: about $1,500, from a run on 16 H100s that took under two days.
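As a quick sanity check on that budget, here is a back-of-envelope sketch. It assumes "under two days" means at most 48 hours of wall-clock time, which is an assumption and not a figure from the release. The GPU count and the $1,500 total are the numbers quoted above.

```python
# Back-of-envelope check of the training budget quoted above.
NUM_GPUS = 16            # H100s, from the article
MAX_HOURS = 48           # assumed upper bound for "under two days"
TOTAL_COST_USD = 1500    # approximate total cost, from the article

gpu_hours = NUM_GPUS * MAX_HOURS
implied_rate = TOTAL_COST_USD / gpu_hours

print(f"GPU-hours (upper bound): {gpu_hours}")
print(f"Implied price per GPU-hour: ${implied_rate:.2f}")
```

Because the run finished in under 48 hours, the implied hourly rate is a lower bound: fewer GPU-hours for the same total means a higher price per GPU-hour.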

No post-training. No RLHF. No dependence on explicit chain-of-thought data. The team released the paper, the model weights, and the pre-training code.

That last point matters: HRM-Text is not a wrapper on top of an existing large model. It validates, at the foundation pre-training stage, a new architectural direction.

It's not "another small model upset." More precisely, it's a brain-replacement experiment for reasoning models:

Instead of letting the model speak more chain-of-thought, let it finish thinking inside its head before it opens its mouth.

And that same direction has now shown up at the highest level of academic discussion. Around the release of HRM-Text, a new paper co-authored by Yoshua Bengio — GRAM (Generative Recursive Reasoning) — proposed a structure that heavily reuses the HRM hierarchical recursive skeleton: a high-level state, a low-level state, a dual time-scale, multi-round recursive updates, and on top of that, a probabilistic generative module.

Sapient didn't wait for the industry to give an answer. It put a key question on the table first — and shipped a runnable, open-source, verifiable model system to back it up.

So the real question is no longer:

"Why does a 1B model hit these benchmarks?"

It's:

Did Sapient just validate, in advance, a new architecture line that next-generation reasoning models should take seriously?

Knowledge Is Not Intelligence, and CoT Is Not Thinking

Most of today's reasoning models "think while they speak." Chain-of-Thought turns the reasoning process into a string of tokens, and the model outputs intermediate steps one at a time.

This is useful, but the problems are obvious:

Tokens get longer, bills get larger.
If one intermediate step is wrong, the rest of the chain can fail.
Reasoning is bound to the surface of language — the model can easily learn "text that looks like reasoning" without actually mastering the structure of reasoning.

HRM asks a more radical question: why does reasoning have to be written out at all?

When humans solve a hard problem, we don't speak every inner step out loud. We try, correct, eliminate, backtrack — and only at the end do we say the answer. HRM tries to do the same thing: take the scratchpad out of the model's mouth, and put it back inside the model's head.

This is latent reasoning. The model doesn't output a longer chain-of-thought. Before it outputs anything, it completes multiple rounds of computation in its internal state.

This is also why Sapient has been betting on HRM from the start. Sapient's bet has never been "small models." It's been on HRM (Hierarchical Reasoning Model), a layered reasoning architecture.

While most teams are still tweaking parameters, data, and training tricks on top of the Transformer, Sapient pushed the question down a level:

If intelligence is not only a function of scale, but of how computation itself is organized — should the model architecture itself be redesigned?

The core idea of HRM is to let the model do multiple, layered, recursive state updates in latent space before it emits a single token.

From Symbolic to Text: How HRM Got Here

In 2025, Sapient released HRM-Symbolic.

This model was aimed at closed, verifiable, hard-reasoning tasks: Sudoku, mazes, ARC-AGI. These problems have clear rules, clear state spaces, and verifiable answers. They demand combinatorial search and multi-step reasoning.

So they were the right testbed for the first question: can a hierarchical-recursive-reasoning architecture actually work?

A 27M-parameter model with no pre-training, no CoT data, and only about 1,000 training samples posted strong results on Sudoku-Extreme, Maze-Hard, and ARC-AGI.

That answered one question:

For closed, verifiable, hard-reasoning tasks — does the HRM direction work?

Answer: yes.

But that's not enough, because Sudoku isn't language, and a maze isn't the open world. So HRM-Text answers the second, harder question:

When the task moves into natural language, does HRM still work?

This is harder than simply scaling the model up. Language is open, ambiguous, and dense with knowledge. Output forms are flexible, and training is much more prone to instability.

So HRM-Text isn't "HRM-Symbolic scaled up a bit." It validates whether hierarchical recursive reasoning can enter the foundation language model itself.

From HRM-Symbolic to HRM-Text, Sapient didn't just ship a model. It shipped a continuous technical arc:

First validate the architectural hypothesis on closed reasoning tasks. Then extend that architecture into open language environments. And release the paper, code, weights, and training method together, so the line is reproducible, falsifiable, comparable, and extensible.

This is also why Sapient deserves to be taken more seriously. It didn't wait for the industry to give an answer and then follow along. It put the question on the table first, and pushed a direction that might otherwise have stayed in theoretical discussions into a runnable, open-source, verifiable model system.

The Core of HRM: Two Brain Regions Growing Inside the Model

A standard Transformer is more like an assembly line. Input goes in, gets processed layer by layer, and comes out the other end. The obvious way to add capacity is to add layers, parameters, and training data.

HRM thinks differently. Inside the model, it places two modules that run at different rhythms:

A high-level module H — the strategic brain. It updates slowly, holds long-term context, and decides where to think next.
A low-level module L — the execution brain. It updates quickly, handles local computation, refines details, and pushes the problem forward step by step.

The key point: H and L are not two external agents, and not two models talking to each other. They live inside the same neural network, in the same latent space, repeatedly updating the same internal state. That's the difference between HRM and ordinary "multi-agent in a trench coat."

Most multi-agent systems are a few LLMs chatting in natural language. HRM does its hierarchical recursive computation inside the model.

The metaphor: a standard Transformer is like an article handed to 30 editors in sequence, each of whom edits it once. HRM is more like two groups of editors iterating on the same draft: one group polishes details quickly, while the other slowly keeps the big picture in view. By the time the model outputs, the draft has been revised many times internally.

This is also HRM-Text's biggest difference from a regular small model:

It doesn't only get its capability from parameter count. It makes its limited parameters participate in deeper effective computation.

The Hugging Face model card describes HRM-Text the same way: an H/L dual time-scale recursive architecture, with a slow high level and a fast low level, iterating on the same input embedding and squeezing deeper effective computation out of a limited parameter budget.
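To make the two rhythms concrete, here is a toy sketch of a dual time-scale loop. It is not the HRM-Text implementation: the function name hl_recursion, the loop counts, and the fixed scalar coefficients (standing in for learned weights) are all assumptions made for illustration. The only thing it tries to show is the update order, with the fast state L updating several times for each update of the slow state H, both reading the same input.

```python
import math

def hl_recursion(x, n_high=2, n_low=3):
    """Toy dual time-scale loop: L updates n_low times per single H update."""
    z_h = [0.0] * len(x)   # slow, high-level state
    z_l = [0.0] * len(x)   # fast, low-level state
    trace = []
    for _ in range(n_high):
        for _ in range(n_low):
            # Fast update: L sees its own state, the current H state, and the input.
            z_l = [math.tanh(0.5 * l + 0.3 * h + xi)
                   for l, h, xi in zip(z_l, z_h, x)]
            trace.append("L")
        # Slow update: H absorbs the result of the L sub-loop.
        z_h = [math.tanh(0.5 * h + 0.5 * l) for h, l in zip(z_h, z_l)]
        trace.append("H")
    return z_h, trace

state, trace = hl_recursion([0.1, -0.2, 0.3])
print("".join(trace))  # order in which the two states were updated
```

In this sketch the same two update rules are reused on every pass, so the number of sequential updates grows with n_high and n_low while the number of "parameters" stays fixed. That is the sense in which depth can increase without the model getting bigger.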

In other words, HRM-Text isn't bolting a planner onto the outside of a model. It bakes the layered recursive computation into the model itself.

What changes is how the model computes.

Parameters don't blow up. The computation goes deeper. It's like a person who doesn't carry more books, but learns to turn things over in their head a few more times.

What HRM-Text Actually Got Right

If we describe HRM-Text too technically, it turns into a paper summary. The real wins can be put in three sentences.

First, it changed how the model computes.

HRM-Text doesn't just stack more layers. It lets the model do multiple rounds of internal recursive computation before output, so the parameter count stays fixed while the effective computation gets deeper.

Second, it changed what the model learns from.

Most language models, during training, predict every token in the whole text sequence — the question, the prompt, the context, the answer. HRM-Text trains from scratch on instruction-response data, but computes the loss only on the response.

This doesn't mean the instruction is ignored. The instruction still participates in attention as context, and the response-side loss still back-propagates into how the model reads the instruction. But the model is no longer asked to learn to "predict the question itself." The training signal is concentrated on generating answers and completing tasks.

The intuition: a teacher grading an exam doesn't give points for copying the question down. They grade whether you answered correctly. The training signal lands on task completion, not on the average of every token in the text.
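A minimal sketch of what "loss only on the response" means in practice. The probabilities and the six-token example are made up for illustration, and response_only_loss is a hypothetical helper, not a function from the HRM-Text codebase. The instruction tokens are still in the sequence; they are simply left out of the average.

```python
import math

def response_only_loss(token_probs, is_response):
    """Mean negative log-likelihood over response tokens only.

    token_probs[i] -- probability the model assigned to the correct token i
    is_response[i] -- True for response tokens, False for instruction tokens
    """
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    return sum(losses) / len(losses)

# Hypothetical 6-token sequence: 3 instruction tokens, then 3 response tokens.
probs = [0.10, 0.20, 0.05, 0.50, 0.25, 0.80]
mask = [False, False, False, True, True, True]

masked = response_only_loss(probs, mask)
full = response_only_loss(probs, [True] * len(probs))
print(f"response-only loss: {masked:.3f}")
print(f"all-token loss:     {full:.3f}")
```

In this toy case the all-token loss is dominated by the poorly predicted instruction tokens. Masking them out means the gradient is driven only by how well the answer is generated.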

This is paired with a PrefixLM attention mask. The instruction side can fully integrate context; the response side is generated in causal order. The net effect: a decoder-only model gets a result that approximates an encoder-decoder setup.
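The shape of a PrefixLM mask is easy to see in a small example. The sketch below builds one for a hypothetical sequence of three instruction tokens and two response tokens. The helper name prefix_lm_mask and the sizes are assumptions for illustration, and the sketch shows the general PrefixLM pattern, not HRM-Text's exact masking code.

```python
def prefix_lm_mask(prefix_len, total_len):
    """mask[i][j] is True when position i may attend to position j."""
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            in_prefix = j < prefix_len   # instruction tokens are visible to every position
            causal = j <= i              # otherwise only the current and earlier positions
            row.append(in_prefix or causal)
        mask.append(row)
    return mask

# Hypothetical example: 3 instruction tokens followed by 2 response tokens.
for row in prefix_lm_mask(prefix_len=3, total_len=5):
    print("".join("x" if allowed else "." for allowed in row))
```

The first three rows are identical because every instruction token sees the whole instruction, in both directions. The last two rows grow one position at a time, which is the usual causal pattern for generation.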

The point isn't the cute trick of "predicting fewer tokens." It's that the training signal is redistributed. The model concentrates on how to complete the task, instead of learning the whole text sequence evenly.

Third, it solved the instability problem of recursive training.

Recursive architectures aren't new.

The hard part is that the deeper the loop, the more easily training collapses. When the same set of modules is called repeatedly, activation variance can accumulate, and gradients can vanish or explode.
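A toy illustration of why reuse is fragile. Here the "shared block" is reduced to a fixed gain, which is a deliberate oversimplification, and the normalization is plain RMS normalization used as a generic stand-in. Neither is taken from HRM-Text. The point is only that a small per-step scale error compounds when the same block runs many times.

```python
import math

def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))

def rms_normalize(v, eps=1e-8):
    """Rescale v so its RMS is approximately 1."""
    scale = rms(v) + eps
    return [x / scale for x in v]

def run_recursion(v, gain, steps, normalize):
    """Apply the same toy block `steps` times and return the final RMS."""
    for _ in range(steps):
        v = [gain * x for x in v]   # shared block: here just a fixed gain (assumption)
        if normalize:
            v = rms_normalize(v)
    return rms(v)

start = [1.0, -0.5, 0.25, 2.0]
print("gain 1.2, no norm:  ", run_recursion(start, 1.2, 30, normalize=False))
print("gain 0.8, no norm:  ", run_recursion(start, 0.8, 30, normalize=False))
print("gain 1.2, with norm:", run_recursion(start, 1.2, 30, normalize=True))
```

Without normalization the magnitude either explodes or shrinks toward zero after 30 passes. Renormalizing after each pass keeps it near 1 regardless of the gain.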

HRM-Text introduces MagicNorm and warmup deep credit assignment, so the model keeps activations stable across many recursive rounds, and credit assignment deepens gradually.

Plainly: don't make the model responsible for the deepest recursive steps on day one. First let it learn the internal computation on short paths, then slowly push the responsibility into deeper reasoning.
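One way to picture "deepening gradually" is a schedule for how many recursive rounds receive gradient at each training step. The article does not give the actual schedule, so the linear ramp below, the function name credit_depth, and all the numbers are assumptions chosen only to illustrate the idea.

```python
def credit_depth(step, warmup_steps, max_depth, min_depth=1):
    """Number of final recursive rounds that receive gradient at a training step.

    Grows linearly from min_depth to max_depth over warmup_steps, then stays flat.
    """
    if step >= warmup_steps:
        return max_depth
    frac = step / warmup_steps
    return min_depth + int(frac * (max_depth - min_depth))

# Hypothetical run: ramp from 1 to 8 rounds over the first 1000 steps.
for step in (0, 250, 500, 750, 1000, 2000):
    print(step, credit_depth(step, warmup_steps=1000, max_depth=8))
```

Early in training only the last round or two are held responsible for the output. By the end of warmup, gradients flow back through the full recursion.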

This shows that HRM-Text isn't "just run the same layer a few more times." It systematically addresses the question of how recursive computation enters a language model.

These three pieces are what HRM-Text is really about:

The architecture owns how the model thinks.
The objective owns what the model learns.
The training method owns how deep it can go without collapsing.

So HRM-Text isn't a single trick. It's a new foundation-model design method, in which internal computation depth, task-completion objective, and stable recursive training are designed together as one system.

After stacking the changes, the numbers move:

ARC-Challenge: 51.9 → 81.9
MATH: 35.4 → 56.2
GSM8K: 48.4 → 84.5
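For a sense of scale, the before and after numbers above can be turned into absolute and relative gains. The scores are copied from the list; the article does not spell out the baseline configuration here, so the sketch only does the arithmetic.

```python
# Before/after scores as listed above.
scores = {
    "ARC-Challenge": (51.9, 81.9),
    "MATH": (35.4, 56.2),
    "GSM8K": (48.4, 84.5),
}

gains = {}
for name, (before, after) in scores.items():
    absolute = round(after - before, 1)                    # gain in benchmark points
    relative = round(100 * (after - before) / before, 1)   # gain as % of the starting score
    gains[name] = (absolute, relative)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```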

The gains don't come from a single trick. They come from architecture, training objective, and training method acting together.

What HRM-Text really got right is putting "how to compute", "what to learn", and "how to train stably" back on the same design table.

That is also the biggest gap between Sapient's line and the ordinary "small model" line:

It's not just about making the model smaller. It's about redefining how a limited set of parameters participates in deeper internal computation.

In token terms, HRM-Text was trained on only about 40B unique tokens. After de-duplication accounting, the published experimental total is around 60B tokens.

For comparison: Llama 3.2 3B used about 9T tokens — about 225× more. Qwen3 2B used about 36T tokens — about 900× more.
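The multipliers follow directly from the token counts quoted above. The short check below recomputes them against both the roughly 40B unique-token figure and the roughly 60B total, using only numbers stated in this article.

```python
# Token budgets as quoted in the article (approximate).
HRM_TEXT_UNIQUE = 40e9    # ~40B unique tokens
HRM_TEXT_TOTAL = 60e9     # ~60B tokens, published experimental total
others = {"Llama 3.2 3B": 9e12, "Qwen3 2B": 36e12}

ratios = {}
for name, tokens in others.items():
    ratios[name] = (tokens / HRM_TEXT_UNIQUE, tokens / HRM_TEXT_TOTAL)
    print(f"{name}: {ratios[name][0]:.0f}x vs unique, {ratios[name][1]:.0f}x vs total")
```

Measured against the 60B total, the gaps are 150× and 600×, which are smaller multipliers but still more than two orders of magnitude.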

And yet, on several reasoning-heavy benchmarks, HRM-Text can sit on the same comparison table as a set of mainstream 2B–7B open-source models.

That is the truly unusual part of HRM-Text:

It doesn't push the old line forward by adding parameters, training length, and data. It uses a new computational structure to pull the effective computation depth of limited parameters back up.

The Bigger Signal: Bengio's Team Is Walking the Same Road

Around the release of HRM-Text, another signal is worth paying attention to: Yoshua Bengio is a co-author on Generative Recursive Reasoning Models, or GRAM.

That paper doesn't keep stacking scale on top of the standard Transformer. It puts recursive reasoning, latent reasoning, and generative modeling together.

More precisely, GRAM doesn't just broadly "go in a similar direction." At the level of the core computational skeleton, it heavily reuses HRM's design.

If you compare the two structures, the most critical elements of HRM all have a direct counterpart in GRAM.

1. High-level state. HRM has the high-level module H, which holds a slower, more stable, more global semantic state. GRAM has a high-level latent state / high-level recurrent state that models higher-level reasoning state.

2. Low-level state. HRM has the low-level module L, which updates fast and handles local computation and detail state. GRAM has a low-level latent state / low-level recurrent state for fine-grained recursive updates.

3. Dual time-scale. The core of HRM is the H/L dual time-scale: the low-level module updates many times, the high-level module updates more slowly. GRAM also uses recursive interaction between high and low states, forming layered, multi-step internal computation.

4. Latent-space recursion. HRM doesn't complete reasoning through an external text chain; it repeatedly updates internal state in latent space. GRAM also runs reasoning recursively in latent space, instead of relying on explicit text CoT.

5. Internal computation before output. HRM emphasizes that the model runs multiple rounds of internal computation before output. GRAM also emphasizes recursive reasoning — the model forms deeper reasoning through recursive state updates before generating.

GRAM is not a clean-slate re-invention. Strip away the probabilistic generative module GRAM adds on the outside, and its underlying computational logic overlaps heavily with HRM: high-level state, low-level state, latent-space recursion, multi-round internal updates.

This is not "broadly similar direction." It is strong consistency at the level of the core architectural hypothesis.

GRAM is not just repeating HRM either. On top of HRM's deterministic recursive skeleton, it adds prior, posterior, and decoder modules — turning the original hierarchical recursive reasoning into a probabilistic, multi-trajectory generative reasoning framework.

If HRM proposed and validated the "high-low dual time-scale recursive reasoning" line first, GRAM is a generative probabilistic wrapper on that same skeleton — letting the model sample across multiple potential reasoning trajectories.
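The difference between a deterministic recursion and a sampled one can be shown with a very small toy. This is not GRAM's model: there is no prior, posterior, or decoder here, only Gaussian noise added to a scalar recursive update. The function name sample_trajectory and every constant are assumptions for illustration.

```python
import math
import random

def sample_trajectory(x, rounds, noise_std, rng):
    """Run a toy recursive latent update, optionally perturbed by noise."""
    z = 0.0
    path = []
    for _ in range(rounds):
        # Deterministic recursive update plus Gaussian noise on the latent state.
        z = math.tanh(0.8 * z + x) + rng.gauss(0.0, noise_std)
        path.append(z)
    return path

rng = random.Random(0)
trajectories = [sample_trajectory(0.3, rounds=4, noise_std=0.1, rng=rng)
                for _ in range(3)]
deterministic = sample_trajectory(0.3, rounds=4, noise_std=0.0, rng=rng)

for path in trajectories:
    print([round(z, 3) for z in path])
print("no noise:", [round(z, 3) for z in deterministic])
```

With noise turned off, the recursion always follows the same path. With noise on, each run explores a different internal trajectory from the same input, which is the intuition behind sampling over reasoning paths.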

That's exactly why GRAM, instead of diluting HRM, makes HRM's importance more obvious. It doesn't sidestep HRM and start over. It keeps building on the hierarchical recursive skeleton HRM already proposed and validated.

Sapient didn't just join the discussion about next-generation reasoning models. It shipped the basic structure that top researchers are now reusing and extending.

Seen this way, HRM is no longer just the name of a model architecture. It is starting to become a reference point in next-generation reasoning model research.

So Sapient's place shouldn't be written as "the small-model team that Bengio liked." A more accurate description is:

Sapient first turned HRM, a hierarchical recursive reasoning architecture, into a runnable, open-source, verifiable model system. GRAM, with Bengio as a co-author, shows that this architectural idea has been noticed by top AI researchers worldwide, and is being quickly absorbed into the research framework for next-generation reasoning models.

From this angle, the meaning of HRM-Text is no longer "a 1B model that scored well." Sapient has, in advance, called the architectural line that top-tier research is now following.

It's not an isolated small model. It's an early signal:

AI reasoning is shifting from "writing out the chain of thought" to "forming internal thought structures."

Next-generation reasoning models shouldn't only output longer text chains. They should do deeper internal computation in latent space.

HRM's contribution is that it first turned "high-low dual time-scale recursive reasoning" into a runnable, open-source, verifiable model system. GRAM pushes that recursive latent-space reasoning further into probabilistic generation and multi-trajectory sampling.

If HRM first proposed and validated the "model runs hierarchical recursive reasoning before output" skeleton, GRAM layers a generative probabilistic wrapper on top of that line.

That's why HRM-Text deserves a more important seat at the table. It's not an isolated small model. It's a signal of where next-generation reasoning architecture is turning.

$1,500 Doesn't Just Break the Training-Cost Number

$1,500 is obviously not the end of the story. It doesn't mean foundation-model R&D has suddenly become easy.

HRM-Text is still a proof of concept. It isn't a mature chat model, and it hasn't gone through full post-training, RLHF, or large-scale productization. Its knowledge coverage, performance on truly open tasks, long-context ability, tool use, and ability to scale all still need to be tested.

But the real sting of this number is that it brings another possibility back into foundation-model R&D.

For the past few years, foundation models have looked more and more like heavy industry. Bigger GPU clusters, longer training cycles, more complex data engineering. It's easy for the industry to slip into a default:

Only the giants can explore foundation models. Only huge compute can validate a new architecture. Scaling is the only correct path.

HRM-Text's appearance doesn't deny that scaling works. Scaling is still powerful.

But it reminds the industry: Scaling is not the only doorway in.

If the model architecture itself can improve computational efficiency, if the training objective can be more focused, and if the model can decouple knowledge storage from reasoning ability, then foundation-model innovation doesn't have to be defined by compute scale alone.

For enterprises, the real bottleneck in AI adoption today is not just "the model isn't smart enough." It's that training is expensive, infrastructure is heavy, iteration cycles are slow, and the cost of trial and error is high.

Many enterprises don't need to train a giant general model from scratch. What they actually need is more efficient, more controllable, more customizable reasoning on specific tasks: reading private enterprise knowledge, finding the right information, analyzing complex systems, calling tools, planning, verifying results, and continuously learning on specific tasks.

The hint HRM-Text gives is:

If the model architecture itself can lift computational efficiency, then enterprise AI capability building doesn't have to fully depend on bigger models and heavier infrastructure.

For the research community, the meaning of HRM-Text is that more architectural hypotheses get a chance to be tested.

That heavy-industry profile, with its bigger GPU clusters, longer training cycles, and more complex data engineering, makes it hard for university labs, startup teams, independent researchers, and the open-source community to participate directly in frontier experiments at the foundation-model level.

The real worry is not the cost itself. It's that many different technical possibilities might get filtered out before they ever get properly tested.

When a line requires huge resources to validate, the industry naturally trends toward the most certain, most mainstream, most resource-intensive direction. Earlier-stage, riskier, and possibly more groundbreaking architectural hypotheses get fewer chances at real experiments.

Sapient's significance is that it didn't wait for the giants to validate the line first. It was the first to turn a different frontier AI path into a sample the industry can actually test.

It doesn't deny the power of scaling. But it shows that foundation-model innovation doesn't have to be defined by compute scale alone.

Architecture, training objective, recursive computation, and open-source verification can also be key forces pushing the frontier of AI.

Seen this way, the value of HRM-Text isn't that it proves small models will replace big models. It reminds the industry:

Frontier AI should not have only one doorway in.

The Next Step for HRM: Not Better at Chatting — Better at Working

Sapient's long-term judgment on HRM can be put in one sentence:

The model doesn't need to remember everything. It needs to learn how to think, how to look things up, how to learn, and how to use information.

This is reasoning–knowledge decoupling.

In the early phase, it can plug in external knowledge like RAG. But further out, HRM's goal isn't just retrieving documents. It's giving the model a stronger reasoning core:

Knowing what to look up.
Knowing where to look.
Knowing how to judge whether information is reliable.
Knowing how to fold new knowledge into the current task.
Knowing how to plan, call tools, verify results.
Knowing how to actually finish a complex task.

That's closer to how humans work. We don't memorize everything in the world. Smart people know the structure of the problem, and they know who to ask, what to check, how to verify, and how to act.

In the future, it can act as a foundational Reasoning Core taking on roles like:

Reliability Diagnostician — diagnose complex system stability, generate root-cause hypotheses, analyze dependencies, blast radius, and rollback plans, and execute safe remediation.
System Optimizer — analyze system behavior, find performance bottlenecks and resource waste, propose or run optimization plans.
Data Organizer — turn messy enterprise knowledge, documents, logs, databases, and workflows into a memory system that can be searched, reasoned over, and learned from.
Tool Calling Director — decide when to call which tool, API, model, or data source, plan the call order, verify intermediate results, and finish the task.

That is the difference between HRM and ordinary chat models.

The core question for a chat model is: how do I answer the user?

The core question for HRM is: how do I complete the task?

Seen this way, the commercial value of HRM is more than "training is cheaper." More importantly, it may change the way enterprises build AI capability.

In the past, when enterprises wanted stronger AI, the only path was to plug in a bigger general model, then bolt it onto business workflows with prompt engineering, RAG, tool chains, and agent frameworks.

The problem with that approach is obvious: the system gets more and more complex, the call chain gets longer and longer, costs climb, and results get harder and harder to verify.

HRM imagines a different structure:

The bottom layer is a stronger reasoning core. Knowledge bases, tools, memory, and environmental feedback are attached on the outside. The model doesn't need to remember everything — it needs to know how to organize tasks, how to use information, and how to verify results.

That also means the next step for HRM is not "better at chatting." It's better at working.

From Symbolic to Text, and Then to World Models

The HRM line is not stopping at language.

Sapient started with symbolic reasoning, using Sudoku, mazes, and ARC-AGI to prove that hierarchical recursive reasoning can run.

Then it moved to HRM-Text, bringing the architecture into natural language models.

The next step, naturally, is image, video, audio, robotics, and world models.

Because what HRM is processing isn't a specific data format. It's something more fundamental: state, relations, constraints, plans, actions, feedback.

That's why HRM has omni-modal potential.

Symbols, text, images, video, audio, and robot sensor data can all be turned into internal state spaces for the model. If HRM can learn, across modalities, how to organize state, how to predict change, and how to plan action, it stops being just a language model. It becomes a candidate architecture for world models.

This is exactly what embodied AI needs most. A robot can't just answer. A robot needs to understand its environment, predict consequences, plan actions, and correct itself after failure.

For systems like that, saying a beautiful sentence is meaningless. What matters is: think it through, and then do it right.

So the meaning of HRM-Text is not limited to language models. It is better read as a milestone in Sapient's push to take HRM from symbolic reasoning into open language environments. If the line keeps holding, the next step for HRM isn't just text. It may be world modeling in the broader sense: understanding how state changes, how actions produce consequences, how plans get executed, and how failures get corrected.

That's also why HRM's potential should not be boxed in by the "small model" label.

What really matters is that it tries to give an intelligent system a stronger internal computational structure.

Lean General Intelligence: AI's Future Should Not Have Only One Path

Zooming out, what stands behind HRM is Sapient's long-term view of general intelligence:

The exploration of advanced AI should not be a single path endlessly reinforced by resource scale. It should be a technical process advanced by more researchers, more developers, more startup teams, and the open-source community.

Sapient frames its long-term direction as: Lean General Intelligence.

Here, "Lean" is not "small" and not "cheap." It means more efficient, more accessible, and more focused on the computational structure itself.

The industry has already proven the power of scaling over the past few years. But now another question is becoming more and more important:

When training costs keep climbing, token bills keep growing, agents get more and more complex, and enterprises increasingly need controllable, verifiable, customizable intelligence — is continuing to scale the model the only answer?

HRM gives another answer.

Not "let the model memorize more knowledge," but "give the model a stronger reasoning core." Not "let the model output longer CoT," but "let the model complete deeper computation in latent space." Not "stuff every capability into one black-box large model," but "reorganize reasoning, knowledge, tools, memory, and action into one system."

This is the most important meaning of HRM-Text.

It doesn't prove the 1B model has won. It proves that the architecture of AI is far from settled.

If the past few years' main thread was Scaling, the next round of reasoning models may face a new question:

Should the model be bigger — or should it be better at thinking?

Sapient's answer is HRM.

And HRM-Text is the first public sample of that line, in the foundation language-model context. It's early. But it's important.

Because it reminds the whole industry:

AI's future should not have only one path.

Bigger models will keep mattering. But models that are better at thinking may be the real doorway into the next round of reasoning architecture.

From HRM-Symbolic to HRM-Text, and then to GRAM's heavy reuse of the HRM skeleton with Bengio as a co-author, hierarchical recursive reasoning is no longer just Sapient's internal line. It is becoming an important direction for next-generation reasoning models.

This is also Sapient's role in this story:

It didn't follow the answers the industry had already given. It put a new answer on the table first — runnable, open-source, verifiable.

If the past few years have fully proven the power of scaling, Sapient is reminding the industry:

AI's future should not have only one path.

And Sapient Intelligence is one of the earliest players to put a complete answer on this new road.

Paper: arxiv.org/abs/2605.20613
GitHub: github.com/sapientinc/HRM-Text
Model: huggingface.co/sapientinc/HRM-Text-1B