arxiv:2510.04871

Less is More: Recursive Reasoning with Tiny Networks

Published on Oct 6
Submitted by Alexia Jolicoeur-Martineau on Oct 8
#1 Paper of the day

Abstract

Tiny Recursive Model (TRM) achieves high generalization on complex puzzle tasks using a small, two-layer network with minimal parameters, outperforming larger language models.

AI-generated summary

Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language Models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI while using small models (27M parameters) trained on small data (around 1000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal. We propose Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM, while using a single tiny network with only 2 layers. With only 7M parameters, TRM obtains 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., DeepSeek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters.
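For intuition, the core idea of recursive reasoning with a single tiny network can be sketched in a few lines: one small MLP is applied repeatedly, refining a latent "scratchpad" and a current answer for a fixed embedded question. This is only an illustrative toy, not the authors' code; all sizes, loop counts, and names here are hypothetical.

```python
import numpy as np

# Illustrative sketch of TRM-style recursive reasoning (NOT the paper's code).
# A single tiny network is reused at every step, refining a latent state z
# and an answer embedding y for a fixed embedded question x.

rng = np.random.default_rng(0)
D = 16  # hidden width (hypothetical)

# One tiny 2-layer network, shared across all recursion steps.
W1 = rng.normal(0, 0.1, (3 * D, D))
W2 = rng.normal(0, 0.1, (D, D))

def step(x, y, z):
    """One recursion step: update latent z from (question, answer, latent)."""
    h = np.tanh(np.concatenate([x, y, z]) @ W1)
    return np.tanh(h @ W2)

def recurse(x, n_outer=3, n_inner=4):
    y = np.zeros(D)  # current answer embedding
    z = np.zeros(D)  # latent reasoning state
    for _ in range(n_outer):
        for _ in range(n_inner):
            z = step(x, y, z)   # refine the latent several times...
        y = np.tanh(z @ W2)     # ...then update the answer from it
    return y

x = rng.normal(size=D)  # embedded question
y = recurse(x)
print(y.shape)  # (16,)
```

The point of the sketch is that effective depth comes from reusing the same tiny network many times, rather than from stacking more parameters.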

Community

Paper author Paper submitter

Less is More

Have any checkpoints been made available by the authors or anyone else?

Not yet, from what I've seen. cc @AlexiaJM

This is awesome!!!

Any chance you can link the repository with the code and dataset?

It seems to perform very well on task tuning, e.g. the Sudoku... even if it's not AGI or anything like that, this could be revolutionary for small, domain-specific tasks.

That's what I'm thinking. I'm going to test it out with some security-specific tasks.

How do normal supervised learning methods perform on these types of tasks? I am confused about what differentiates this from a larger forward pass. Or is it simply the learning efficiency?

This sentence in the Conclusion implies this architecture is superior to a massive net without recursive reasoning:

the question of why recursion helps so much compared to using a larger and deeper network remains to be explained; we suspect it has to do with overfitting, but we have no theory to back this explaination (sic)

I don't understand this new trend of comparing models like HRM and TRM with LLMs.

How is that relevant? They're not LLMs. The term "reasoning" here has nothing to do with reasoning in LLMs. I don't even think these techniques are applicable to LLMs, are they?

Like, of course a specialized model, trained for a specific task, is going to perform better than an LLM trained for an entirely different class of problems, right?

For that matter, how is it relevant to test these models on ARC-AGI, which is a benchmark to evaluate the problem solving capabilities of LLMs?

It's apples to oranges, isn't it? Jet airplanes to weather balloons? The weather balloon is obviously way better at monitoring the weather, but jet airplanes have quite a few more uses.

Compare these models against other specialized models: are they significantly smaller or faster?

Give us data we can actually compare.

I honestly have no idea if there's anything truly novel about these models, because you haven't provided any relevant comparison to anything remotely similar. πŸ€·β€β™‚οΈ

The ARC-AGI benchmarks evaluate the fluid intelligence of any AI - not only LLMs.

The fact that TRMs perform better than any known architecture on those benchmarks is interesting on its own.

I guess no one grasps the larger truth. This recursive style of reasoning is in reality belief-state engineering. This paper actualizes it at the architectural level, although you can achieve something similar with an extra encoder in normal decoder-only LMs during training. Cool paper. Hope to see more like this, and I hope people extrapolate belief-state engineering to other research facets; one day it will replace RL.

Thanks, I'd never heard of belief-state engineering.
I found this definition from:
Hu, E.S., Ahn, K., Liu, Q., Xu, H., Tomar, M., Langford, A., Jayaraman, D., Lamb, A. and Langford, J., 2024. Learning to achieve goals with belief state transformers. arXiv preprint arXiv:2410.23506. https://doi.org/10.48550/arXiv.2410.23506

Informally, a belief state is a sufficient amount of information from the past to predict
the outcomes of all experiments in the future, which can be expressed as either a distribution over
underlying world states or a distribution over future outcomes.
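To make that definition concrete, here is a minimal worked example of a belief state, using a tiny two-state hidden Markov model (this is my own toy illustration, not from either paper; all the probabilities are made up). The belief is a distribution over hidden states that summarizes the entire observation history, updated by predicting with the transition model and then conditioning on each observation via Bayes' rule.

```python
import numpy as np

# Toy belief-state update for a 2-state HMM (illustrative numbers only).
T = np.array([[0.9, 0.1],   # transition probs: P(next state j | state i)
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],   # observation probs: O[s, o] = P(obs o | state s)
              [0.1, 0.9]])

def update(belief, obs):
    """Predict with T, then condition on the observation (Bayes' rule)."""
    predicted = belief @ T
    unnorm = predicted * O[:, obs]
    return unnorm / unnorm.sum()

belief = np.array([0.5, 0.5])   # initial uncertainty over hidden states
for obs in [0, 0, 1]:           # a short observation history
    belief = update(belief, obs)

print(belief)  # a valid distribution over the 2 hidden states
```

However long the history grows, this 2-vector is all that is needed to predict future observations, which is what makes it a "sufficient amount of information from the past".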

TRM's comparison to LLMs conflates two separate issues.

First, the training regime: TRM uses 1000x data augmentation per example, about 1M effective samples, while Gemini and DeepSeek are zero-shot on ARC-AGI. A Transformer trained the same way would perform similarly. This isn't an architecture advantage, it's a data advantage.

Second, the task structure mismatch is fundamental. ARC-AGI has deterministic solutions in fixed-dimensional spaces. LLMs generate variable-length token sequences with discrete vocabulary constraints. Recursive latent refinement and autoregressive token generation operate in completely different optimization spaces.

The recursive approach is conceptually interesting for LLMs (Meta's Coconut explores this), but you can't benchmark it fairly on ARC-AGI. A proper test would integrate TRM's latent recursion into an LLM's hidden states and measure actual language task performance, not geometric puzzle accuracy.

That completely changes the perception here. Data augmentation using domain rules, for example in the case of Sudoku, would basically yield an infinite supply of training data. That level of augmentation is impossible for most practical real-world datasets.
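To illustrate why Sudoku is such an augmentation-friendly domain: its symmetries (relabeling digits, permuting rows within a band, and so on) each map a valid solved grid to another valid solved grid, so one solution generates huge numbers of training examples for free. The sketch below is my own illustration of the idea, not the paper's pipeline, and implements only two of the symmetries.

```python
import random
import numpy as np

# Rule-based Sudoku augmentation sketch (illustrative, not the paper's code).

def base_grid():
    # A standard valid solved Sudoku built from a shifting pattern.
    return np.array([[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
                     for r in range(9)])

def is_valid(g):
    full = set(range(1, 10))
    for i in range(9):  # every row and column must contain 1..9
        if set(g[i, :]) != full or set(g[:, i]) != full:
            return False
    for r in range(0, 9, 3):  # every 3x3 box must contain 1..9
        for c in range(0, 9, 3):
            if set(g[r:r+3, c:c+3].ravel()) != full:
                return False
    return True

def augment(g, rng):
    # 1) Relabel digits with a random permutation of 1..9 (index 0 unused).
    relabel = np.array([0] + rng.sample(range(1, 10), 9))
    g = relabel[g]
    # 2) Shuffle the three rows inside each horizontal band.
    rows = [band * 3 + i for band in range(3)
            for i in rng.sample(range(3), 3)]
    return g[rows, :]

rng = random.Random(0)
g = base_grid()
aug = augment(g, rng)
print(is_valid(g), is_valid(aug))  # True True
```

Both transforms provably preserve validity, which is exactly why this works for Sudoku and not for, say, a dataset of medical records: most real-world data has no such exploitable symmetry group.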

Very interesting work. The first question that popped into my mind when I saw "recursive" was speed: how fast is each question processed compared to the other models compared in this paper?

This on a tiny, extremely efficient node will make home AI systems possible for everyone. No need for a huge server rack in the basement.

Hi, I love the work @AlexiaJM . Have you considered adding register tokens? I think register tokens might give some improvement because they alleviate attention noise and outliers. Since the same layers are applied recursively to great depth, alleviating attention noise and outliers might lead to more stability. Wondering if this has been tested or tried before.

Here is a bite sized podcast I created with AI on the paper! https://open.spotify.com/episode/6OIKWXIFjw1a2PHSBR4Fm6?si=fb252fb2000d45e5

Great, AI discussing AI.

Paper author Paper submitter

Official ARC-Prize verification checkpoints: https://huggingface.co/arcprize/trm_arc_prize_verification

After how many epochs of training should you start to see improvement in the success rate on the maze task?

This might open a new direction for small models.

Came here exactly to review my own perspective on the paper: initially I was excited, given such a tiny, tiny model could compete with these huge LLMs. I read it in an AI newsletter, that of course went with the catchphrase of "tiny model beats huge LLM". It appears recursion is the key to achieving great "reasoning" performance: after all, that's what is claimed in the paper. Upon further inspection, I started doubting this claim.

A major argument of the paper is "less is more", and recursion is claimed to be a key part of the novelty. But I see a lack of evidence that recursion is what causes the greater-than-expected performance on the task. Basically, the only clear argument I see is: "a model trained extensively and very narrowly on a heavily augmented single task beats an all-purpose generalist model". It's an indication that specialist models might outperform generalist LLMs on specific tasks such as ARC-AGI-1, but we should be careful in concluding the cause.

Moreover, a major shortcoming is that the paper relies heavily on data augmentation, and there is no clear comparison with a baseline without recursion, so it is unclear whether the strong performance stems from architectural improvements or just from a well-set-up augmentation routine. Quite different conclusions can be drawn from the same observation.

I would agree, also, that the term "reasoning" is somewhat confusing. It's as much reasoning as there is in any neural network. It's not reasoning in the NLP sense of the word.

Don't get me wrong, I find it an interesting paper. However, I don't see enough evidence to verify, myself, that we are looking at a generalizable paradigm change. The recursive architecture is an interesting technique, but what the paper claims is far from proven. The next step, it seems, would be to verify whether the good performance is primarily a result of the novel recursion strategy. If not, all this paper proves is that small networks and data augmentation can yield surprisingly good results. Useful, but not the paradigm shift it seems to convey.

Oh, wonderful! Another day, another research paper claiming to have "outperformed DeepSeek" and "beaten Gemini." The academic achievement board is getting crowded! Let's be real for a second. All these victories seem to exist only on paper. I look around the real world and ask: Where are the applications? Where are the models that millions are actually using? The paper says their tiny model surpassed DeepSeek R1 and Gemini 2.5 Pro! But on what, may I ask? On specialized puzzle tasks like Sudoku and mazes! My dear companies, your triumphs are, with all due respect, currently confined to PDFs! We need models that help people code, write, and understand the world. We don't need a Sudoku-solving master trapped in a research lab. Let's be clear: to this day, I have yet to see a company truly surpass what DeepSeek has delivered in terms of real-world, accessible utility. Its impact in coding, reasoning, and translation is something people use for free, every single day. Please, spare us the theoretical breakthroughs. Show us a real model we can touch, use, and be amazed by. Until then, I'm afraid it's all just... very intelligent ink on very digital paper! πŸ˜„
