arxiv:2510.14901

Reasoning with Sampling: Your Base Model is Smarter Than You Think

Published on Oct 16
· Submitted by Zhi Zhou on Oct 27

Abstract

AI-generated summary: An iterative sampling algorithm enhances reasoning capabilities in base models without additional training, matching or outperforming reinforcement learning on single-shot tasks.

Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by posttraining large language models (LLMs) with reinforcement learning (RL). However, despite the widespread success of this paradigm, much of the literature has been devoted to disentangling truly novel behaviors that emerge during RL but are not present in the base models. In our work, we approach this question from a different angle, instead asking whether comparable reasoning capabilities can be elicited from base models at inference time by pure sampling, without any additional training. Inspired by Markov chain Monte Carlo (MCMC) techniques for sampling from sharpened distributions, we propose a simple iterative sampling algorithm leveraging the base models' own likelihoods. Over different base models, we show that our algorithm offers substantial boosts in reasoning that nearly match and even outperform those from RL on a wide variety of single-shot tasks, including MATH500, HumanEval, and GPQA. Moreover, our sampler avoids the collapse in diversity over multiple samples that is characteristic of RL-posttraining. Crucially, our method does not require training, curated datasets, or a verifier, suggesting broad applicability beyond easily verifiable domains.
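To make "sampling from a sharpened distribution" concrete, here is a minimal Python sketch of one way such an MCMC sampler could look: a Metropolis-Hastings chain targeting p(x)^α that repeatedly lets the base model resample a random suffix and accepts or rejects by likelihood. The `sample_tokens` helper, the suffix-resampling proposal, and the default α are illustrative assumptions, not the paper's exact algorithm.

```python
import math
import random

def power_sample(prompt_ids, sample_tokens, alpha=4.0, n_steps=100, rng=random):
    """Metropolis-Hastings sketch targeting the sharpened distribution
    pi(x) proportional to p(x)**alpha over completions x of prompt_ids.

    `sample_tokens(prefix_ids)` is a hypothetical helper: it samples a
    (nonempty) completion from the base model p and returns
    (token_ids, token_logps), where token_logps[i] is the log-probability
    of the i-th sampled token under p.
    """
    tokens, logps = sample_tokens(prompt_ids)  # initial chain state, x ~ p
    for _ in range(n_steps):
        # Proposal: keep a random prefix of the current sample and let the
        # base model regenerate everything after it.
        t = rng.randrange(len(tokens))
        new_suffix, new_logps = sample_tokens(prompt_ids + tokens[:t])

        # With proposal density q = p on the resampled suffix, the MH
        # log-acceptance-ratio for target p**alpha collapses to
        # (alpha - 1) * (log p(new suffix) - log p(old suffix)).
        # (We ignore the slight asymmetry from choosing t uniformly when
        # the old and new sequences differ in length.)
        log_ratio = (alpha - 1.0) * (sum(new_logps) - sum(logps[t:]))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            tokens = tokens[:t] + new_suffix
            logps = logps[:t] + new_logps
    return tokens
```

With α > 1 the acceptance test favors higher-likelihood completions, while the stochastic suffix proposals keep the chain exploring; this is the intuition behind the abstract's claim that the sampler avoids the diversity collapse characteristic of RL-posttraining.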

Community

Paper submitter

This paper explores a sampling-based method that improves LLM reasoning performance, surpassing the results of RL training with GRPO at very low cost.


🥺 Please also see our NeurIPS 2025 paper "A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning", which introduces a theoretical framework for sampling-based test-time scaling methods.

Interesting work!
It seems the base model can achieve higher performance even without an extra training process. I have a question: have you tried this method on other VLM tasks, such as grounding or video understanding?


Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 4