Abstract
An iterative sampling algorithm enhances reasoning capabilities in base models without additional training, matching or outperforming reinforcement learning on single-shot tasks.
Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by post-training large language models (LLMs) with reinforcement learning (RL). However, despite the widespread success of this paradigm, much of the literature has been devoted to disentangling truly novel behaviors that emerge during RL but are not present in the base models. In our work, we approach this question from a different angle, instead asking whether comparable reasoning capabilities can be elicited from base models at inference time by pure sampling, without any additional training. Inspired by Markov chain Monte Carlo (MCMC) techniques for sampling from sharpened distributions, we propose a simple iterative sampling algorithm leveraging the base models' own likelihoods. Across different base models, we show that our algorithm offers substantial boosts in reasoning that nearly match, and even outperform, those from RL on a wide variety of single-shot tasks, including MATH500, HumanEval, and GPQA. Moreover, our sampler avoids the collapse in diversity over multiple samples that is characteristic of RL post-training. Crucially, our method does not require training, curated datasets, or a verifier, suggesting broad applicability beyond easily verifiable domains.
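The abstract does not spell out the exact sampler, but one natural instantiation of MCMC over a sharpened distribution is independence Metropolis-Hastings targeting the power distribution p(x)^α, with the base model serving both as the proposal and as the likelihood scorer; in that case the acceptance ratio reduces to (p(x_new)/p(x_old))^(α-1). The sketch below is a minimal illustration under those assumptions, using a toy categorical "model" in place of an LLM; the interface (sample_sequence, log_prob), the toy model, and the choice of α are hypothetical and not taken from the paper.

```python
import math
import random

# Toy stand-in for a base model: any real LLM exposing "sample a completion"
# and "score its log-likelihood" would slot into this (hypothetical) interface.
VOCAB = ["a", "b", "c"]
LOGITS = {"a": 1.0, "b": 0.0, "c": -1.0}

def sample_sequence(length=4):
    """Sample a sequence token-by-token from the toy base model p."""
    weights = [math.exp(LOGITS[t]) for t in VOCAB]
    return random.choices(VOCAB, weights=weights, k=length)

def log_prob(seq):
    """Log-likelihood of a sequence under the toy base model p."""
    log_z = math.log(sum(math.exp(v) for v in LOGITS.values()))
    return sum(LOGITS[t] - log_z for t in seq)

def power_sampler(alpha=4.0, steps=200, length=4):
    """Independence Metropolis-Hastings targeting p(x)^alpha, proposing from p.

    Because the proposal is p itself, the acceptance ratio simplifies to
    (p(x_new) / p(x_old)) ** (alpha - 1), so only base-model log-likelihoods
    are needed -- no training, curated dataset, or verifier.
    """
    current = sample_sequence(length)
    current_lp = log_prob(current)
    for _ in range(steps):
        proposal = sample_sequence(length)
        proposal_lp = log_prob(proposal)
        log_accept = (alpha - 1.0) * (proposal_lp - current_lp)
        if random.random() < math.exp(min(0.0, log_accept)):
            current, current_lp = proposal, proposal_lp
    return current, current_lp

if __name__ == "__main__":
    seq, lp = power_sampler()
    print("sampled:", seq, "| log-prob under base model:", round(lp, 3))
```

With a real LLM, sample_sequence and log_prob might wrap autoregressive decoding and summed token log-probabilities, and proposals might regenerate only a block or suffix of the current sequence rather than the whole thing; those choices are implementation details not specified by the abstract.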
Community
This paper explores a sampling-based method that improves LLM reasoning performance, surpassing the results of RL training with GRPO at a very low cost.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning (2025)
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models (2025)
- Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning (2025)
- Variational Reasoning for Language Models (2025)
- Representation-Based Exploration for Language Models: From Test-Time to Post-Training (2025)
- Online SFT for LLM Reasoning: Surprising Effectiveness of Self-Tuning without Rewards (2025)
- Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration (2025)
🥺 Please also see our NeurIPS 2025 paper "A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning", which introduces a theoretical framework for sampling-based test-time scaling methods.
Interesting work!
It seems that the base model can achieve higher performance even without an extra training step. I have a question: have you tried this method on other VLM tasks, such as grounding or video understanding?