arxiv:2502.13124

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

Published on Feb 18

Abstract

AI-generated summary: NaturalReasoning, a dataset of 2.8 million diverse reasoning questions, enhances knowledge distillation and self-training across various domains, including STEM, Economics, and Social Sciences.

Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments which show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding.
