Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
Paper • 2510.04996
Training & test sets and fine-tuned models
Note Prompt set used for data processing
Note Selected hard prompts used to train Qwen2.5-Math-7B and Qwen3-4B-Instruct-2507
Note Selected easy prompts used to train Qwen2.5-Math-7B
Note Selected hard prompts used to train Llama-3.2-3B-Instruct
 
Note Checkpoint from step=400 and trained on the hard prompt set.
Note Checkpoint from step=400 and trained on the hard prompt set.
Note Checkpoint from step=400 and trained on the hard prompt set.
Note Checkpoint from step=500 and trained on the easy prompt set.
Note Selected easy prompts used to train Qwen2.5-Math-1.5B
Note Selected hard prompts used to train Qwen2.5-Math-1.5B