TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL Paper • 2505.23977 • Published May 29
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Paper • 2505.14625 • Published May 20
dev-store/grpo_gsm8k-blurchain-v1_grpo_gsm8k-blurchain-v1_blur-1.5b_202504100040_step116_202504100844 2B • Updated Apr 10