models
jinaai/ReaderLM-v2 • Text Generation • 2B • Updated Mar 4, 2025 • 3.8k • 753
m-a-p/YuE-s1-7B-anneal-en-cot • Text Generation • 6B • Updated Mar 12, 2025 • 6.7k • 437
starvector/starvector-1b-im2svg • Text Generation • 1B • Updated Mar 19, 2025 • 2.34k • 181
stepfun-ai/Step1X-Edit • Image-to-Image • Updated Jul 9, 2025 • 85 • 326
papers
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models • Paper • 2402.03300 • Published Feb 5, 2024 • 138
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28, 2025 • 123
The Ultra-Scale Playbook 🌌 • Running • 3.63k • The ultimate guide to training LLM on large GPU Clusters
The Ultimate Guide to LLM Training | The Ultra-Scale Playbook 🔥 • Running • 250 • Understand every aspect of LLM training