Language - a neutrino12 Collection

Language

updated Oct 1

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29 • 4.52k • 51
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 131
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 308
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22 • 120
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 306
∇NABLA: Neighborhood Adaptive Block-Level Attention

Paper • 2507.13546 • Published Jul 17 • 123
GR-3 Technical Report

Paper • 2507.15493 • Published Jul 21 • 47
MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20 • 46
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Paper • 2507.15758 • Published Jul 21 • 35
Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12 • 39
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens

Paper • 2508.05305 • Published Aug 7 • 46
PRELUDE: A Benchmark Designed to Require Global Comprehension and Reasoning over Long Contexts

Paper • 2508.09848 • Published Aug 13 • 67
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published Aug 12 • 26
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment

Paper • 2508.07750 • Published Aug 11 • 19
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

Paper • 2508.05988 • Published Aug 8 • 19
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published Aug 9 • 13
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published Aug 5 • 7
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Paper • 2508.09726 • Published Aug 13 • 14
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2 • 236
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7 • 178
R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7 • 126
Story2Board: A Training-Free Approach for Expressive Storyboard Generation

Paper • 2508.09983 • Published Aug 13 • 68
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

Paper • 2508.02120 • Published Aug 4 • 19
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Paper • 2508.02150 • Published Aug 4 • 36
Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published Aug 4 • 17
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability

Paper • 2508.04017 • Published Aug 6 • 11
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21 • 87
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing

Paper • 2508.11116 • Published Aug 14 • 22
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published Aug 29 • 18
Model-Task Alignment Drives Distinct RL Outcomes

Paper • 2508.21188 • Published Aug 28 • 8
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36
Hermes 4 Technical Report

Paper • 2508.18255 • Published Aug 25 • 39
StepWiser: Stepwise Generative Judges for Wiser Reasoning

Paper • 2508.19229 • Published Aug 26 • 20
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26 • 15
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published Aug 25 • 6
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published Aug 22 • 4
Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills

Paper • 2508.19500 • Published Aug 27 • 2
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning

Paper • 2508.15868 • Published Aug 21 • 3
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

Paper • 2508.10390 • Published Aug 14 • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published Aug 31 • 83
DCPO: Dynamic Clipping Policy Optimization

Paper • 2509.02333 • Published Sep 2 • 21
When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Paper • 2509.22193 • Published Sep 26 • 37
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23 • 22