Collections
Discover the best community collections!
Collections including paper arxiv:2508.18756
-
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 106 -
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 150 -
Autoregressive Diffusion Models
Paper • 2110.02037 • Published -
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 8
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 249 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Ultra-Sparse Memory Network
Paper • 2411.12364 • Published • 23 -
Hyper-Connections
Paper • 2409.19606 • Published • 24 -
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Paper • 2411.03884 • Published • 28 -
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 31
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 3.19k • 48 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 274 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 126
-
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
Paper • 2506.14702 • Published • 3 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 267 -
Scaling Test-time Compute for LLM Agents
Paper • 2506.12928 • Published • 63 -
A Survey on Latent Reasoning
Paper • 2507.06203 • Published • 92
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
Snowflake/Arctic-Text2SQL-R1-7B
8B • Updated • 3.19k • 48 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 274 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262 -
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Paper • 2506.16406 • Published • 126
-
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 106 -
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 150 -
Autoregressive Diffusion Models
Paper • 2110.02037 • Published -
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 8
-
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
Paper • 2506.14702 • Published • 3 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 267 -
Scaling Test-time Compute for LLM Agents
Paper • 2506.12928 • Published • 63 -
A Survey on Latent Reasoning
Paper • 2507.06203 • Published • 92
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 249 • 96 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
Ultra-Sparse Memory Network
Paper • 2411.12364 • Published • 23 -
Hyper-Connections
Paper • 2409.19606 • Published • 24 -
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Paper • 2411.03884 • Published • 28 -
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Paper • 2501.16975 • Published • 31