Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2508.18756

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2 • 106
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 150
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13 • 8

about 11 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 249 • 96
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Full Paper List

Ultra-Sparse Memory Network

Paper • 2411.12364 • Published Nov 19, 2024 • 23
Hyper-Connections

Paper • 2409.19606 • Published Sep 29, 2024 • 24
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Paper • 2411.03884 • Published Nov 6, 2024 • 28
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published Jan 28 • 31

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29 • 3.19k • 48
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

Large Language Models

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Paper • 2506.14702 • Published Jun 17 • 3
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 267
Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15 • 63
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 92

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26 • 36

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29 • 3.19k • 48
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 274
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2 • 106
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7 • 150
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13 • 8

Large Language Models

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Paper • 2506.14702 • Published Jun 17 • 3
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 267
Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15 • 63
A Survey on Latent Reasoning

Paper • 2507.06203 • Published Jul 8 • 92

about 11 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 249 • 96
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

Full Paper List

Ultra-Sparse Memory Network

Paper • 2411.12364 • Published Nov 19, 2024 • 23
Hyper-Connections

Paper • 2409.19606 • Published Sep 29, 2024 • 24
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Paper • 2411.03884 • Published Nov 6, 2024 • 28
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Paper • 2501.16975 • Published Jan 28 • 31

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs