euclaise

https://euclaise.xyz

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

MiniMax Sparse Attention

upvoted a paper 1 day ago

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

liked a model 7 days ago

google/diffusiongemma-26B-A4B-it

Organizations

upvoted 2 papers 1 day ago

MiniMax Sparse Attention

Paper • 2606.13392 • Published 13 days ago • 145

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Paper • 2602.21103 • Published 22 days ago • 7

upvoted a paper 21 days ago

NITP: Next Implicit Token Prediction for LLM Pre-training

Paper • 2605.24956 • Published May 24 • 35

upvoted a changelog 22 days ago

Filter Models page by Base Models only

Hugging Face Changelog • 27 days ago • 165

upvoted 3 papers about 1 month ago

upvoted 10 papers 3 months ago

Efficient Exploration at Scale

Paper • 2603.17378 • Published Mar 18 • 15

V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts

Paper • 2603.10848 • Published Mar 11 • 16

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 84

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published Aug 7, 2025 • 190

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 189

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Paper • 2502.06772 • Published Feb 10, 2025 • 22

RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling

Paper • 2507.04416 • Published Jul 6, 2025 • 1

RAT+: Train Dense, Infer Sparse -- Recurrence Augmented Attention for Dilated Inference

Paper • 2602.18196 • Published Feb 20 • 1

How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published Mar 9 • 60

Lost in Backpropagation: The LM Head is a Gradient Bottleneck

Paper • 2603.10145 • Published Mar 10 • 13

upvoted 3 papers 4 months ago

Online Vector Quantized Attention

Paper • 2602.03922 • Published Feb 3 • 1

Softmax Linear Attention: Reclaiming Global Competition

Paper • 2602.01744 • Published Feb 2 • 1

Test-Time Training with KV Binding Is Secretly Linear Attention

Paper • 2602.21204 • Published Feb 24 • 32