MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens • 2603.23516 • Published 22 days ago • 20 upvotes
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality • 2602.14080 • Published Feb 15 • 20 upvotes
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking • 2602.16849 • Published Feb 18 • 6 upvotes
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy • 2602.17363 • Published Feb 19 • 8 upvotes
Preliminary sonification of ENSO using traditional Javanese gamelan scales • 2602.14560 • Published Feb 16 • 1 upvote
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers • 2602.15322 • Published Feb 17 • 10 upvotes
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels • 2602.11715 • Published Feb 12 • 6 upvotes
Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm • 2602.11543 • Published Feb 12 • 6 upvotes
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation • 2602.11451 • Published Feb 11 • 15 upvotes
NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models • 2602.06694 • Published Feb 6 • 15 upvotes
SimpleGPT: Improving GPT via A Simple Normalization Strategy • 2602.01212 • Published Feb 1 • 3 upvotes
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection • 2601.19375 • Published Jan 27 • 5 upvotes
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors • 2601.17958 • Published Jan 25 • 3 upvotes
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability • 2601.18778 • Published Jan 26 • 41 upvotes
HeartMuLa: A Family of Open Sourced Music Foundation Models • 2601.10547 • Published Jan 15 • 48 upvotes
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors • 2601.07226 • Published Jan 12 • 33 upvotes