Adrian Rodriguez

axr2718

https://axr2718.github.io

AI & ML interests

Neural network pruning, double descent, superposition, topological data analysis, few-shot learning, computer vision, and image segmentation.

Recent Activity

upvoted an article about 2 months ago

Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

upvoted a collection 5 months ago

SmolLM3 pretraining datasets

upvoted an article 5 months ago

Introduction to State Space Models (SSM)


Organizations

None yet

upvoted an article about 2 months ago

Article • Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face • Jul 29 • 205

upvoted a collection 5 months ago

Collection • SmolLM3 pretraining datasets • datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12 • 41

upvoted an article 5 months ago

Article • Introduction to State Space Models (SSM) • Jul 19, 2024 • 197

commented on SmolLM3: smol, multilingual, long-context reasoner 5 months ago

And for pre-training phase 3, were only the prompt and model response concatenated, or was a simple chat template added to them? Thank you!

commented on SmolLM3: smol, multilingual, long-context reasoner 5 months ago

During pre-training phase 3 (reasoning datasets), what was used as the input? Were the user prompt and model response concatenated and used as the input? Was a chat template added to align with the ChatML template?

And is this the same for mid-training and SFT?

commented on SmolLM3: smol, multilingual, long-context reasoner 6 months ago

Have the synthetic datasets created by Qwen3-32B been released or posted anywhere? I see the other datasets in the collections, and some reasoning datasets, but none of the synthetic datasets made from Qwen3-32B. Does the team plan on releasing them?

upvoted a collection 6 months ago