Sphere Fall 2022

classroom

AI & ML interests

None defined yet.


jlopez-dl

posted an update 8 days ago

My deep dive into DeepSeek V3.2-exp DSA (DeepSeek Sparse Attention): https://disruptedai.substack.com/p/deepseek-v32-exp-deep-dive

Would love to know your opinion. Thanks!

lvwerra

authored a paper 21 days ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published 26 days ago

jlopez-dl

posted an update 21 days ago

Just posted https://huggingface.co/blog/jlopez-dl/hybrid-attention-game-changer, for those interested in Hybrid Attention.

lvwerra

authored a paper 4 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025

lvwerra

authored a paper 7 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025

lvwerra

authored a paper 9 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025

lvwerra

authored a paper 10 months ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14, 2025

lvwerra

authored a paper about 1 year ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024

lvwerra

authored 3 papers over 1 year ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28, 2024

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024

lvwerra

authored 2 papers about 2 years ago

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023

OctoPack: Instruction Tuning Code Large Language Models

Paper • 2308.07124 • Published Aug 14, 2023

lvwerra

authored 2 papers over 2 years ago

The Stack: 3 TB of permissively licensed source code

Paper • 2211.15533 • Published Nov 20, 2022

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Paper • 2210.01970 • Published Sep 30, 2022

Rindra

updated a model about 3 years ago

Sphere-Fall2022/mmi-test-bert-glue

Text Classification • Updated Sep 29, 2022

bvint

updated a model about 3 years ago

Sphere-Fall2022/bvint-test-bert-glue

Updated Sep 21, 2022

NimaBoscarino

updated a model about 3 years ago

Sphere-Fall2022/nima-test-bert-glue

Text Classification • Updated Sep 21, 2022