Andres Marafioti

andito

AI & ML interests

Multimodal models, VLM and TTS

Recent Activity

updated a Space 3 days ago

HuggingFaceM4/reachy_mini_remote_control

liked a Space 3 days ago

cduss/reachy_mini_simple_control_panel

liked a Space 4 days ago

webml-community/Supertonic-TTS-WebGPU

published an article about 1 month ago

Article

Streaming datasets: 100x More Efficient

Oct 27

published an article about 1 month ago

Article

Supercharge your OCR Pipelines with Open Models

Oct 21

published an article 4 months ago

Article

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Jul 23

published an article 5 months ago

Article

Efficient MultiModal Data Pipeline

Jul 8

published an article 6 months ago

Article

KV Cache from scratch in nanoVLM

Jun 4

published an article 6 months ago

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Jun 3

published an article 6 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21

published an article 7 months ago

Article

Vision Language Models (Better, faster, stronger)

May 12

published an article 9 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

Feb 20

published an article 10 months ago

Article

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

Jan 23

published an article about 1 year ago

Article

SmolVLM - small yet mighty Vision Language Model

Nov 26, 2024

published an article about 1 year ago

Article

Deploying Speech-to-Speech on Hugging Face

Oct 22, 2024

published an article about 1 year ago

Article

FineVideo: behind the scenes

Sep 23, 2024

published an article over 1 year ago

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25, 2024

published an article over 1 year ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18, 2024

published an article over 1 year ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24, 2024
