
Miquel Farré

mfarre

AI & ML interests

I like everything video

Recent Activity

liked a Space 4 days ago

Qwen/Qwen-Image-Layered

liked a Space 10 days ago

lvwerra/jagged-data-frontier

liked a model 16 days ago

mlx-community/SmolVLM2-500M-Video-Instruct-mlx


Organizations

upvoted 3 articles 5 months ago

Welcome GPT OSS, the new open-source model family from OpenAI!

Article • Aug 5 • 508

SmolLM3: smol, multilingual, long-context reasoner

Article • Jul 8 • 739

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Article • Jul 23

upvoted 2 papers 6 months ago

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

Paper • 2506.17218 • Published Jun 20 • 29

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Paper • 2502.06145 • Published Feb 10 • 18

upvoted an article 7 months ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • May 21 • 244

upvoted 2 articles 8 months ago

Vision Language Models (Better, faster, stronger)

Article • May 12 • 573

Cohere on Hugging Face Inference Providers 🔥

Article • Apr 16 • 129

upvoted a paper 9 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 202

upvoted an article 10 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • Feb 20 • 319

upvoted a collection 10 months ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

Collection • 11 items • Updated May 5 • 103

upvoted an article 11 months ago

SmolVLM Grows Smaller – Introducing the 256M & 500M Models!

Article • Jan 23 • 187

upvoted an article 12 months ago

Announcing NVIDIA Cosmos World Foundation Models

Article • Jan 7

upvoted 2 papers about 1 year ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published Oct 22, 2024 • 27

upvoted 3 articles over 1 year ago

FineVideo: behind the scenes

Article • Sep 23, 2024

Docmatix - a huge dataset for Document Visual Question Answering

Article • Jul 18, 2024

Scaling robotics datasets with video encoding

Article • Aug 27, 2024

upvoted a paper over 1 year ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 133
