SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published 28 days ago • 114
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs Paper • 2509.22220 • Published about 1 month ago • 64
Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR Paper • 2509.18174 • Published Sep 17 • 124
view article Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 1.14k
Running on CPU Upgrade 13.6k 13.6k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published Aug 13 • 56
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 113
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers Paper • 2506.07986 • Published Jun 9 • 19