SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025
GACELA -- A generative adversarial context encoder for long audio inpainting Paper • 2005.05032 • Published May 11, 2020
Adversarial Generation of Time-Frequency Features with application in audio synthesis Paper • 1902.04072 • Published Feb 11, 2019
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024