Qwen/Qwen3-VL-235B-A22B-Thinking Image-Text-to-Text • 236B • Updated about 1 month ago • 25.3k • 312
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17 • 34
docling-project/SmolDocling-256M-preview Image-Text-to-Text • 0.3B • Updated Sep 17 • 348k • 1.59k