nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 Image-Text-to-Text • 13B • Updated 29 days ago • 104k • 68
view article Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 62
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1, 2025 • 39
jinaai/jina-embeddings-v4-vllm-retrieval Visual Document Retrieval • 4B • Updated Sep 17, 2025 • 11.8k • 32
jina-embeddings-v4 Collection Universal Embeddings for Multimodal Multilingual Retrieval • 10 items • Updated Sep 2, 2025 • 2
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15, 2025 • 28
Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation Paper • 2509.10058 • Published Sep 12, 2025 • 11
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer Paper • 2508.10893 • Published Aug 14, 2025 • 31
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025 • 44
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Paper • 2508.04825 • Published Aug 6, 2025 • 58
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18, 2025 • 39
sentence-transformers/all-MiniLM-L6-v2 Sentence Similarity • 22.7M • Updated Mar 6, 2025 • 145M • • 4.29k