Collections
Collections including paper arxiv:2503.19786
- Gemma 3 Technical Report
  Paper • 2503.19786 • Published • 54
- Kimi-VL Technical Report
  Paper • 2504.07491 • Published • 132
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 298
- FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
  Paper • 2504.09925 • Published • 38

- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  Paper • 2508.06471 • Published • 186
- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
  Paper • 2508.14444 • Published • 36
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
  Paper • 2507.06261 • Published • 63
- MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
  Paper • 2506.13585 • Published • 267

- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 8
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 10
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 33
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166

- MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
  Paper • 2501.02955 • Published • 44
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
  Paper • 2501.12380 • Published • 85
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
  Paper • 2501.09781 • Published • 28