daVinci-MagiHuman 🎬 Space • Generate a short video from an image and text prompt • Running on CPU Upgrade • 84
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 6 days ago • 115
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published 15 days ago • 84
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published 4 days ago • 57
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published 4 days ago • 84
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 5 days ago • 127
kotoba-tech/kotoba-whisper-v2.2 Automatic Speech Recognition • 0.8B • Updated Oct 23, 2024 • 14.9k • 97
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published 16 days ago • 90
NLE: Non-autoregressive LLM-based ASR by Transcript Editing Paper • 2603.08397 • Published 20 days ago • 21
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery Paper • 2603.07300 • Published 21 days ago • 17