Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training • Paper • 2510.12586 • Published 14 days ago • 107
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation • Paper • 2510.08673 • Published 19 days ago • 117
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI • Paper • 2510.05684 • Published 21 days ago • 133
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model • Paper • 2510.12276 • Published 14 days ago • 142
Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published 15 days ago • 160
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer • Paper • 2510.06590 • Published 20 days ago • 70
Cache-to-Cache: Direct Semantic Communication Between Large Language Models • Paper • 2510.03215 • Published 25 days ago • 93
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization • Paper • 2510.08540 • Published 19 days ago • 108
Less is More: Recursive Reasoning with Tiny Networks • Paper • 2510.04871 • Published 22 days ago • 451
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data • Paper • 2510.03264 • Published Sep 26 • 23
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization • Paper • 2509.23202 • Published Sep 27 • 26
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models • Paper • 2509.26388 • Published 28 days ago • 26
CoDA: Agentic Systems for Collaborative Data Visualization • Paper • 2510.03194 • Published 25 days ago • 28
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights • Paper • 2510.04800 • Published 22 days ago • 36
Imperceptible Jailbreaking against Large Language Models • Paper • 2510.05025 • Published 22 days ago • 33
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models • Paper • 2510.04618 • Published 22 days ago • 109