Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video • Paper • 2510.21581 • Published 4 days ago • 2
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs • Paper • 2510.13251 • Published 13 days ago • 10
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning • Paper • 2510.20286 • Published 5 days ago • 18
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation • Paper • 2510.21583 • Published 4 days ago • 29
PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments • Paper • 2510.21111 • Published 4 days ago • 1
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers • Paper • 2510.11370 • Published 15 days ago • 2
Taming Modality Entanglement in Continual Audio-Visual Segmentation • Paper • 2510.17234 • Published 8 days ago • 3
PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis • Paper • 2510.21447 • Published 4 days ago • 3
Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost • Paper • 2510.20780 • Published 5 days ago • 3
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite • Paper • 2510.21652 • Published 4 days ago • 3
ARC-Encoder: learning compressed text representations for large language models • Paper • 2510.20535 • Published 5 days ago • 5
Document Understanding, Measurement, and Manipulation Using Category Theory • Paper • 2510.21553 • Published 4 days ago • 4
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging • Paper • 2510.20479 • Published 5 days ago • 11
Reasoning with Sampling: Your Base Model is Smarter Than You Think • Paper • 2510.14901 • Published 12 days ago • 30
DeepAgent: A General Reasoning Agent with Scalable Toolsets • Paper • 2510.21618 • Published 4 days ago • 68