VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents Paper • 2601.16973 • Published 3 days ago • 20
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models Paper • 2512.07843 • Published Nov 24, 2025 • 22
NVILA (HuggingFace) Collection • HuggingFace Transformers can load us. • 5 items • Updated Sep 13, 2025 • 5
Learning to Grasp Anything by Playing with Random Toys Paper • 2510.12866 • Published Oct 14, 2025 • 6
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 179
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published Apr 21, 2025 • 44
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22, 2025 • 63
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17, 2025 • 39
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 93
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 205
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28, 2025 • 123
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published Jan 9, 2025 • 41
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset Paper • 2410.22325 • Published Oct 29, 2024 • 10
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21, 2024 • 69