VLA-R1: Enhancing Reasoning in Vision-Language-Action Models Paper • 2510.01623 • Published 27 days ago • 7
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Paper • 2509.19297 • Published Sep 23 • 23
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion Paper • 2506.04648 • Published Jun 5 • 1
StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes Paper • 2509.16415 • Published Sep 19 • 2
ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS Paper • 2505.23734 • Published May 29 • 4
Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting Paper • 2506.05327 • Published Jun 5 • 11
SSS: Semi-Supervised SAM-2 with Efficient Prompting for Medical Imaging Segmentation Paper • 2506.08949 • Published Jun 10
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding Paper • 2507.23478 • Published Jul 31 • 15
PresentAgent: Multimodal Agent for Presentation Video Generation Paper • 2507.04036 • Published Jul 5 • 10
DiffuMural: Restoring Dunhuang Murals with Multi-scale Diffusion Paper • 2504.09513 • Published Apr 13
GAMED-Snake: Gradient-aware Adaptive Momentum Evolution Deep Snake Model for Multi-organ Segmentation Paper • 2501.12844 • Published Jan 22
FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration Paper • 2501.12832 • Published Jan 22
MedConv: Convolutions Beat Transformers on Long-Tailed Bone Density Prediction Paper • 2502.00631 • Published Feb 2
PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images Paper • 2503.17970 • Published Mar 23 • 3