e-tuanzi's Collections

260313

updated 6 days ago

  • GLM-5: from Vibe Coding to Agentic Engineering

    Paper • 2602.15763 • Published 29 days ago • 115

  • DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

    Paper • 2602.16742 • Published 29 days ago • 12

  • From Perception to Action: An Interactive Benchmark for Vision Reasoning

    Paper • 2602.21015 • Published 23 days ago • 23

  • Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

    Paper • 2603.09906 • Published 9 days ago • 70

  • RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

    Paper • 2603.04639 • Published 14 days ago • 27

  • VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection

    Paper • 2603.00912 • Published 18 days ago • 39

  • RealWonder: Real-Time Physical Action-Conditioned Video Generation

    Paper • 2603.05449 • Published 13 days ago • 12

  • Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

    Paper • 2603.07660 • Published 11 days ago • 82

  • CAST: Modeling Visual State Transitions for Consistent Video Retrieval

    Paper • 2603.08648 • Published 10 days ago • 4

  • MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents

    Paper • 2603.09827 • Published 9 days ago • 28