Abstract
Embody 3D is a multimodal dataset of large-scale 3D motion capture with hand tracking, body shape, text annotations, and per-participant audio tracks, recorded from many participants across a wide range of scenarios.
The Codec Avatars Lab at Meta introduces Embody 3D, a multimodal dataset of 500 individual hours of 3D motion data from 439 participants, collected on a multi-camera capture stage and amounting to over 54 million frames of tracked 3D motion. The dataset features a wide range of single-person motion data, including prompted motions, hand gestures, and locomotion, as well as multi-person behavioral and conversational data such as discussions, conversations in different emotional states, collaborative activities, and co-living scenarios in an apartment-like space. For each participant, we provide tracked human motion, including hand tracking and body shape, along with text annotations and a separate audio track.
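The abstract's figures imply a capture rate of roughly 30 Hz: 54,000,000 frames over 500 hours is 54e6 / (500 × 3600) = 30 frames per second. As a rough illustration of how the listed modalities might line up per sequence, here is a minimal Python sketch; the class name, field names, and array shapes are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

# Capture rate implied by the abstract's figures:
# 54,000,000 frames / (500 h * 3600 s/h) = 30 frames per second.
FPS = 54_000_000 / (500 * 3600)  # -> 30.0

@dataclass
class Embody3DSequence:
    """Hypothetical container mirroring the modalities the abstract lists.

    All names and shapes here are illustrative assumptions, not the
    released schema.
    """
    participant_id: str
    body_pose: np.ndarray    # (num_frames, num_joints, 3): tracked 3D motion
    hand_pose: np.ndarray    # (num_frames, 2, num_hand_joints, 3): both hands
    body_shape: np.ndarray   # (num_shape_params,): per-participant body shape
    text_annotation: str     # e.g. a prompted-motion or activity description
    audio_path: str          # separate audio track for this participant

    @property
    def duration_seconds(self) -> float:
        # Number of motion frames divided by the implied capture rate.
        return self.body_pose.shape[0] / FPS

# Tiny usage example with dummy data: 60 frames = 2 seconds at 30 fps.
seq = Embody3DSequence(
    participant_id="P001",
    body_pose=np.zeros((60, 24, 3)),
    hand_pose=np.zeros((60, 2, 21, 3)),
    body_shape=np.zeros(10),
    text_annotation="person walks forward and waves",
    audio_path="P001_session01.wav",
)
print(f"{seq.duration_seconds:.1f} s of motion")  # prints "2.0 s of motion"
```

Keeping audio as a path rather than an in-memory array reflects that each participant has a separate audio track that can be loaded on demand; this, too, is a design assumption rather than the dataset's documented layout.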