digital-avatar
/

ditto-talkinghead

ONNX

Model card Files Files and versions

xet

Community

digital-avatar commited on Jul 11

Commit

47e4366

verified ·

1 Parent(s): a02ed76

Update README.md

Browse files

Files changed (1) hide show

README.md +232 -160

README.md CHANGED Viewed

@@ -1,160 +1,232 @@
-<h2 align='center'>Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis</h2>
-<div align='center'>
-    <a href=""><strong>Tianqi Li</strong></a>
-    ·
-    <a href=""><strong>Ruobing Zheng</strong></a><sup>†</sup>
-    ·
-    <a href=""><strong>Minghui Yang</strong></a>
-    ·
-    <a href=""><strong>Jingdong Chen</strong></a>
-    ·
-    <a href=""><strong>Ming Yang</strong></a>
-</div>
-<div align='center'>
-Ant Group
-</div>
-<br>
-<div align='center'>
-    <a href='https://arxiv.org/abs/2411.19509'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
-    <a href='https://digital-avatar.github.io/ai/Ditto/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
-    <a href='https://huggingface.co/digital-avatar/ditto-talkinghead'><img src='https://img.shields.io/badge/Model-HuggingFace-yellow'></a>
-    <a href='https://github.com/antgroup/ditto-talkinghead'><img src='https://img.shields.io/badge/Code-GitHub-purple'></a>
-    <!-- <a href='https://github.com/antgroup/ditto-talkinghead'><img src='https://img.shields.io/github/stars/antgroup/ditto-talkinghead?style=social'></a> -->
-</div>
-<br>
-<div align="center">
-    <video style="width: 95%; object-fit: cover;" controls loop src="https://digital-avatar.github.io/ai/Ditto/asserts/videos/full_body_en.mp4" muted="false"></video>
-    <p>
-    ✨  For more results, visit our <a href="https://digital-avatar.github.io/ai/Ditto/"><strong>Project Page</strong></a> ✨
-    </p>
-</div>
-## 📌 Updates
-* [2025.01.10] 🔥 We release our inference [codes](https://github.com/antgroup/ditto-talkinghead) and [models](https://huggingface.co/digital-avatar/ditto-talkinghead).
-* [2024.11.29] 🔥 Our [paper](https://arxiv.org/abs/2411.19509) is in public on arxiv.
-## 🛠️ Installation
-Tested Environment
-- System: Centos 7.2
-- GPU: A100
-- Python: 3.10
-- tensorRT: 8.6.1
-Clone the codes from [GitHub](https://github.com/antgroup/ditto-talkinghead):
-```bash
-git clone https://github.com/antgroup/ditto-talkinghead
-cd ditto-talkinghead
-```
-Create `conda` environment:
-```bash
-conda env create -f environment.yaml
-conda activate ditto
-```
-## 📥 Download Checkpoints
-Download checkpoints from [HuggingFace](https://huggingface.co/digital-avatar/ditto-talkinghead) and put them in `checkpoints` dir:
-```bash
-git lfs install
-git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints
-```
-The `checkpoints` should be like:
-```text
-./checkpoints/
-├── ditto_cfg
-│   ├── v0.4_hubert_cfg_trt.pkl
-│   └── v0.4_hubert_cfg_trt_online.pkl
-├── ditto_onnx
-│   ├── appearance_extractor.onnx
-│   ├── blaze_face.onnx
-│   ├── decoder.onnx
-│   ├── face_mesh.onnx
-│   ├── hubert.onnx
-│   ├── insightface_det.onnx
-│   ├── landmark106.onnx
-│   ├── landmark203.onnx
-│   ├── libgrid_sample_3d_plugin.so
-│   ├── lmdm_v0.4_hubert.onnx
-│   ├── motion_extractor.onnx
-│   ├── stitch_network.onnx
-│   └── warp_network.onnx
-└── ditto_trt_Ampere_Plus
-    ├── appearance_extractor_fp16.engine
-    ├── blaze_face_fp16.engine
-    ├── decoder_fp16.engine
-    ├── face_mesh_fp16.engine
-    ├── hubert_fp32.engine
-    ├── insightface_det_fp16.engine
-    ├── landmark106_fp16.engine
-    ├── landmark203_fp16.engine
-    ├── lmdm_v0.4_hubert_fp32.engine
-    ├── motion_extractor_fp32.engine
-    ├── stitch_network_fp16.engine
-    └── warp_network_fp16.engine
-```
-- The `ditto_cfg/v0.4_hubert_cfg_trt_online.pkl` is online config
-- The `ditto_cfg/v0.4_hubert_cfg_trt.pkl` is offline config
-## 🚀 Inference
-Run `inference.py`:
-```shell
-python inference.py \
-    --data_root "<path-to-trt-model>" \
-    --cfg_pkl "<path-to-cfg-pkl>" \
-    --audio_path "<path-to-input-audio>" \
-    --source_path "<path-to-input-image>" \
-    --output_path "<path-to-output-mp4>"
-```
-For example:
-```shell
-python inference.py \
-    --data_root "./checkpoints/ditto_trt_Ampere_Plus" \
-    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
-    --audio_path "./example/audio.wav" \
-    --source_path "./example/image.png" \
-    --output_path "./tmp/result.mp4"
-```
-❗Note:
-We have provided the tensorRT model with `hardware-compatibility-level=Ampere_Plus` (`checkpoints/ditto_trt_Ampere_Plus/`). If your GPU does not support it, please execute the `cvt_onnx_to_trt.py` script to convert from the general onnx model (`checkpoints/ditto_onnx/`) to the tensorRT model.
-```bash
-python script/cvt_onnx_to_trt.py --onnx_dir "./checkpoints/ditto_onnx" --trt_dir "./checkpoints/ditto_trt_custom"
-```
-Then run `inference.py` with `--data_root=./checkpoints/ditto_trt_custom`.
-## 📧 Acknowledgement
-Our implementation is based on [S2G-MDDiffusion](https://github.com/thuhcsi/S2G-MDDiffusion) and [LivePortrait](https://github.com/KwaiVGI/LivePortrait). Thanks for their remarkable contribution and released code! If we missed any open-source projects or related articles, we would like to complement the acknowledgement of this specific work immediately.
-## ⚖️ License
-This repository is released under the Apache-2.0 license as found in the [LICENSE](LICENSE) file.
-## 📚 Citation
-If you find this codebase useful for your research, please use the following entry.
-```BibTeX
-@article{li2024ditto,
-    title={Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis},
-    author={Li, Tianqi and Zheng, Ruobing and Yang, Minghui and Chen, Jingdong and Yang, Ming},
-    journal={arXiv preprint arXiv:2411.19509},
-    year={2024}
-}
-```

+<h2 align='center'>Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis</h2>
+<div align='center'>
+    <a href=""><strong>Tianqi Li</strong></a>
+    ·
+    <a href=""><strong>Ruobing Zheng</strong></a><sup>†</sup>
+    ·
+    <a href=""><strong>Minghui Yang</strong></a>
+    ·
+    <a href=""><strong>Jingdong Chen</strong></a>
+    ·
+    <a href=""><strong>Ming Yang</strong></a>
+</div>
+<div align='center'>
+Ant Group
+</div>
+<br>
+<div align='center'>
+    <a href='https://arxiv.org/abs/2411.19509'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
+    <a href='https://digital-avatar.github.io/ai/Ditto/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
+    <a href='https://huggingface.co/digital-avatar/ditto-talkinghead'><img src='https://img.shields.io/badge/Model-HuggingFace-yellow'></a>
+    <a href='https://github.com/antgroup/ditto-talkinghead'><img src='https://img.shields.io/badge/Code-GitHub-purple'></a>
+    <!-- <a href='https://github.com/antgroup/ditto-talkinghead'><img src='https://img.shields.io/github/stars/antgroup/ditto-talkinghead?style=social'></a> -->
+    <a href='https://colab.research.google.com/drive/19SUi1TiO32IS-Crmsu9wrkNspWE8tFbs?usp=sharing'><img src='https://img.shields.io/badge/Demo-Colab-orange'></a>
+</div>
+<br>
+<div align="center">
+    <video style="width: 95%; object-fit: cover;" controls loop src="https://github.com/user-attachments/assets/ef1a0b08-bff3-4997-a6dd-62a7f51cdb40" muted="false"></video>
+    <p>
+    ✨  For more results, visit our <a href="https://digital-avatar.github.io/ai/Ditto/"><strong>Project Page</strong></a> ✨
+    </p>
+</div>
+## 📌 Updates
+* [2025.07.11] 🔥 The [PyTorch model](#-pytorch-model) is now available.
+* [2025.07.07] 🔥 Ditto is accepted by ACM MM 2025.
+* [2025.01.21] 🔥 We update the [Colab](https://colab.research.google.com/drive/19SUi1TiO32IS-Crmsu9wrkNspWE8tFbs?usp=sharing) demo, welcome to try it.
+* [2025.01.10] 🔥 We release our inference [codes](https://github.com/antgroup/ditto-talkinghead) and [models](https://huggingface.co/digital-avatar/ditto-talkinghead).
+* [2024.11.29] 🔥 Our [paper](https://arxiv.org/abs/2411.19509) is in public on arxiv.
+## 🛠️ Installation
+Tested Environment
+- System: Centos 7.2
+- GPU: A100
+- Python: 3.10
+- tensorRT: 8.6.1
+Clone the codes from [GitHub](https://github.com/antgroup/ditto-talkinghead):
+```bash
+git clone https://github.com/antgroup/ditto-talkinghead
+cd ditto-talkinghead
+```
+### Conda
+Create `conda` environment:
+```bash
+conda env create -f environment.yaml
+conda activate ditto
+```
+### Pip
+If you have problems creating a conda environment, you can also refer to our [Colab](https://colab.research.google.com/drive/19SUi1TiO32IS-Crmsu9wrkNspWE8tFbs?usp=sharing).
+After correctly installing `pytorch`, `cuda` and `cudnn`, you only need to install a few packages using pip:
+```bash
+pip install \
+    tensorrt==8.6.1 \
+    librosa \
+    tqdm \
+    filetype \
+    imageio \
+    opencv_python_headless \
+    scikit-image \
+    cython \
+    cuda-python \
+    imageio-ffmpeg \
+    colored \
+    polygraphy \
+    numpy==2.0.1
+```
+If you don't use `conda`, you may also need to install `ffmpeg` according to the [official website](https://www.ffmpeg.org/download.html).
+## 📥 Download Checkpoints
+Download checkpoints from [HuggingFace](https://huggingface.co/digital-avatar/ditto-talkinghead) and put them in `checkpoints` dir:
+```bash
+git lfs install
+git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints
+```
+The `checkpoints` should be like:
+```text
+./checkpoints/
+├── ditto_cfg
+│   ├── v0.4_hubert_cfg_trt.pkl
+│   └── v0.4_hubert_cfg_trt_online.pkl
+├── ditto_onnx
+│   ├── appearance_extractor.onnx
+│   ├── blaze_face.onnx
+│   ├── decoder.onnx
+│   ├── face_mesh.onnx
+│   ├── hubert.onnx
+│   ├── insightface_det.onnx
+│   ├── landmark106.onnx
+│   ├── landmark203.onnx
+│   ├── libgrid_sample_3d_plugin.so
+│   ├── lmdm_v0.4_hubert.onnx
+│   ├── motion_extractor.onnx
+│   ├── stitch_network.onnx
+│   └── warp_network.onnx
+└── ditto_trt_Ampere_Plus
+    ├── appearance_extractor_fp16.engine
+    ├── blaze_face_fp16.engine
+    ├── decoder_fp16.engine
+    ├── face_mesh_fp16.engine
+    ├── hubert_fp32.engine
+    ├── insightface_det_fp16.engine
+    ├── landmark106_fp16.engine
+    ├── landmark203_fp16.engine
+    ├── lmdm_v0.4_hubert_fp32.engine
+    ├── motion_extractor_fp32.engine
+    ├── stitch_network_fp16.engine
+    └── warp_network_fp16.engine
+```
+- The `ditto_cfg/v0.4_hubert_cfg_trt_online.pkl` is online config
+- The `ditto_cfg/v0.4_hubert_cfg_trt.pkl` is offline config
+## 🚀 Inference
+Run `inference.py`:
+```shell
+python inference.py \
+    --data_root "<path-to-trt-model>" \
+    --cfg_pkl "<path-to-cfg-pkl>" \
+    --audio_path "<path-to-input-audio>" \
+    --source_path "<path-to-input-image>" \
+    --output_path "<path-to-output-mp4>"
+```
+For example:
+```shell
+python inference.py \
+    --data_root "./checkpoints/ditto_trt_Ampere_Plus" \
+    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
+    --audio_path "./example/audio.wav" \
+    --source_path "./example/image.png" \
+    --output_path "./tmp/result.mp4"
+```
+❗Note:
+We have provided the tensorRT model with `hardware-compatibility-level=Ampere_Plus` (`checkpoints/ditto_trt_Ampere_Plus/`). If your GPU does not support it, please execute the `cvt_onnx_to_trt.py` script to convert from the general onnx model (`checkpoints/ditto_onnx/`) to the tensorRT model.
+```bash
+python scripts/cvt_onnx_to_trt.py --onnx_dir "./checkpoints/ditto_onnx" --trt_dir "./checkpoints/ditto_trt_custom"
+```
+Then run `inference.py` with `--data_root=./checkpoints/ditto_trt_custom`.
+## ⚡ PyTorch Model
+*Based on community interest and to better support further development, we are now open-sourcing the PyTorch version of the model.*
+We have added the PyTorch model and corresponding configuration files to the [HuggingFace](https://huggingface.co/digital-avatar/ditto-talkinghead). Please refer to [Download Checkpoints](#-download-checkpoints) to prepare the model files.
+The `checkpoints` should be like:
+```text
+./checkpoints/
+├── ditto_cfg
+│   ├── ...
+│   └── v0.4_hubert_cfg_pytorch.pkl
+├── ...
+└── ditto_pytorch
+    ├── aux_models
+    │   ├── 2d106det.onnx
+    │   ├── det_10g.onnx
+    │   ├── face_landmarker.task
+    │   ├── hubert_streaming_fix_kv.onnx
+    │   └── landmark203.onnx
+    └── models
+        ├── appearance_extractor.pth
+        ├── decoder.pth
+        ├── lmdm_v0.4_hubert.pth
+        ├── motion_extractor.pth
+        ├── stitch_network.pth
+        └── warp_network.pth
+```
+To run inference, execute the following command:
+```shell
+python inference.py \
+    --data_root "./checkpoints/ditto_pytorch" \
+    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_pytorch.pkl" \
+    --audio_path "./example/audio.wav" \
+    --source_path "./example/image.png" \
+    --output_path "./tmp/result.mp4"
+```
+## 📧 Acknowledgement
+Our implementation is based on [S2G-MDDiffusion](https://github.com/thuhcsi/S2G-MDDiffusion) and [LivePortrait](https://github.com/KwaiVGI/LivePortrait). Thanks for their remarkable contribution and released code! If we missed any open-source projects or related articles, we would like to complement the acknowledgement of this specific work immediately.
+## ⚖️ License
+This repository is released under the Apache-2.0 license as found in the [LICENSE](LICENSE) file.
+## 📚 Citation
+If you find this codebase useful for your research, please use the following entry.
+```BibTeX
+@article{li2024ditto,
+    title={Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis},
+    author={Li, Tianqi and Zheng, Ruobing and Yang, Minghui and Chen, Jingdong and Yang, Ming},
+    journal={arXiv preprint arXiv:2411.19509},
+    year={2024}
+}
+```
+## 🌟 Star History
+[![Star History Chart](https://api.star-history.com/svg?repos=antgroup/ditto-talkinghead&type=Date)](https://www.star-history.com/#antgroup/ditto-talkinghead&Date)