digital-avatar committed
Commit 47e4366 · verified · 1 Parent(s): a02ed76

Update README.md

Files changed (1): README.md (+232 -160)
README.md CHANGED

<h2 align='center'>Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis</h2>

<div align='center'>
<a href=""><strong>Tianqi Li</strong></a>
·
<a href=""><strong>Ruobing Zheng</strong></a><sup>†</sup>
·
<a href=""><strong>Minghui Yang</strong></a>
·
<a href=""><strong>Jingdong Chen</strong></a>
·
<a href=""><strong>Ming Yang</strong></a>
</div>
<div align='center'>
Ant Group
</div>
<br>
<div align='center'>
<a href='https://arxiv.org/abs/2411.19509'><img src='https://img.shields.io/badge/Paper-arXiv-red'></a>
<a href='https://digital-avatar.github.io/ai/Ditto/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
<a href='https://huggingface.co/digital-avatar/ditto-talkinghead'><img src='https://img.shields.io/badge/Model-HuggingFace-yellow'></a>
<a href='https://github.com/antgroup/ditto-talkinghead'><img src='https://img.shields.io/badge/Code-GitHub-purple'></a>
<!-- <a href='https://github.com/antgroup/ditto-talkinghead'><img src='https://img.shields.io/github/stars/antgroup/ditto-talkinghead?style=social'></a> -->
<a href='https://colab.research.google.com/drive/19SUi1TiO32IS-Crmsu9wrkNspWE8tFbs?usp=sharing'><img src='https://img.shields.io/badge/Demo-Colab-orange'></a>
</div>
<br>
<div align="center">
<video style="width: 95%; object-fit: cover;" controls loop src="https://github.com/user-attachments/assets/ef1a0b08-bff3-4997-a6dd-62a7f51cdb40" muted="false"></video>
<p>
✨ For more results, visit our <a href="https://digital-avatar.github.io/ai/Ditto/"><strong>Project Page</strong></a>
</p>
</div>

## 📌 Updates
* [2025.07.11] 🔥 The [PyTorch model](#-pytorch-model) is now available.
* [2025.07.07] 🔥 Ditto has been accepted to ACM MM 2025.
* [2025.01.21] 🔥 We update the [Colab](https://colab.research.google.com/drive/19SUi1TiO32IS-Crmsu9wrkNspWE8tFbs?usp=sharing) demo; welcome to try it.
* [2025.01.10] 🔥 We release our inference [code](https://github.com/antgroup/ditto-talkinghead) and [models](https://huggingface.co/digital-avatar/ditto-talkinghead).
* [2024.11.29] 🔥 Our [paper](https://arxiv.org/abs/2411.19509) is publicly available on arXiv.

## 🛠️ Installation

Tested environment:
- System: CentOS 7.2
- GPU: A100
- Python: 3.10
- TensorRT: 8.6.1

Clone the code from [GitHub](https://github.com/antgroup/ditto-talkinghead):
```bash
git clone https://github.com/antgroup/ditto-talkinghead
cd ditto-talkinghead
```

### Conda
Create the `conda` environment:
```bash
conda env create -f environment.yaml
conda activate ditto
```

### Pip
If you have problems creating the conda environment, you can also refer to our [Colab](https://colab.research.google.com/drive/19SUi1TiO32IS-Crmsu9wrkNspWE8tFbs?usp=sharing) notebook.
After correctly installing `pytorch`, `cuda`, and `cudnn`, you only need to install a few packages with pip:
```bash
pip install \
    tensorrt==8.6.1 \
    librosa \
    tqdm \
    filetype \
    imageio \
    opencv_python_headless \
    scikit-image \
    cython \
    cuda-python \
    imageio-ffmpeg \
    colored \
    polygraphy \
    numpy==2.0.1
```

If you don't use `conda`, you may also need to install `ffmpeg` following the [official website](https://www.ffmpeg.org/download.html).
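
Optionally, you can confirm that the key runtime dependencies are importable and that `ffmpeg` is reachable before moving on. The following is a minimal sanity-check sketch (not part of the repository); the module names mirror the pip packages listed above:

```python
# check_env.py -- minimal dependency sanity check (illustrative, not part of the repo)
import importlib
import shutil

# Import names corresponding to the pip packages listed above.
modules = [
    "tensorrt", "librosa", "tqdm", "filetype", "imageio",
    "cv2", "skimage", "cuda", "imageio_ffmpeg", "colored",
    "polygraphy", "numpy",
]

for name in modules:
    try:
        importlib.import_module(name)
        print(f"[ok]   {name}")
    except ImportError as e:
        print(f"[fail] {name}: {e}")

# ffmpeg must be on PATH for video muxing.
print("[ok]   ffmpeg" if shutil.which("ffmpeg") else "[fail] ffmpeg not found on PATH")
```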

## 📥 Download Checkpoints

Download the checkpoints from [HuggingFace](https://huggingface.co/digital-avatar/ditto-talkinghead) and put them in the `checkpoints` directory:
```bash
git lfs install
git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints
```
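
If you prefer not to use `git lfs`, the same files can be fetched with the `huggingface_hub` Python client. This is an equivalent sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`):

```python
# download_checkpoints.py -- alternative to the git lfs clone above
from huggingface_hub import snapshot_download

# Mirrors: git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints
snapshot_download(
    repo_id="digital-avatar/ditto-talkinghead",
    local_dir="checkpoints",
)
```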

The `checkpoints` directory should look like this:
```text
./checkpoints/
├── ditto_cfg
│   ├── v0.4_hubert_cfg_trt.pkl
│   └── v0.4_hubert_cfg_trt_online.pkl
├── ditto_onnx
│   ├── appearance_extractor.onnx
│   ├── blaze_face.onnx
│   ├── decoder.onnx
│   ├── face_mesh.onnx
│   ├── hubert.onnx
│   ├── insightface_det.onnx
│   ├── landmark106.onnx
│   ├── landmark203.onnx
│   ├── libgrid_sample_3d_plugin.so
│   ├── lmdm_v0.4_hubert.onnx
│   ├── motion_extractor.onnx
│   ├── stitch_network.onnx
│   └── warp_network.onnx
└── ditto_trt_Ampere_Plus
    ├── appearance_extractor_fp16.engine
    ├── blaze_face_fp16.engine
    ├── decoder_fp16.engine
    ├── face_mesh_fp16.engine
    ├── hubert_fp32.engine
    ├── insightface_det_fp16.engine
    ├── landmark106_fp16.engine
    ├── landmark203_fp16.engine
    ├── lmdm_v0.4_hubert_fp32.engine
    ├── motion_extractor_fp32.engine
    ├── stitch_network_fp16.engine
    └── warp_network_fp16.engine
```

- `ditto_cfg/v0.4_hubert_cfg_trt_online.pkl` is the online config.
- `ditto_cfg/v0.4_hubert_cfg_trt.pkl` is the offline config.
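
Before running inference, it can help to verify that the files listed above were actually downloaded (a clone without `git lfs` installed leaves small pointer files instead of real weights). A short illustrative check over a few of the paths shown above:

```python
# verify_checkpoints.py -- illustrative check that the expected files exist
from pathlib import Path

ckpt = Path("./checkpoints")
expected = [
    "ditto_cfg/v0.4_hubert_cfg_trt.pkl",
    "ditto_cfg/v0.4_hubert_cfg_trt_online.pkl",
    "ditto_onnx/hubert.onnx",
    "ditto_trt_Ampere_Plus/hubert_fp32.engine",
]

for rel in expected:
    path = ckpt / rel
    if not path.exists():
        print(f"missing: {path}")
    elif path.stat().st_size < 1024:
        # A tiny file usually means an un-pulled git-lfs pointer rather than real weights.
        print(f"suspiciously small (lfs pointer?): {path}")
    else:
        print(f"ok: {path}")
```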

## 🚀 Inference

Run `inference.py`:

```shell
python inference.py \
    --data_root "<path-to-trt-model>" \
    --cfg_pkl "<path-to-cfg-pkl>" \
    --audio_path "<path-to-input-audio>" \
    --source_path "<path-to-input-image>" \
    --output_path "<path-to-output-mp4>"
```

For example:

```shell
python inference.py \
    --data_root "./checkpoints/ditto_trt_Ampere_Plus" \
    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
    --audio_path "./example/audio.wav" \
    --source_path "./example/image.png" \
    --output_path "./tmp/result.mp4"
```

❗Note:

We provide TensorRT engines built with `hardware-compatibility-level=Ampere_Plus` (`checkpoints/ditto_trt_Ampere_Plus/`). If your GPU does not support them, run the `cvt_onnx_to_trt.py` script to convert the general ONNX models (`checkpoints/ditto_onnx/`) to TensorRT engines:

```bash
python scripts/cvt_onnx_to_trt.py --onnx_dir "./checkpoints/ditto_onnx" --trt_dir "./checkpoints/ditto_trt_custom"
```

Then run `inference.py` with `--data_root=./checkpoints/ditto_trt_custom`.
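
If you are unsure whether the prebuilt engines match your hardware: `Ampere_Plus` engines target NVIDIA GPUs of the Ampere generation or newer, i.e. CUDA compute capability 8.0 and above. A quick check, assuming PyTorch with CUDA support is installed:

```python
# check_gpu.py -- quick compute-capability check (assumes PyTorch with CUDA is installed)
import torch

major, minor = torch.cuda.get_device_capability(0)
name = torch.cuda.get_device_name(0)
print(f"{name}: compute capability {major}.{minor}")

if major >= 8:
    print("Ampere or newer: the prebuilt ditto_trt_Ampere_Plus engines should work.")
else:
    print("Pre-Ampere GPU: convert the ONNX models with scripts/cvt_onnx_to_trt.py instead.")
```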

## ⚡ PyTorch Model
*Based on community interest and to better support further development, we are now open-sourcing the PyTorch version of the model.*

We have added the PyTorch model and the corresponding configuration files to the [HuggingFace](https://huggingface.co/digital-avatar/ditto-talkinghead) repo. Please refer to [Download Checkpoints](#-download-checkpoints) to prepare the model files.

The `checkpoints` directory should look like this:
```text
./checkpoints/
├── ditto_cfg
│   ├── ...
│   └── v0.4_hubert_cfg_pytorch.pkl
├── ...
└── ditto_pytorch
    ├── aux_models
    │   ├── 2d106det.onnx
    │   ├── det_10g.onnx
    │   ├── face_landmarker.task
    │   ├── hubert_streaming_fix_kv.onnx
    │   └── landmark203.onnx
    └── models
        ├── appearance_extractor.pth
        ├── decoder.pth
        ├── lmdm_v0.4_hubert.pth
        ├── motion_extractor.pth
        ├── stitch_network.pth
        └── warp_network.pth
```

To run inference, execute the following command:

```shell
python inference.py \
    --data_root "./checkpoints/ditto_pytorch" \
    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_pytorch.pkl" \
    --audio_path "./example/audio.wav" \
    --source_path "./example/image.png" \
    --output_path "./tmp/result.mp4"
```
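
The CLI processes a single audio/image pair per call, so batch jobs can simply wrap it in a loop. Below is a hypothetical driver sketch; the input folder and output naming are illustrative, and the flags are the same as in the example above:

```python
# batch_inference.py -- hypothetical wrapper that runs inference.py over several audio clips
import subprocess
from pathlib import Path

audio_dir = Path("./example")          # illustrative folder of .wav inputs
source_image = "./example/image.png"
out_dir = Path("./tmp")
out_dir.mkdir(parents=True, exist_ok=True)

for audio in sorted(audio_dir.glob("*.wav")):
    output = out_dir / f"{audio.stem}.mp4"
    # Same flags as the PyTorch example above.
    subprocess.run([
        "python", "inference.py",
        "--data_root", "./checkpoints/ditto_pytorch",
        "--cfg_pkl", "./checkpoints/ditto_cfg/v0.4_hubert_cfg_pytorch.pkl",
        "--audio_path", str(audio),
        "--source_path", source_image,
        "--output_path", str(output),
    ], check=True)
```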

## 📧 Acknowledgement
Our implementation is based on [S2G-MDDiffusion](https://github.com/thuhcsi/S2G-MDDiffusion) and [LivePortrait](https://github.com/KwaiVGI/LivePortrait). Thanks for their remarkable contributions and released code! If we have missed any open-source projects or related articles, we will add them to the acknowledgements immediately.

## ⚖️ License
This repository is released under the Apache-2.0 license as found in the [LICENSE](LICENSE) file.

## 📚 Citation
If you find this codebase useful for your research, please cite it with the following entry.
```BibTeX
@article{li2024ditto,
    title={Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis},
    author={Li, Tianqi and Zheng, Ruobing and Yang, Minghui and Chen, Jingdong and Yang, Ming},
    journal={arXiv preprint arXiv:2411.19509},
    year={2024}
}
```

## 🌟 Star History

[![Star History Chart](https://api.star-history.com/svg?repos=antgroup/ditto-talkinghead&type=Date)](https://www.star-history.com/#antgroup/ditto-talkinghead&Date)