update README
--- a/README.md
+++ b/README.md
@@ -10,10 +10,11 @@ A core innovation of VibeVoice is its use of continuous speech tokenizers (Acous
 
 The model can synthesize speech up to **90 minutes** long with up to **4 distinct speakers**, surpassing the typical 1-2 speaker limits of many prior models.
 
+➡️ **Technical Report:** [VibeVoice Technical Report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf)
 
-➡️ **Project
+➡️ **Project Page:** [microsoft/VibeVoice](https://microsoft.github.io/VibeVoice)
 
-➡️ **
+➡️ **Code:** [microsoft/VibeVoice-Code](https://github.com/microsoft/VibeVoice)
 
 ## Training details
 Transformer-based Large Language Model (LLM) integrated with specialized acoustic and semantic tokenizers and a diffusion-based decoding head.