YaoyaoChang
committed on
Commit
·
217a7fe
1
Parent(s):
0220f13
update
README.md
CHANGED
@@ -39,7 +39,18 @@ Transformer-based Large Language Model (LLM) integrated with specialized acousti
 - VibeVoice Training: Pre-trained tokenizers are frozen; only the LLM and diffusion head parameters are trained. A curriculum learning strategy is used for input sequence length (4k -> 16K -> 32K -> 64K). Text tokenizer not explicitly specified, but the LLM (Qwen2.5) typically uses its own. Audio is "tokenized" via the acoustic and semantic tokenizers.
 
 
-##
+## Models
+| Model | Context Length | Generation Length | Weight |
+|-------|----------------|----------|----------|
+| VibeVoice-0.5B-Streaming | - | - | On the way |
+| VibeVoice-1.5B | 64K | ~90 min | You are here. |
+| VibeVoice-7B-Preview| 32K | ~45 min | [HF link](https://huggingface.co/WestZhang/VibeVoice-Large-pt) |
+
+## Installation and Usage
+
+Please refer to [GitHub README](https://github.com/microsoft/VibeVoice?tab=readme-ov-file#installation)
+
+## Responsible Usage
 ### Direct intended uses
 The VibeVoice model is limited to research purpose use exploring highly realistic audio dialogue generation detailed in the [tech report](https://github.com/microsoft/VibeVoice/blob/main/report/TechnicalReport.pdf).
 
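The training note in the unchanged context line of this diff mentions a curriculum over input sequence length (4k -> 16K -> 32K -> 64K). A minimal sketch of what such a schedule could look like follows. It is not taken from the VibeVoice code: it assumes "K" means 1024 tokens, and the stage boundaries (the steps at which the length cap increases) are hypothetical, since the README does not state them.

```python
from bisect import bisect_right

# Stage caps from the README note, assuming "K" means 1024 tokens.
STAGE_MAX_LENGTHS = [4 * 1024, 16 * 1024, 32 * 1024, 64 * 1024]


def max_seq_len(step: int, stage_boundaries: list[int]) -> int:
    """Return the maximum input sequence length allowed at a training step.

    stage_boundaries[i] is the (hypothetical) step at which stage i + 1
    begins. It must be sorted ascending and have exactly one fewer entry
    than STAGE_MAX_LENGTHS.
    """
    if len(stage_boundaries) != len(STAGE_MAX_LENGTHS) - 1:
        raise ValueError("need one boundary between each pair of stages")
    if step < 0:
        raise ValueError("step must be non-negative")
    # bisect_right counts how many boundaries are <= step, which is
    # exactly the index of the current stage.
    return STAGE_MAX_LENGTHS[bisect_right(stage_boundaries, step)]
```

With illustrative boundaries such as `[1000, 2000, 3000]`, steps before 1000 would be capped at 4096 tokens and steps from 3000 onward at 65536 tokens; a data loader could use the returned cap to filter or truncate samples.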