Update README.md
README.md CHANGED
@@ -126,12 +126,6 @@ Supervised speech instruction finetuning via knowledge-distillation. For more in
 - **Training regime:** BF16 mixed precision training
 - **Hardware used:** 8x H100 GPUs
 
-#### Speeds, Sizes, Times
-
-The current version of Ultravox, when invoked with audio content, has a time-to-first-token (TTFT) of approximately 150ms, and a tokens-per-second rate of ~50-100 when using an A100-40GB GPU, all using a text-based LLM (Llama, Gemma, or Qwen) backbone.
-
-Check out the audio tab on [TheFastest.ai](https://thefastest.ai/?m=audio) for daily benchmarks and a comparison with other existing models.
-
 ## Evaluation
 
 Evaluations are conducted on covost2 (speech translation measured in BLEU), fleurs and ultravox_calls (speech recognition measured in WER), big bench audio (audio reasoning measured in accuracy), as well as musan and ultravox_unintelligible (noise/unintelligible speech detection measured in recall).