fixie-ai/ultravox-v0_6-gemma-3-27b

Audio-Text-to-Text

feature-extraction

patricklifixie committed on Sep 12

Commit 17a79bc · verified · 1 Parent(s): 247f9a4

Update README.md

Files changed (1)

README.md +0 -6

@@ -126,12 +126,6 @@ Supervised speech instruction finetuning via knowledge-distillation. For more in
 - **Training regime:** BF16 mixed precision training
 - **Hardware used:** 8x H100 GPUs
-#### Speeds, Sizes, Times
-The current version of Ultravox, when invoked with audio content, has a time-to-first-token (TTFT) of approximately 150ms, and a tokens-per-second rate of ~50-100 when using an A100-40GB GPU, all using a text-based LLM (Llama, Gemma, or Qwen) backbone.
-Check out the audio tab on [TheFastest.ai](https://thefastest.ai/?m=audio) for daily benchmarks and a comparison with other existing models.
 ## Evaluation
 Evaluations are conducted on covost2 (speech translation measured in BLEU), fleurs and ultravox_calls (speech recognition measured in WER), big bench audio (audio reasoning measured in accuracy), as well as musan and ultravox_unintelligible (noise/unintelligible speech detection measured in recall).
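
For readers unfamiliar with the WER metric named in the retained Evaluation paragraph, here is a minimal sketch of the standard word error rate definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the whitespace tokenization is an assumption; the README does not say which scoring script or text normalization the authors used, so this is not necessarily how the reported numbers were produced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: plain whitespace tokenization, no casing or punctuation
    normalization. Real evaluation pipelines usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.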
