# Endpointing Model Benchmark Report
**Model:** `/data/smart-turn-v3.1-cpu.onnx`
**Generated:** 2025-12-03 16:13:05 UTC
## Accuracy Results
**Total Samples:** 31,473
**Unique Languages:** 🇸🇦 Arabic, 🇧🇩 Bengali, 🇩🇰 Danish, 🇩🇪 German, 🇬🇧 🇺🇸 English, 🇫🇮 Finnish, 🇫🇷 French, 🇮🇳 Hindi, 🇮🇩 Indonesian, 🇮🇹 Italian, 🇯🇵 Japanese, 🇰🇷 Korean, 🇮🇳 Marathi, 🇳🇱 Dutch, 🇳🇴 Norwegian, 🇵🇱 Polish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇪🇸 Spanish, 🇹🇷 Turkish, 🇺🇦 Ukrainian, 🇻🇳 Vietnamese, 🇨🇳 Chinese
**Unique Datasets:** chirp3_1, chirp3_2, human_5, human_convcollector_1, liva_1, midcentury_1, mundo_1, orpheus_endfiller_1, orpheus_grammar_1, orpheus_midfiller_1, rime_2
### Overall Performance
| Metric | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|--------|--------------|--------------|---------------------|---------------------|
| Overall | 31,473 | 93.02 | 3.07 | 3.91 |
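In each row of this report, the accuracy, false-positive, and false-negative percentages sum to 100 (up to rounding), which suggests that both error rates are expressed as a share of all samples rather than of a single class. A minimal sketch of that computation follows. The function name `endpointing_metrics` and the label convention (1 = turn complete, 0 = turn incomplete) are illustrative assumptions and are not taken from the benchmark script.

```python
def endpointing_metrics(labels, predictions):
    """Return (accuracy, false_positive, false_negative) as percentages of all samples.

    Assumed convention (hypothetical): 1 = turn complete, 0 = turn incomplete.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    total = len(labels)
    if total == 0:
        raise ValueError("need at least one sample")

    pairs = list(zip(labels, predictions))
    correct = sum(1 for y, p in pairs if y == p)
    # False positive: predicted "complete" while the speaker was not finished.
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    # False negative: predicted "incomplete" although the turn had ended.
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)

    return tuple(round(100.0 * n / total, 2) for n in (correct, false_pos, false_neg))


# Toy data for illustration only; these are not benchmark samples.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]
accuracy, fp_rate, fn_rate = endpointing_metrics(labels, predictions)
print(accuracy, fp_rate, fn_rate)
```

Under this reading, the three values always add up to 100, which matches the Overall row (93.02 + 3.07 + 3.91).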
### Performance by Language
| Language | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|----------|--------------|--------------|---------------------|---------------------|
| 🇯🇵 Japanese | 834 | 97.36 | 1.44 | 1.20 |
| 🇰🇷 Korean | 890 | 96.97 | 1.01 | 2.02 |
| 🇹🇷 Turkish | 966 | 96.27 | 2.07 | 1.66 |
| 🇳🇱 Dutch | 1,401 | 95.93 | 1.57 | 2.50 |
| 🇫🇷 French | 1,253 | 95.45 | 1.84 | 2.71 |
| 🇩🇪 German | 1,322 | 95.39 | 2.27 | 2.34 |
| 🇮🇹 Italian | 782 | 95.14 | 2.30 | 2.56 |
| 🇵🇱 Polish | 976 | 95.08 | 3.07 | 1.84 |
| 🇫🇮 Finnish | 1,010 | 94.85 | 2.48 | 2.67 |
| 🇵🇹 Portuguese | 1,398 | 94.71 | 1.57 | 3.72 |
| 🇬🇧 🇺🇸 English | 7,722 | 94.69 | 2.24 | 3.07 |
| 🇮🇩 Indonesian | 971 | 94.34 | 2.47 | 3.19 |
| 🇩🇰 Danish | 779 | 94.09 | 3.21 | 2.70 |
| 🇷🇺 Russian | 1,470 | 93.61 | 2.93 | 3.47 |
| 🇺🇦 Ukrainian | 929 | 93.33 | 3.01 | 3.66 |
| 🇳🇴 Norwegian | 1,014 | 93.00 | 3.06 | 3.94 |
| 🇮🇳 Hindi | 1,295 | 92.90 | 4.09 | 3.01 |
| 🇪🇸 Spanish | 1,791 | 90.12 | 4.97 | 4.91 |
| 🇸🇦 Arabic | 947 | 88.60 | 5.60 | 5.81 |
| 🇮🇳 Marathi | 774 | 87.34 | 7.24 | 5.43 |
| 🇨🇳 Chinese | 945 | 85.93 | 3.49 | 10.58 |
| 🇧🇩 Bengali | 1,000 | 84.90 | 8.10 | 7.00 |
| 🇻🇳 Vietnamese | 1,004 | 77.39 | 6.47 | 16.14 |
### Performance by Dataset
| Dataset | Sample Count | Accuracy (%) | False Positives (%) | False Negatives (%) |
|---------|--------------|--------------|---------------------|---------------------|
| midcentury_1 | 1,044 | 99.04 | 0.10 | 0.86 |
| orpheus_endfiller_1 | 182 | 98.90 | 0.00 | 1.10 |
| rime_2 | 396 | 98.23 | 0.25 | 1.52 |
| human_5 | 402 | 97.76 | 0.50 | 1.74 |
| liva_1 | 3,832 | 94.18 | 2.71 | 3.11 |
| chirp3_1 | 16,300 | 94.07 | 2.55 | 3.38 |
| chirp3_2 | 8,428 | 89.68 | 4.60 | 5.72 |
| orpheus_grammar_1 | 163 | 89.57 | 6.13 | 4.29 |
| human_convcollector_1 | 90 | 88.89 | 2.22 | 8.89 |
| orpheus_midfiller_1 | 140 | 88.57 | 2.86 | 8.57 |
| mundo_1 | 496 | 86.69 | 7.66 | 5.65 |
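As a consistency check, the overall accuracy should equal the sample-weighted mean of the per-dataset accuracies. The sketch below recomputes it from the rows of the dataset table. The variable names are illustrative, and the result can differ from the true value in the last digit because the inputs are already rounded to two decimals.

```python
# (sample_count, accuracy_percent) per dataset, copied from the table above.
datasets = {
    "midcentury_1": (1044, 99.04),
    "orpheus_endfiller_1": (182, 98.90),
    "rime_2": (396, 98.23),
    "human_5": (402, 97.76),
    "liva_1": (3832, 94.18),
    "chirp3_1": (16300, 94.07),
    "chirp3_2": (8428, 89.68),
    "orpheus_grammar_1": (163, 89.57),
    "human_convcollector_1": (90, 88.89),
    "orpheus_midfiller_1": (140, 88.57),
    "mundo_1": (496, 86.69),
}

# Total sample count across all datasets.
total = sum(count for count, _ in datasets.values())

# Weight each dataset's accuracy by its share of the samples.
weighted_accuracy = sum(count * acc for count, acc in datasets.values()) / total

print(total, round(weighted_accuracy, 2))  # prints: 31473 93.02
```

The two largest datasets, chirp3_1 and chirp3_2, contribute 24,728 of the 31,473 samples, so the overall figure is driven mainly by them. Small datasets such as human_convcollector_1 (90 samples) barely move it.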