---
license: mit
datasets:
- openslr/librispeech_asr
language:
- en
pipeline_tag: automatic-speech-recognition
---

# Splitformer
## 1. Overview

**Splitformer** is a 36.7M-parameter Conformer-based ASR model trained from scratch on 1,000 hours of the **LibriSpeech** dataset with an **early-exit objective**. The architecture introduces **parallel downsampling layers** before the first and last exits, improving performance with minimal extra overhead while retaining inference speed. Our code for training and inference is available in our [GitHub](https://github.com/augustgw/early-exit-transformer) repository.

### 2. Results on LibriSpeech

Word error rate (WER, %) at each exit layer:

| Exit layer | EE-baseline (31.5M)<br>test-clean | EE-baseline (31.5M)<br>test-other | Splitformer (36.7M)<br>test-clean | Splitformer (36.7M)<br>test-other | Wav2Vec2 (94.0M)<br>test-clean | Wav2Vec2 (94.0M)<br>test-other | WavLM (94.7M)<br>test-clean | WavLM (94.7M)<br>test-other |
|---|---|---|---|---|---|---|---|---|
| 2 | 31.0 | 51.0 | 28.1 | 48.3 | 33.7 | 56.0 | 28.0 | 48.5 |
| 4 | 11.7 | 27.8 | 10.8 | 26.4 | 17.4 | 36.7 | 13.9 | 27.3 |
| 6 | 7.1 | 19.8 | 6.7 | 19.2 | 9.6 | 23.7 | 8.7 | 18.4 |
| 8 | 5.8 | 16.6 | 5.5 | 16.3 | 5.8 | 15.9 | 4.8 | 12.4 |
| 10 | 5.3 | 15.3 | 5.1 | 15.1 | 4.5 | 12.6 | 4.0 | 9.5 |
| 12 | 5.1 | 14.8 | 4.8 | 14.7 | 4.3 | 12.2 | 3.6 | 8.8 |
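At inference time, an early-exit model of this kind can stop after an intermediate layer once a decoder attached to that layer is sufficiently confident, trading accuracy (see the table above) for speed. The sketch below illustrates that control flow only; the names, the confidence threshold, and the `ExitBranch` helper are illustrative assumptions, not the actual Splitformer implementation (see our GitHub repository for that):

```python
# Hedged sketch of early-exit inference. All names here are illustrative
# assumptions; they do not reproduce the Splitformer code.

from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ExitBranch:
    """An exit decoder attached after a given encoder layer (hypothetical helper)."""
    layer_index: int                                   # 1-based encoder layer index
    decode: Callable[[object], Tuple[str, float]]      # returns (hypothesis, confidence)


def early_exit_decode(
    features: object,
    encoder_layers: List[Callable[[object], object]],
    exits: List[ExitBranch],
    threshold: float = 0.9,
) -> Optional[Tuple[str, float, int]]:
    """Run encoder layers in order; after each layer that has an exit branch,
    decode and stop as soon as the confidence clears the threshold.

    Returns (hypothesis, confidence, exit_layer) from the last exit reached,
    or None if no exit branch was attached to any layer."""
    hidden = features
    best = None
    exit_map = {e.layer_index: e for e in exits}
    for i, layer in enumerate(encoder_layers, start=1):
        hidden = layer(hidden)                 # ordinary forward pass through layer i
        branch = exit_map.get(i)
        if branch is None:
            continue                           # no exit attached here
        hyp, conf = branch.decode(hidden)
        best = (hyp, conf, i)
        if conf >= threshold:
            break                              # early exit: skip the remaining layers
    return best
```

With a low threshold the loop returns from the first exit; with a high threshold it falls through to deeper exits, matching the accuracy/compute trade-off shown in the results table.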