---
license: mit
datasets:
- openslr/librispeech_asr
language:
- en
pipeline_tag: automatic-speech-recognition
---

# Splitformer
## 1. Overview

**Splitformer** is a 36.7M-parameter Conformer-based ASR model trained from scratch on 1000 hours of the **LibriSpeech** dataset with an **early-exit objective**. The architecture introduces **parallel downsampling layers** before the first and last exits, improving recognition accuracy with minimal extra parameters while retaining inference speed. Our code for training and inference is available in our [GitHub repository](https://github.com/augustgw/early-exit-transformer).

## 2. Results on LibriSpeech
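To illustrate the early-exit idea, the sketch below shows one common decision rule: run the encoder layer by layer and stop at the first intermediate classifier whose predictions are confident enough (here, low average entropy). This is a minimal, hypothetical NumPy sketch; the function name, entropy criterion, and threshold are illustrative assumptions, not the actual Splitformer API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def early_exit_decode(layer_logits, entropy_threshold=0.5):
    """Pick the first exit whose mean per-frame entropy is below the
    threshold and return (exit_index, frame-level argmax labels).

    layer_logits: list of (frames, vocab) arrays, one per exit,
    ordered from the earliest (cheapest) to the final layer.
    """
    for i, logits in enumerate(layer_logits):
        probs = softmax(logits)
        # Mean entropy over frames: low entropy = confident predictions.
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1).mean()
        if entropy < entropy_threshold:
            return i, probs.argmax(axis=-1)
    # No early exit was confident enough: fall back to the final layer.
    return len(layer_logits) - 1, softmax(layer_logits[-1]).argmax(axis=-1)
```

On easy inputs the first confident exit fires and the remaining layers are skipped entirely, which is where the latency savings on edge devices come from; harder inputs fall through to deeper exits.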
Word error rate (%) at each exit layer on the LibriSpeech test sets:

**test-clean**

| Layer | EE-baseline (31.5M) | Splitformer (36.7M) | Wav2Vec2 (94.0M) | WavLM (94.7M) |
|-------|---------------------|---------------------|------------------|---------------|
| 2     | 31.0                | 28.1                | 33.7             | 28.0          |
| 4     | 11.7                | 10.8                | 17.4             | 13.9          |
| 6     | 7.1                 | 6.7                 | 9.6              | 8.7           |
| 8     | 5.8                 | 5.5                 | 5.8              | 4.8           |
| 10    | 5.3                 | 5.1                 | 4.5              | 4.0           |
| 12    | 5.1                 | 4.8                 | 4.3              | 3.6           |

**test-other**

| Layer | EE-baseline (31.5M) | Splitformer (36.7M) | Wav2Vec2 (94.0M) | WavLM (94.7M) |
|-------|---------------------|---------------------|------------------|---------------|
| 2     | 51.0                | 48.3                | 56.0             | 48.5          |
| 4     | 27.8                | 26.4                | 36.7             | 27.3          |
| 6     | 19.8                | 19.2                | 23.7             | 18.4          |
| 8     | 16.6                | 16.3                | 15.9             | 12.4          |
| 10    | 15.3                | 15.1                | 12.6             | 9.5           |
| 12    | 14.8                | 14.7                | 12.2             | 8.8           |
## 3. Citation

```bibtex
@misc{lasbordes2025splitformer,
  title={Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices},
  author={Maxence Lasbordes and Daniele Falavigna and Alessio Brutti},
  year={2025},
  note={Proc. of EUSIPCO 2025},
}
```