---
license: mit
datasets:
- openslr/librispeech_asr
language:
- en
pipeline_tag: automatic-speech-recognition
---

# Splitformer
## 1. Overview

**Splitformer** is a 36.7M-parameter Conformer-based ASR model trained from scratch on 1,000 hours of the **LibriSpeech** dataset with an **early-exit objective**. The architecture introduces **parallel downsampling layers** before the first and last exits, improving performance with minimal extra overhead while retaining inference speed. Our code for training and inference is available in our [GitHub](https://github.com/augustgw/early-exit-transformer) repository.

### 2. Results on LibriSpeech

Word error rate (WER, %) at each exit layer:

| Exit layer | EE-baseline (31.5M)<br>test-clean | EE-baseline (31.5M)<br>test-other | Splitformer (36.7M)<br>test-clean | Splitformer (36.7M)<br>test-other | Wav2Vec2 (94.0M)<br>test-clean | Wav2Vec2 (94.0M)<br>test-other | WavLM (94.7M)<br>test-clean | WavLM (94.7M)<br>test-other |
|---|---|---|---|---|---|---|---|---|
| 2 | 31.0 | 51.0 | 28.1 | 48.3 | 33.7 | 56.0 | 28.0 | 48.5 |
| 4 | 11.7 | 27.8 | 10.8 | 26.4 | 17.4 | 36.7 | 13.9 | 27.3 |
| 6 | 7.1 | 19.8 | 6.7 | 19.2 | 9.6 | 23.7 | 8.7 | 18.4 |
| 8 | 5.8 | 16.6 | 5.5 | 16.3 | 5.8 | 15.9 | 4.8 | 12.4 |
| 10 | 5.3 | 15.3 | 5.1 | 15.1 | 4.5 | 12.6 | 4.0 | 9.5 |
| 12 | 5.1 | 14.8 | 4.8 | 14.7 | 4.3 | 12.2 | 3.6 | 8.8 |
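At inference time, an early-exit model of this kind can stop after an intermediate layer once a decoder attached to that layer is sufficiently confident, trading accuracy (see the table above) for speed. The sketch below illustrates that control flow only; the names, the confidence threshold, and the `ExitBranch` helper are illustrative assumptions, not the actual Splitformer implementation (see our GitHub repository for that):

```python
# Hedged sketch of early-exit inference. All names here are illustrative
# assumptions; they do not reproduce the Splitformer code.

from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ExitBranch:
    """An exit decoder attached after a given encoder layer (hypothetical helper)."""
    layer_index: int                                   # 1-based encoder layer index
    decode: Callable[[object], Tuple[str, float]]      # returns (hypothesis, confidence)


def early_exit_decode(
    features: object,
    encoder_layers: List[Callable[[object], object]],
    exits: List[ExitBranch],
    threshold: float = 0.9,
) -> Optional[Tuple[str, float, int]]:
    """Run encoder layers in order; after each layer that has an exit branch,
    decode and stop as soon as the confidence clears the threshold.

    Returns (hypothesis, confidence, exit_layer) from the last exit reached,
    or None if no exit branch was attached to any layer."""
    hidden = features
    best = None
    exit_map = {e.layer_index: e for e in exits}
    for i, layer in enumerate(encoder_layers, start=1):
        hidden = layer(hidden)                 # ordinary forward pass through layer i
        branch = exit_map.get(i)
        if branch is None:
            continue                           # no exit attached here
        hyp, conf = branch.decode(hidden)
        best = (hyp, conf, i)
        if conf >= threshold:
            break                              # early exit: skip the remaining layers
    return best
```

With a low threshold the loop returns from the first exit; with a high threshold it falls through to deeper exits, matching the accuracy/compute trade-off shown in the results table.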