---
tags:
- speech-recognition
- audio
- chunkformer
- ctc
- pytorch
- transformers
- automatic-speech-recognition
- long-form transcription
- asr
license: apache-2.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# ChunkFormer Model

[![GitHub](https://img.shields.io/badge/GitHub-ChunkFormer-blue)](https://github.com/khanld/chunkformer)
[![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://arxiv.org/abs/2502.14673)

## Usage

Install the package:

```bash
pip install chunkformer
```

```python
from chunkformer import ChunkFormerModel

# Load the model
model = ChunkFormerModel.from_pretrained("khanhld/chunkFormer-ctc-small-libri-960h")

# For long-form audio transcription
transcription = model.endless_decode(
    audio_path="path/to/your/audio.wav",
    chunk_size=64,
    left_context_size=128,
    right_context_size=128,
    return_timestamps=True
)
print(transcription)

# For batch processing
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
transcriptions = model.batch_decode(
    audio_paths=audio_files,
    chunk_size=64,
    left_context_size=128,
    right_context_size=128
)
```

## Training

This model was trained using the ChunkFormer framework.
For more details about the training process and to access the source code, please visit: https://github.com/khanld/chunkformer

Paper: https://arxiv.org/abs/2502.14673

## Citation

If you use this work in your research, please cite:

```bibtex
@INPROCEEDINGS{10888640,
  author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
  doi={10.1109/ICASSP49660.2025.10888640}}
```
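
## Note on chunking parameters

The `chunk_size`, `left_context_size`, and `right_context_size` arguments in the Usage section control how a long input is split into chunks that each see a bounded amount of surrounding context. The sketch below is only a conceptual illustration of that idea in plain Python, not the `chunkformer` package's implementation. The helper `chunk_windows` is hypothetical, and it assumes all three sizes are counted in the same frame units, which may not match how the library counts them.

```python
from typing import List, Tuple


def chunk_windows(
    num_frames: int,
    chunk_size: int,
    left_context: int,
    right_context: int,
) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper (not part of the chunkformer package).

    Splits `num_frames` into consecutive chunks of `chunk_size` frames and
    returns, for each chunk, (ctx_start, chunk_start, chunk_end, ctx_end).
    End indices are exclusive. The context range is clipped to the input.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    windows = []
    for chunk_start in range(0, num_frames, chunk_size):
        chunk_end = min(chunk_start + chunk_size, num_frames)
        # Frames to the left and right that the chunk may attend to.
        ctx_start = max(0, chunk_start - left_context)
        ctx_end = min(num_frames, chunk_end + right_context)
        windows.append((ctx_start, chunk_start, chunk_end, ctx_end))
    return windows


if __name__ == "__main__":
    # Same sizes as in the Usage example, on an assumed 200-frame input.
    for window in chunk_windows(200, chunk_size=64, left_context=128, right_context=128):
        print(window)
```

Under these assumptions, each chunk's visible span is at most `left_context + chunk_size + right_context` frames regardless of the total input length, which is the property that keeps long-form decoding memory bounded.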