---
license: openrail
language:
- da
base_model: Qwen/Qwen3-ASR-1.7B
tags:
- automatic-speech-recognition
- danish
- qwen
- asr
- speech-to-text
- coral
- podcast
- streaming
datasets:
- alexandrainst/coral
library_name: transformers
pipeline_tag: automatic-speech-recognition
metrics:
- wer
- cer
model-index:
- name: milo-asr
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      type: alexandrainst/coral
      name: CoRal v2 Test
      split: test
    metrics:
    - type: wer
      value: 23.24
      name: WER
    - type: cer
      value: 11.17
      name: CER
---

# Milo-ASR: Danish ASR Model

**Milo-ASR** is a Danish speech-to-text model based on [Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B), fine-tuned to understand Danish, both read-aloud speech and conversations/podcasts.

The model is trained on [CoRal v2](https://huggingface.co/datasets/alexandrainst/coral) plus Danish podcast data, so it performs well across domains. Most other models are only good at one or the other.

## Results

### CoRal v2 (read-aloud speech, 10,370 samples)

| Model | WER | CER |
|-------|-----|-----|
| hviske-v2 (Whisper v2) | **17.40%** | **7.96%** |
| hviske-v3 (Whisper v3) | 21.62% | 9.22% |
| **Milo-ASR** | 23.24% | 11.17% |
| Whisper v3 Turbo | 40.35% | 15.51% |
| Qwen3-ASR base | 46.28% | 19.78% |

### Podcast (conversations, 500 samples)

| Model | WER | CER |
|-------|-----|-----|
| **Milo-ASR** | **21.82%** | **15.64%** |
| hviske-v2 (Whisper v2) | 50.67% | 38.31% |
| Whisper v3 Turbo | 67.03% | 45.98% |
| Qwen3-ASR base | 67.52% | 47.71% |
| hviske-v3 (Whisper v3) | 67.65% | 50.12% |

Milo-ASR is the only model in this comparison that handles both domains well. On podcasts its WER is about **2.3x lower** than that of the next-best model, hviske-v2 (21.82% vs. 50.67%). On CoRal v2 it ranks third, behind hviske-v2 and hviske-v3.
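For readers unfamiliar with the metrics in the tables above, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length. It assumes plain whitespace tokenization and no text normalization. This card does not state which scoring tool or normalization produced the reported numbers, so treat the helper names (`edit_distance`, `wer`, `cer`) and the example sentence as illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref token missing in hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example pair: the hypothesis drops the word "i".
    ref = "jeg bor i københavn"
    hyp = "jeg bor københavn"
    print(f"WER: {wer(ref, hyp):.2%}")  # 1 deleted word / 4 words -> 25.00%
    print(f"CER: {cer(ref, hyp):.2%}")  # 2 deleted chars / 19 chars -> 10.53%
```

Because both rates divide by the reference length, a hypothesis with many insertions can score above 100%. Casing and punctuation handling also shift the numbers noticeably, so scores are only comparable when computed with the same normalization.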
### Plots

![CoRal v2 WER](plots/wer_comparison.png)
![Podcast WER](plots/wer_comparison_podcast.png)
![CoRal vs Podcast](plots/dual_domain_wer.png)
![Speed](plots/accuracy_vs_speed.png)

---

## Quick Start

```bash
pip install qwen-asr transformers torch
```

```python
from qwen_asr import Qwen3ASRModel

# Load the fine-tuned checkpoint onto the first GPU in bfloat16.
model = Qwen3ASRModel.from_pretrained(
    "pluttodk/Milo-ASR",
    dtype="bfloat16",
    device_map="cuda:0",
)

# Transcribe a single file; the language is set explicitly to Danish.
results = model.transcribe(
    audio="path/to/danish_audio.wav",
    language="Danish",
)
print(results[0].text)
```

### Batch Transcription

```python
# Reuses the `model` loaded in Quick Start; pass a list of paths to transcribe several files.
audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
results = model.transcribe(audio=audio_files, language="Danish")

for r in results:
    print(r.text)
```

### Timestamps

```python
from qwen_asr import Qwen3ASRModel

# Load the model together with the forced aligner that produces the timestamps.
model = Qwen3ASRModel.from_pretrained(
    "pluttodk/Milo-ASR",
    forced_aligner="Qwen/Qwen3-ForcedAligner-0.6B",
    dtype="bfloat16",
    device_map="cuda:0",
)

results = model.transcribe(
    audio="path/to/audio.wav",
    language="Danish",
    return_time_stamps=True,
)

# Print each aligned segment with its start and end time in seconds.
for item in results[0].time_stamps.items:
    print(f"{item.start_time:.2f}s - {item.end_time:.2f}s: {item.text}")
```

### Streaming (vLLM)

```python
from qwen_asr import Qwen3ASRModel

# Load the model through the vLLM backend.
model = Qwen3ASRModel.LLM(
    model="pluttodk/Milo-ASR",
    gpu_memory_utilization=0.8,
)

state = model.init_streaming_state(language="Danish", chunk_size_sec=2.0)

# audio_stream() is a placeholder: supply your own generator of audio chunks.
for audio_chunk in audio_stream():
    state = model.streaming_transcribe(audio_chunk, state)
    print(state.text)  # transcript so far

# Flush any remaining audio and finalize the transcript.
state = model.finish_streaming_transcribe(state)
```

---

## Training Details

The model was fine-tuned in two stages:

1. **Stage 1**: Qwen3-ASR-1.7B fine-tuned on CoRal v2 (~250K samples, 3 epochs, lr=2e-5)
2. **Stage 2**: Continued from the stage 1 checkpoint on podcast + Azure podcast data (~141K samples, 8 epochs, lr=1e-5, cosine schedule)

| Parameter | Stage 2 |
|-----------|---------|
| Learning rate | 1e-5 |
| Batch size | 8 (x4 gradient accumulation = 32 effective) |
| Epochs | 8 |
| LR scheduler | Cosine |
| Warmup ratio | 0.1 |
| Weight decay | 0.01 |
| Precision | bfloat16 |
| Training steps | 35,560 |

---

## Inherited from Qwen3-ASR

- Streaming/real-time via vLLM
- Song detection (background music)
- Word-level timestamps
- 30+ languages (optimized for Danish)
- Up to 20 minutes of audio per request

---

## Citation

```bibtex
@misc{Milo-ASR,
  author = {Rønnelund, Mathias Oliver Valdbjørn},
  title = {Milo-ASR: Danish ASR Model based on Qwen3-ASR},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/pluttodk/Milo-ASR}
}
```

## Acknowledgements

- [Qwen Team](https://github.com/QwenLM) for Qwen3-ASR
- [Alexandra Institute](https://alexandra.dk/) for CoRal v2