---
datasets:
- leduckhai/MultiMed-ST
- leduckhai/MultiMed
language:
- vi
- en
- de
- fr
- zh
metrics:
- wer
- bleu
- rouge
- bertscore
- ter
base_model:
- openai/whisper-small
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---


# MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation


📘 EMNLP 2025

Khai Le-Duc*, Tuyen Tran*, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**

*Equal contribution   |   **Equal supervision

---

> ⭐ **If you find this work useful, please consider starring the repo and citing our paper!**

---

## 🧠 Abstract

Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment, especially in global health emergencies.

In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**: 🇻🇳 Vietnamese, 🇬🇧 English, 🇩🇪 German, 🇫🇷 French, 🇨🇳 Traditional & Simplified Chinese.

With **290,000 samples**, *MultiMed-ST* represents:

- 🧩 the **largest medical MT dataset** to date
- 🌐 the **largest many-to-many multilingual ST dataset** across all domains

We also conduct, to the best of our knowledge, **the most comprehensive ST analysis in the field's history**, covering:

- ✅ Empirical baselines
- 🔄 Bilingual vs. multilingual study
- 🧩 End-to-end vs. cascaded models
- 🎯 Task-specific vs. multi-task seq2seq approaches
- 🗣️ Code-switching analysis
- 📊 Quantitative & qualitative error analysis

All **code, data, and models** are publicly available: 👉 [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)


---

## 🧰 Repository Overview

This repository provides scripts for:

- 🎙️ **Automatic Speech Recognition (ASR)**
- 🌍 **Machine Translation (MT)**
- 🔄 **Speech Translation (ST)**, with both **cascaded** and **end-to-end** seq2seq models

It includes:

- ⚙️ Model preparation & fine-tuning
- 🚀 Training & inference scripts
- 📊 Evaluation & benchmarking utilities

---

## 📦 Dataset & Models

- **Dataset:** [🤗 Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)
- **Fine-tuned Models:** [🤗 Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)

You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository:

### 🔹 Whisper ASR Fine-tuned Models

| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |
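
To fetch a single ASR checkpoint instead of the whole repository, something like the following may work. This is a minimal sketch, not part of the released scripts: it assumes the `asr/whisper-small-<language>` subfolder layout implied by the links in the table above and that `huggingface_hub` is installed. The function names `asr_subfolder` and `download_asr_model` are hypothetical.

```python
from typing import Optional

REPO_ID = "leduckhai/MultiMed-ST"  # model repository named in the links above

# Suffixes taken from the table above.
ASR_MODELS = ("chinese", "english", "french", "german", "multilingual", "vietnamese")


def asr_subfolder(language: str) -> str:
    """Return the repository subfolder for one Whisper ASR checkpoint (assumed layout)."""
    key = language.strip().lower()
    if key not in ASR_MODELS:
        raise ValueError(f"unknown ASR model {language!r}; expected one of {ASR_MODELS}")
    return f"asr/whisper-small-{key}"


def download_asr_model(language: str, local_dir: Optional[str] = None) -> str:
    """Download only the files under one ASR subfolder and return the local path."""
    # Imported lazily so asr_subfolder() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[f"{asr_subfolder(language)}/*"],
        local_dir=local_dir,
    )
```

Calling `download_asr_model("english")` would then pull only the files under `asr/whisper-small-english/`. How the downloaded files are loaded for inference depends on what each subfolder contains, which this sketch does not assume.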

### 🔹 LLaMA-based MT Fine-tuned Models

| Source → Target | Model Link |
|------------------|------------|
| Chinese → English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese → French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese → German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese → Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English → Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English → French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English → German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English → Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French → Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French → English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French → German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French → Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German → Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German → English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German → French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German → Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese → Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese → English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese → French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese → German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |

### 🔹 m2m100_418M MT Fine-tuned Models

| Source → Target | Model Link |
|------------------|------------|
| de → en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
| de → fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
| de → vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
| de → zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
| en → de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
| en → fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
| en → vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
| en → zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
| fr → de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
| fr → en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
| fr → vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
| fr → zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
| vi → de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
| vi → en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
| vi → fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
| vi → zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
| zh → de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
| zh → en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
| zh → fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
| zh → vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |
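
The MT tables follow a regular naming pattern: five languages give 5 × 4 = 20 directed pairs, and each pair maps to one subfolder per model family. The sketch below enumerates those names so a script can iterate over every direction. It is an illustration inferred from the link names in the tables, not code from the repository, and the helper names are hypothetical.

```python
from itertools import permutations

# Language codes and the full names used in the LLaMA folder names above.
LANGUAGES = {
    "de": "German",
    "en": "English",
    "fr": "French",
    "vi": "Vietnamese",
    "zh": "Chinese",
}


def all_directions():
    """Return every ordered (source, target) pair of distinct language codes."""
    return list(permutations(sorted(LANGUAGES), 2))


def m2m100_subfolder(src: str, tgt: str) -> str:
    """Subfolder name of the fine-tuned m2m100_418M model for src -> tgt."""
    return f"m2m100_418M-finetuned-{src}-to-{tgt}"


def llama_subfolder(src: str, tgt: str) -> str:
    """Subfolder name of the fine-tuned LLaMA-based model for src -> tgt."""
    return f"llama_{LANGUAGES[src]}_{LANGUAGES[tgt]}"


if __name__ == "__main__":
    for src, tgt in all_directions():
        print(m2m100_subfolder(src, tgt), llama_subfolder(src, tgt))
```

Keeping the code-to-name mapping in one dictionary means both naming schemes stay in sync if a script needs to switch between model families.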

---

## 👨‍💻 Core Developers

1. **Khai Le-Duc**
   - University of Toronto, Canada
   - 📧 [duckhai.le@mail.utoronto.ca](mailto:duckhai.le@mail.utoronto.ca)
   - 🔗 [https://github.com/leduckhai](https://github.com/leduckhai)
2. **Tuyen Tran**
   - Hanoi University of Science and Technology, Vietnam
   - 📧 [tuyencbt@gmail.com](mailto:tuyencbt@gmail.com)
3. **Nguyen Kim Hai Bui**
   - Eötvös Loránd University, Hungary
   - 📧 [htlulem185@gmail.com](mailto:htlulem185@gmail.com)

## 🧾 Citation

If you use our dataset or models, please cite:

📄 [arXiv:2504.03546](https://arxiv.org/abs/2504.03546)

```bibtex
@inproceedings{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={11838--11963},
  year={2025}
}
```