---
datasets:
- leduckhai/MultiMed-ST
- leduckhai/MultiMed
language:
- vi
- en
- de
- fr
- zh
metrics:
- wer
- bleu
- rouge
- bertscore
- ter
base_model:
- openai/whisper-small
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---
# MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

**EMNLP 2025**

Khai Le-Duc*, Tuyen Tran*, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**

*Equal contribution | **Equal supervision

---
> **If you find this work useful, please consider starring the repo and citing our paper!**
---
## Abstract
Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment, especially in global health emergencies.
In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**:
Vietnamese, English, German, French, and Traditional & Simplified Chinese.
With **290,000 samples**, *MultiMed-ST* is:
- the **largest medical MT dataset** to date
- the **largest many-to-many multilingual ST dataset** across all domains
We also conduct, to the best of our knowledge, **the most comprehensive ST analysis in the field to date**, covering:
- Empirical baselines
- Bilingual vs. multilingual study
- End-to-end vs. cascaded models
- Task-specific vs. multi-task seq2seq approaches
- Code-switching analysis
- Quantitative & qualitative error analysis
All **code, data, and models** are publicly available: [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)

---
## Repository Overview
This repository provides scripts for:
- **Automatic Speech Recognition (ASR)**
- **Machine Translation (MT)**
- **Speech Translation (ST)**, covering both **cascaded** and **end-to-end** seq2seq models (a minimal cascaded sketch follows this overview)

It includes:
- Model preparation & fine-tuning
- Training & inference scripts
- Evaluation & benchmarking utilities
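As a concrete illustration of the cascaded setup, the sketch below chains a fine-tuned Whisper ASR checkpoint into a fine-tuned m2m100 MT checkpoint (English speech to Vietnamese text). This is a minimal sketch, not the repository's own pipeline: the subfolder names mirror the model tables further down and are assumptions about how the checkpoints are organised on the Hub.

```python
import numpy as np
from transformers import (
    M2M100ForConditionalGeneration,
    M2M100Tokenizer,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

REPO = "leduckhai/MultiMed-ST"

# Stage 1: ASR with a fine-tuned Whisper model (subfolder path assumed from the tables below).
asr_processor = WhisperProcessor.from_pretrained(REPO, subfolder="asr/whisper-small-english")
asr_model = WhisperForConditionalGeneration.from_pretrained(REPO, subfolder="asr/whisper-small-english")

audio = np.zeros(16_000, dtype=np.float32)  # placeholder: 1 s of silence; use real 16 kHz mono audio
features = asr_processor(audio, sampling_rate=16_000, return_tensors="pt").input_features
transcript = asr_processor.batch_decode(asr_model.generate(features), skip_special_tokens=True)[0]

# Stage 2: MT with a fine-tuned m2m100 model (English -> Vietnamese).
mt_tokenizer = M2M100Tokenizer.from_pretrained(REPO, subfolder="m2m100_418M-finetuned-en-to-vi")
mt_model = M2M100ForConditionalGeneration.from_pretrained(REPO, subfolder="m2m100_418M-finetuned-en-to-vi")

mt_tokenizer.src_lang = "en"
encoded = mt_tokenizer(transcript, return_tensors="pt")
generated = mt_model.generate(**encoded, forced_bos_token_id=mt_tokenizer.get_lang_id("vi"))
print(mt_tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```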
---
## Dataset & Models
- **Dataset:** [Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)
- **Fine-tuned Models:** [Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)

You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository; the tables below list every checkpoint.
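The dataset itself can be loaded straight from the Hub with the `datasets` library. This is a minimal sketch, assuming the default configuration; the exact config and split names should be checked on the dataset card before running.

```python
from datasets import load_dataset

# Configuration and split names are assumptions here; consult the dataset
# card for the exact per-language-pair configs before running.
ds = load_dataset("leduckhai/MultiMed-ST")
print(ds)              # show available splits and features
print(ds["train"][0])  # inspect a single example (assumes a "train" split exists)
```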
### Whisper ASR Fine-tuned Models
| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |
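As a rough usage sketch, a fine-tuned Whisper checkpoint from the table above can be loaded with `transformers` by pointing `from_pretrained` at the repo and the corresponding subfolder. The subfolder layout and the placeholder audio are assumptions; verify the checkpoint contents on the Hub first.

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

REPO = "leduckhai/MultiMed-ST"
SUBFOLDER = "asr/whisper-small-english"  # any row from the table above

processor = WhisperProcessor.from_pretrained(REPO, subfolder=SUBFOLDER)
model = WhisperForConditionalGeneration.from_pretrained(REPO, subfolder=SUBFOLDER)

# Replace the placeholder with a real 16 kHz mono waveform (e.g. via librosa or torchaudio).
audio = np.zeros(16_000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```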
### LLaMA-based MT Fine-tuned Models
| Source → Target | Model Link |
|------------------|------------|
| Chinese → English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese → French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese → German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese → Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English → Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English → French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English → German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English → Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French → Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French → English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French → German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French → Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German → Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German → English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German → French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German → Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese → Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese → English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese → French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese → German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |
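A hedged sketch of how one of these LLaMA-based MT checkpoints might be used for text translation follows. Whether each subfolder holds full model weights or adapters, and which prompt template was used during fine-tuning, are assumptions here; consult the training scripts in the GitHub repository for the authors' exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "leduckhai/MultiMed-ST"
SUBFOLDER = "llama_English_Vietnamese"  # any row from the table above

tokenizer = AutoTokenizer.from_pretrained(REPO, subfolder=SUBFOLDER)
model = AutoModelForCausalLM.from_pretrained(
    REPO, subfolder=SUBFOLDER, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative prompt only: the template actually used for fine-tuning may differ.
prompt = (
    "Translate the following medical sentence from English to Vietnamese:\n"
    "The patient has a fever and a persistent cough.\n"
    "Translation:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```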
### m2m100_418M MT Fine-tuned Models
| Source → Target | Model Link |
|------------------|------------|
| de → en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
| de → fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
| de → vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
| de → zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
| en → de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
| en → fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
| en → vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
| en → zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
| fr → de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
| fr → en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
| fr → vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
| fr → zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
| vi → de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
| vi → en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
| vi → fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
| vi → zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
| zh → de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
| zh → en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
| zh → fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
| zh → vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |
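Similarly, a fine-tuned m2m100 direction can be loaded and run as below. The subfolder name and the assumption that it contains tokenizer files alongside the weights are unverified; if the tokenizer is missing, it can be loaded from the base `facebook/m2m100_418M` checkpoint instead.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

REPO = "leduckhai/MultiMed-ST"
SUBFOLDER = "m2m100_418M-finetuned-en-to-vi"  # any row from the table above

tokenizer = M2M100Tokenizer.from_pretrained(REPO, subfolder=SUBFOLDER)
model = M2M100ForConditionalGeneration.from_pretrained(REPO, subfolder=SUBFOLDER)

# Translate an English sentence into Vietnamese.
tokenizer.src_lang = "en"
text = "The patient reports chest pain and shortness of breath."
encoded = tokenizer(text, return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("vi"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```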
---
## Core Developers
1. **Khai Le-Duc**
   University of Toronto, Canada
   Email: [duckhai.le@mail.utoronto.ca](mailto:duckhai.le@mail.utoronto.ca)
   GitHub: [https://github.com/leduckhai](https://github.com/leduckhai)
2. **Tuyen Tran**
   Hanoi University of Science and Technology, Vietnam
   Email: [tuyencbt@gmail.com](mailto:tuyencbt@gmail.com)
3. **Nguyen Kim Hai Bui**
   Eötvös Loránd University, Hungary
   Email: [htlulem185@gmail.com](mailto:htlulem185@gmail.com)
## Citation
If you use our dataset or models, please cite:
[arXiv:2504.03546](https://arxiv.org/abs/2504.03546)
```bibtex
@inproceedings{le2025multimedst,
title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
pages={11838--11963},
year={2025}
}
```