---
datasets:
- leduckhai/MultiMed-ST
- leduckhai/MultiMed
language:
- vi
- en
- de
- fr
- zh
metrics:
- wer
- bleu
- rouge
- bertscore
- ter
base_model:
- openai/whisper-small
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---
<p align="center">
<img src="./MultiMedST_icon.png" alt="MultiMedST_icon" width="70">
</p>
<h1 align="center">MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation</h1>
<p align="center">
<a href="https://arxiv.org/abs/2504.03546">
<img src="https://img.shields.io/badge/Paper-arXiv%3A2504.03546-b31b1b?logo=arxiv&logoColor=white" alt="Paper">
</a>
<a href="https://huggingface.co/datasets/leduckhai/MultiMed-ST">
<img src="https://img.shields.io/badge/Dataset-HuggingFace-blue?logo=huggingface&logoColor=white" alt="Dataset">
</a>
<a href="https://huggingface.co/leduckhai/MultiMed-ST">
<img src="https://img.shields.io/badge/Models-HuggingFace-green?logo=huggingface&logoColor=white" alt="Models">
</a>
<a href="https://github.com/leduckhai/MultiMed-ST/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
</a>
<a href="https://github.com/leduckhai/MultiMed-ST/stargazers">
<img src="https://img.shields.io/github/stars/leduckhai/MultiMed-ST?style=social" alt="Stars">
</a>
</p>
<p align="center">
<strong>📘 EMNLP 2025</strong>
</p>
<p align="center">
<b>Khai Le-Duc*</b>, <b>Tuyen Tran*</b>, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**
</p>
<p align="center">
<sub>*Equal contribution &nbsp;&nbsp;|&nbsp;&nbsp; **Equal supervision</sub>
</p>
---
> ⭐ **If you find this work useful, please consider starring the repo and citing our paper!**
---
## 🧠 Abstract
Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment, especially in global health emergencies.
In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**:
🇻🇳 Vietnamese, 🇬🇧 English, 🇩🇪 German, 🇫🇷 French, 🇨🇳 Traditional & Simplified Chinese.
With **290,000 samples**, *MultiMed-ST* represents:
- 🧩 the **largest medical MT dataset** to date
- 🌐 the **largest many-to-many multilingual ST dataset** across all domains
To the best of our knowledge, we also conduct **the most comprehensive ST analysis to date**, covering:
- ✅ Empirical baselines
- 🔄 Bilingual vs. multilingual study
- 🧩 End-to-end vs. cascaded models
- 🎯 Task-specific vs. multi-task seq2seq approaches
- 🗣️ Code-switching analysis
- 📊 Quantitative & qualitative error analysis
All **code, data, and models** are publicly available: 👉 [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)
<p align="center">
<img src="./poster_MultiMed-ST_EMNLP2025.png" alt="poster_MultiMed-ST_EMNLP2025" width="85%">
</p>
---
## 🧰 Repository Overview
This repository provides scripts for:
- 🎙️ **Automatic Speech Recognition (ASR)**
- 🌍 **Machine Translation (MT)**
- 🔄 **Speech Translation (ST)**: both **cascaded** and **end-to-end** seq2seq models
It includes:
- ⚙️ Model preparation & fine-tuning
- 🚀 Training & inference scripts
- 📊 Evaluation & benchmarking utilities
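
As an illustration of the kind of metric the evaluation utilities report, here is a minimal sketch of word error rate (WER), one of the metrics listed for this project. It is not the repository's own evaluation code, which may rely on a dedicated library and apply different text normalization. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; whitespace splitting in particular is not suitable for unsegmented Chinese text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev is the previous row of the dynamic-programming table:
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the patient has a fever", "the patient has fever"))
```

Lower is better: the score counts substituted, deleted, and inserted words relative to the reference length, so it can exceed 1.0 when the hypothesis is much longer than the reference.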
---
## 📦 Dataset & Models
- **Dataset:** [🤗 Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)
- **Fine-tuned Models:** [🤗 Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)
You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository:
<details>
<summary><b>🔹 Whisper ASR Fine-tuned Models (Click to expand)</b></summary>
| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |
</details>
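
The ASR checkpoints above live in subfolders of a single model repository, so one way to use them is to download only the folder you need. The sketch below assumes each folder is a standard Transformers Whisper checkpoint (weights, tokenizer, and preprocessor config); that layout is not confirmed here. The helper names `asr_subfolder`, `download_asr_checkpoint`, and `transcribe` are hypothetical, while the repository ID and folder names come from the links in the table.

```python
from pathlib import Path

REPO_ID = "leduckhai/MultiMed-ST"  # model repository linked in the table above
ASR_LANGUAGES = {"chinese", "english", "french", "german", "multilingual", "vietnamese"}


def asr_subfolder(language: str) -> str:
    """Return the repo-relative folder of one fine-tuned Whisper checkpoint."""
    key = language.strip().lower()
    if key not in ASR_LANGUAGES:
        raise ValueError(f"no ASR checkpoint listed for {language!r}")
    return f"asr/whisper-small-{key}"


def download_asr_checkpoint(language: str) -> Path:
    """Download only the files of one checkpoint and return its local directory."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    subfolder = asr_subfolder(language)
    local_root = snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{subfolder}/*"])
    return Path(local_root) / subfolder


def transcribe(audio_path: str, language: str) -> str:
    """Transcribe one audio file (assumes a standard Transformers checkpoint layout)."""
    from transformers import pipeline

    checkpoint_dir = download_asr_checkpoint(language)
    asr = pipeline("automatic-speech-recognition", model=str(checkpoint_dir))
    return asr(audio_path)["text"]
```

Restricting the download with `allow_patterns` avoids fetching every checkpoint in the repository when only one language is needed.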
<details>
<summary><b>🔹 LLaMA-based MT Fine-tuned Models (Click to expand)</b></summary>
| Source → Target | Model Link |
|------------------|------------|
| Chinese → English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese → French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese → German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese → Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English → Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English → French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English → German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English → Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French → Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French → English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French → German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French → Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German → Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German → English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German → French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German → Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese → Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese → English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese → French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese → German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |
</details>
<details>
<summary><b>🔹 m2m100_418M MT Fine-tuned Models (Click to expand)</b></summary>
| Source → Target | Model Link |
|------------------|------------|
| de → en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
| de → fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
| de → vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
| de → zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
| en → de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
| en → fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
| en → vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
| en → zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
| fr → de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
| fr → en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
| fr → vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
| fr → zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
| vi → de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
| vi → en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
| vi → fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
| vi → zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
| zh → de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
| zh → en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
| zh → fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
| zh → vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |
</details>
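
Both MT families follow a regular naming scheme across the 20 translation directions (5 source languages × 4 targets), so the folder for any direction can be derived rather than looked up. The sketch below shows that mapping and one possible way to run an m2m100 checkpoint with the standard Transformers M2M100 classes. It assumes each m2m100 folder contains both model weights and tokenizer files, which is not confirmed here; the helper names (`mt_folder`, `all_directions`, `translate_m2m100`) are hypothetical. Loading the LLaMA-based checkpoints is left out because their packaging (full weights vs. adapters) is not stated in this README.

```python
REPO_ID = "leduckhai/MultiMed-ST"  # model repository linked in the tables above
LANG_CODES = {"Chinese": "zh", "English": "en", "French": "fr", "German": "de", "Vietnamese": "vi"}


def mt_folder(source: str, target: str, family: str = "m2m100") -> str:
    """Return the repo-relative folder name for one translation direction."""
    if source not in LANG_CODES or target not in LANG_CODES:
        raise ValueError(f"unsupported language pair: {source!r} -> {target!r}")
    if source == target:
        raise ValueError("source and target languages must differ")
    if family == "llama":
        return f"llama_{source}_{target}"
    if family == "m2m100":
        return f"m2m100_418M-finetuned-{LANG_CODES[source]}-to-{LANG_CODES[target]}"
    raise ValueError(f"unknown model family: {family!r}")


def all_directions() -> list[tuple[str, str]]:
    """Enumerate every ordered (source, target) pair of distinct languages."""
    return [(s, t) for s in LANG_CODES for t in LANG_CODES if s != t]


def translate_m2m100(text: str, source: str, target: str) -> str:
    """Translate one sentence (assumes the folder holds weights and tokenizer files)."""
    # Imported lazily so the naming helpers work without these packages installed.
    from huggingface_hub import snapshot_download
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    folder = mt_folder(source, target, family="m2m100")
    local_root = snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{folder}/*"])
    checkpoint_dir = f"{local_root}/{folder}"

    tokenizer = M2M100Tokenizer.from_pretrained(checkpoint_dir)
    model = M2M100ForConditionalGeneration.from_pretrained(checkpoint_dir)
    tokenizer.src_lang = LANG_CODES[source]
    inputs = tokenizer(text, return_tensors="pt")
    # M2M100 selects the output language through the forced first decoder token.
    generated = model.generate(
        **inputs, forced_bos_token_id=tokenizer.get_lang_id(LANG_CODES[target])
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

For example, `mt_folder("German", "English")` resolves to `m2m100_418M-finetuned-de-to-en`, and `mt_folder("Vietnamese", "German", family="llama")` resolves to `llama_Vietnamese_German`, matching the rows in the tables above.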
---
## 👨‍💻 Core Developers
1. **Khai Le-Duc**
University of Toronto, Canada
📧 [[email protected]](mailto:[email protected])
🔗 [https://github.com/leduckhai](https://github.com/leduckhai)
2. **Tuyen Tran**
Hanoi University of Science and Technology, Vietnam
📧 [[email protected]](mailto:[email protected])
3. **Nguyen Kim Hai Bui**
Eötvös Loránd University, Hungary
📧 [[email protected]](mailto:[email protected])
## 🧾 Citation
If you use our dataset or models, please cite:
📄 [arXiv:2504.03546](https://arxiv.org/abs/2504.03546)
```bibtex
@inproceedings{le2025multimedst,
title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
pages={11838--11963},
year={2025}
}
```