---
datasets:
- leduckhai/MultiMed-ST
- leduckhai/MultiMed
language:
- vi
- en
- de
- fr
- zh
metrics:
- wer
- bleu
- rouge
- bertscore
- ter
base_model:
- openai/whisper-small
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---

<p align="center">
  <img src="./MultiMedST_icon.png" alt="MultiMedST_icon" width="70">
</p>

<h1 align="center">MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2504.03546">
    <img src="https://img.shields.io/badge/Paper-arXiv%3A2504.03546-b31b1b?logo=arxiv&logoColor=white" alt="Paper">
  </a>
  <a href="https://huggingface.co/datasets/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Dataset-HuggingFace-blue?logo=huggingface&logoColor=white" alt="Dataset">
  </a>
  <a href="https://huggingface.co/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Models-HuggingFace-green?logo=huggingface&logoColor=white" alt="Models">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/stargazers">
    <img src="https://img.shields.io/github/stars/leduckhai/MultiMed-ST?style=social" alt="Stars">
  </a>
</p>

<p align="center">
  <strong>EMNLP 2025</strong>
</p>

<p align="center">
  <b>Khai Le-Duc*</b>, <b>Tuyen Tran*</b>, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**
</p>

<p align="center">
  <sub>*Equal contribution | **Equal supervision</sub>
</p>

---

> ⭐ **If you find this work useful, please consider starring the repo and citing our paper!**

---

## Abstract

Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment, especially in global health emergencies.

In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**:
🇻🇳 Vietnamese, 🇬🇧 English, 🇩🇪 German, 🇫🇷 French, 🇨🇳 Traditional & Simplified Chinese.

With **290,000 samples**, *MultiMed-ST* represents:
- the **largest medical MT dataset** to date
- the **largest many-to-many multilingual ST dataset** across all domains

To the best of our knowledge, we also conduct **the most comprehensive ST analysis in the field's history**, covering:
- Empirical baselines
- Bilingual vs. multilingual study
- End-to-end vs. cascaded models
- Task-specific vs. multi-task seq2seq approaches
- Code-switching analysis
- Quantitative & qualitative error analysis

All **code, data, and models** are publicly available: [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)

<p align="center">
  <img src="./poster_MultiMed-ST_EMNLP2025.png" alt="poster_MultiMed-ST_EMNLP2025" width="85%">
</p>

---

## 🧰 Repository Overview

This repository provides scripts for:

- **Automatic Speech Recognition (ASR)**
- **Machine Translation (MT)**
- **Speech Translation (ST)**: both **cascaded** and **end-to-end** seq2seq models

It includes:

- Model preparation & fine-tuning
- Training & inference scripts
- Evaluation & benchmarking utilities

---

## 📦 Dataset & Models

- **Dataset:** [🤗 Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)
- **Fine-tuned Models:** [🤗 Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)

You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository:

<details>
<summary><b>🔹 Whisper ASR Fine-tuned Models (Click to expand)</b></summary>

| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |

</details>

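The links in the Whisper table suggest that each ASR checkpoint sits in a subfolder named `asr/whisper-small-<language>` of the `leduckhai/MultiMed-ST` model repository. As a minimal sketch under that assumption, the snippet below builds the subfolder path and can download only that folder with `huggingface_hub.snapshot_download`. The helper names `asr_subfolder` and `download_asr_model` are hypothetical and not part of this repository, and the sketch does not check which files each folder actually contains.

```python
"""Sketch: locate and fetch one fine-tuned Whisper checkpoint (helper names are illustrative)."""

REPO_ID = "leduckhai/MultiMed-ST"  # model repository linked in the table

# Language labels exactly as they appear in the folder names of the table.
ASR_LANGUAGES = ("chinese", "english", "french", "german", "multilingual", "vietnamese")


def asr_subfolder(language: str) -> str:
    """Return the repository subfolder for a language, e.g. 'asr/whisper-small-english'."""
    key = language.strip().lower()
    if key not in ASR_LANGUAGES:
        raise ValueError(f"Unknown ASR language {language!r}; expected one of {ASR_LANGUAGES}")
    return f"asr/whisper-small-{key}"


def download_asr_model(language: str) -> str:
    """Download only the chosen subfolder and return its local path.

    Assumes `huggingface_hub` is installed and network access is available.
    """
    import os
    from huggingface_hub import snapshot_download  # lazy import: optional dependency

    subfolder = asr_subfolder(language)
    # allow_patterns restricts the download to files under the chosen subfolder.
    local_root = snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{subfolder}/*"])
    return os.path.join(local_root, subfolder)


print(asr_subfolder("English"))  # asr/whisper-small-english
```

If a folder follows the standard Transformers checkpoint layout (an assumption worth checking on the Hub), the path returned by `download_asr_model` could then be passed to `WhisperForConditionalGeneration.from_pretrained`.
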
<details>
<summary><b>🔹 LLaMA-based MT Fine-tuned Models (Click to expand)</b></summary>

| Source → Target | Model Link |
|------------------|------------|
| Chinese → English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese → French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese → German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese → Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English → Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English → French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English → German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English → Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French → Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French → English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French → German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French → Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German → Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German → English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German → French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German → Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese → Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese → English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese → French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese → German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |

</details>

<details>
<summary><b>🔹 m2m100_418M MT Fine-tuned Models (Click to expand)</b></summary>

| Source → Target | Model Link |
|------------------|------------|
| de → en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
| de → fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
| de → vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
| de → zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
| en → de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
| en → fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
| en → vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
| en → zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
| fr → de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
| fr → en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
| fr → vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
| fr → zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
| vi → de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
| vi → en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
| vi → fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
| vi → zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
| zh → de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
| zh → en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
| zh → fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
| zh → vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |

</details>

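Both MT tables cover every ordered pair of the five languages, which gives 5 × 4 = 20 translation directions per model family. The sketch below enumerates those pairs and rebuilds the folder names from the naming patterns visible in the tables (`llama_<Source>_<Target>` and `m2m100_418M-finetuned-<src>-to-<tgt>`). The function names are hypothetical helpers introduced for illustration, not part of the repository.

```python
from itertools import permutations

# ISO 639-1 code -> language name as spelled in the LLaMA folder names.
LANGUAGES = {
    "de": "German",
    "en": "English",
    "fr": "French",
    "vi": "Vietnamese",
    "zh": "Chinese",
}


def all_directions() -> list[tuple[str, str]]:
    """Every ordered (source, target) pair with source != target."""
    return list(permutations(LANGUAGES, 2))


def m2m100_folder(src: str, tgt: str) -> str:
    """Folder name following the m2m100 table, e.g. 'm2m100_418M-finetuned-vi-to-en'."""
    return f"m2m100_418M-finetuned-{src}-to-{tgt}"


def llama_folder(src: str, tgt: str) -> str:
    """Folder name following the LLaMA table, e.g. 'llama_Vietnamese_English'."""
    return f"llama_{LANGUAGES[src]}_{LANGUAGES[tgt]}"


directions = all_directions()
print(len(directions))            # 20
print(m2m100_folder("vi", "en"))  # m2m100_418M-finetuned-vi-to-en
print(llama_folder("vi", "en"))   # llama_Vietnamese_English
```

Deriving the names from one language map keeps the two families in sync, so a script that loops over `directions` can address either model family without a hand-written list.
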

---

## 👨‍💻 Core Developers

1. **Khai Le-Duc**

   University of Toronto, Canada

   📧 [[email protected]](mailto:[email protected])
   [https://github.com/leduckhai](https://github.com/leduckhai)

2. **Tuyen Tran**: 📧 [[email protected]](mailto:[email protected])

   Hanoi University of Science and Technology, Vietnam

3. **Nguyen Kim Hai Bui**: 📧 [[email protected]](mailto:[email protected])

   Eötvös Loránd University, Hungary

## 🧾 Citation

If you use our dataset or models, please cite:

[arXiv:2504.03546](https://arxiv.org/abs/2504.03546)

```bibtex
@inproceedings{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={11838--11963},
  year={2025}
}
```