---
datasets:
- leduckhai/MultiMed-ST
- leduckhai/MultiMed
language:
- vi
- en
- de
- fr
- zh
metrics:
- wer
- bleu
- rouge
- bertscore
- ter
base_model:
- openai/whisper-small
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---

<p align="center">
  <img src="./MultiMedST_icon.png" alt="MultiMedST_icon" width="70">
</p>

<h1 align="center">MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation</h1>


<p align="center">
  <a href="https://arxiv.org/abs/2504.03546">
    <img src="https://img.shields.io/badge/Paper-arXiv%3A2504.03546-b31b1b?logo=arxiv&logoColor=white" alt="Paper">
  </a>
  <a href="https://huggingface.co/datasets/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Dataset-HuggingFace-blue?logo=huggingface&logoColor=white" alt="Dataset">
  </a>
  <a href="https://huggingface.co/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Models-HuggingFace-green?logo=huggingface&logoColor=white" alt="Models">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/stargazers">
    <img src="https://img.shields.io/github/stars/leduckhai/MultiMed-ST?style=social" alt="Stars">
  </a>
</p>

<p align="center">
  <strong>πŸ“˜ EMNLP 2025</strong>
</p>

<p align="center">
  <b>Khai Le-Duc*</b>, <b>Tuyen Tran*</b>, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**
</p>

<p align="center">
  <sub>*Equal contribution &nbsp;&nbsp;|&nbsp;&nbsp; **Equal supervision</sub>
</p>

---

> ⭐ **If you find this work useful, please consider starring the repo and citing our paper!**

---

## 🧠 Abstract

Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment β€” especially in global health emergencies.

In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**:  
πŸ‡»πŸ‡³ Vietnamese, πŸ‡¬πŸ‡§ English, πŸ‡©πŸ‡ͺ German, πŸ‡«πŸ‡· French, πŸ‡¨πŸ‡³ Traditional & Simplified Chinese.

With **290,000 samples**, *MultiMed-ST* represents:
- 🧩 the **largest medical MT dataset** to date  
- 🌐 the **largest many-to-many multilingual ST dataset** across all domains  

We also conduct, to the best of our knowledge, **the most comprehensive analysis of medical ST to date**, covering:
- βœ… Empirical baselines  
- πŸ”„ Bilingual vs. multilingual study  
- 🧩 End-to-end vs. cascaded models  
- 🎯 Task-specific vs. multi-task seq2seq approaches  
- πŸ—£οΈ Code-switching analysis  
- πŸ“Š Quantitative & qualitative error analysis  

All **code, data, and models** are publicly available:  πŸ‘‰ [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)


<p align="center">
  <img src="./poster_MultiMed-ST_EMNLP2025.png" alt="poster_MultiMed-ST_EMNLP2025" width="85%">
</p>

---

## 🧰 Repository Overview

This repository provides scripts for:

- πŸŽ™οΈ **Automatic Speech Recognition (ASR)**
- 🌍 **Machine Translation (MT)**
- πŸ”„ **Speech Translation (ST)** β€” both **cascaded** and **end-to-end** seq2seq models  

It includes:

- βš™οΈ Model preparation & fine-tuning  
- πŸš€ Training & inference scripts  
- πŸ“Š Evaluation & benchmarking utilities  

---

## πŸ“¦ Dataset & Models

- **Dataset:** [πŸ€— Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)  
- **Fine-tuned Models:** [πŸ€— Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)
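
The dataset can be pulled straight from the Hub. Below is a minimal sketch assuming the `datasets` library is installed; the default config and split names are our guesses, so check the dataset card for the exact ones:

```python
# Sketch: loading MultiMed-ST from the Hugging Face Hub.
# The split name below is illustrative -- see the dataset card for
# the configs and splits actually published.
DATASET_REPO = "leduckhai/MultiMed-ST"

def load_multimed_st(split: str = "train"):
    """Download one split of the dataset (network required)."""
    # Imported inside the function so the sketch stays importable
    # even where `datasets` is not installed.
    from datasets import load_dataset
    return load_dataset(DATASET_REPO, split=split)

if __name__ == "__main__":
    ds = load_multimed_st("train")
    print(ds[0])  # inspect one sample's fields
```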

You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository:  

<details>
<summary><b>πŸ”Ή Whisper ASR Fine-tuned Models (Click to expand) </b></summary>

| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |

</details>

<details>
<summary><b>πŸ”Ή LLaMA-based MT Fine-tuned Models (Click to expand) </b></summary>

| Source β†’ Target | Model Link |
|------------------|------------|
| Chinese β†’ English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese β†’ French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese β†’ German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese β†’ Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English β†’ Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English β†’ French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English β†’ German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English β†’ Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French β†’ Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French β†’ English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French β†’ German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French β†’ Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German β†’ Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German β†’ English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German β†’ French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German β†’ Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese β†’ Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese β†’ English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese β†’ French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese β†’ German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |

</details>

<details>
<summary><b>πŸ”Ή m2m100_418M MT Fine-tuned Models (Click to expand) </b></summary>

| Source β†’ Target | Model Link |
|------------------|------------|
| de β†’ en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
| de β†’ fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
| de β†’ vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
| de β†’ zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
| en β†’ de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
| en β†’ fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
| en β†’ vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
| en β†’ zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
| fr β†’ de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
| fr β†’ en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
| fr β†’ vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
| fr β†’ zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
| vi β†’ de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
| vi β†’ en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
| vi β†’ fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
| vi β†’ zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
| zh β†’ de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
| zh β†’ en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
| zh β†’ fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
| zh β†’ vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |

</details>
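
Since all checkpoints live in sub-directories of a single model repo, they can be loaded with the `subfolder` argument of `from_pretrained`. The sketch below wires a cascaded ST pipeline (Whisper ASR followed by m2m100 MT) from the tables above; the helper names are ours, not part of the released code, and downloading requires network access:

```python
# Sketch: loading a cascaded ST pipeline (ASR -> MT) from the
# MultiMed-ST model repo. Subfolder names follow the tables above.
MODEL_REPO = "leduckhai/MultiMed-ST"

def asr_subfolder(language: str) -> str:
    # e.g. "asr/whisper-small-english"
    return f"asr/whisper-small-{language}"

def mt_subfolder(src: str, tgt: str) -> str:
    # e.g. "m2m100_418M-finetuned-en-to-fr"
    return f"m2m100_418M-finetuned-{src}-to-{tgt}"

def load_cascade(asr_language: str, src: str, tgt: str):
    """Load the Whisper ASR and m2m100 MT checkpoints for one direction."""
    # Imported inside the function so the sketch stays importable
    # even where `transformers` is not installed.
    from transformers import (
        WhisperProcessor, WhisperForConditionalGeneration,
        M2M100Tokenizer, M2M100ForConditionalGeneration,
    )
    asr_dir = asr_subfolder(asr_language)
    mt_dir = mt_subfolder(src, tgt)
    processor = WhisperProcessor.from_pretrained(MODEL_REPO, subfolder=asr_dir)
    asr_model = WhisperForConditionalGeneration.from_pretrained(MODEL_REPO, subfolder=asr_dir)
    tokenizer = M2M100Tokenizer.from_pretrained(MODEL_REPO, subfolder=mt_dir)
    mt_model = M2M100ForConditionalGeneration.from_pretrained(MODEL_REPO, subfolder=mt_dir)
    return (processor, asr_model), (tokenizer, mt_model)

if __name__ == "__main__":
    # English speech -> French text, via whisper-small-english + m2m100 en->fr.
    (processor, asr_model), (tokenizer, mt_model) = load_cascade("english", "en", "fr")
```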

---

## πŸ‘¨β€πŸ’» Core Developers

1. **Khai Le-Duc**  
   University of Toronto, Canada  
   πŸ“§ [[email protected]](mailto:[email protected])  
   πŸ”— [https://github.com/leduckhai](https://github.com/leduckhai)

2. **Tuyen Tran**  
   Hanoi University of Science and Technology, Vietnam  
   πŸ“§ [[email protected]](mailto:[email protected])

3. **Nguyen Kim Hai Bui**  
   EΓΆtvΓΆs LorΓ‘nd University, Hungary  
   πŸ“§ [[email protected]](mailto:[email protected])

## 🧾 Citation

If you use our dataset or models, please cite:

πŸ“„ [arXiv:2504.03546](https://arxiv.org/abs/2504.03546)

```bibtex
@inproceedings{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={11838--11963},
  year={2025}
}
```