leduckhai committed · verified
Commit 59823a2 · Parent: b8c8cd2

Update README.md

Files changed (1): README.md (+162 -43)

README.md CHANGED
base_model:
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---

<p align="center">
  <img src="./MultiMedST_icon.png" alt="MultiMedST_icon" width="70">
</p>

<h1 align="center">MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2504.03546">
    <img src="https://img.shields.io/badge/Paper-arXiv%3A2504.03546-b31b1b?logo=arxiv&logoColor=white" alt="Paper">
  </a>
  <a href="https://huggingface.co/datasets/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Dataset-HuggingFace-blue?logo=huggingface&logoColor=white" alt="Dataset">
  </a>
  <a href="https://huggingface.co/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Models-HuggingFace-green?logo=huggingface&logoColor=white" alt="Models">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/stargazers">
    <img src="https://img.shields.io/github/stars/leduckhai/MultiMed-ST?style=social" alt="Stars">
  </a>
</p>

<p align="center">
  <strong>📘 EMNLP 2025</strong>
</p>

<p align="center">
  <b>Khai Le-Duc*</b>, <b>Tuyen Tran*</b>, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**
</p>

<p align="center">
  <sub>*Equal contribution &nbsp;&nbsp;|&nbsp;&nbsp; **Equal supervision</sub>
</p>

---

> ⭐ **If you find this work useful, please consider starring the repo and citing our paper!**

---

## 🧠 Abstract

Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment, especially during global health emergencies.

In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**:
🇻🇳 Vietnamese, 🇬🇧 English, 🇩🇪 German, 🇫🇷 French, 🇨🇳 Traditional & Simplified Chinese.

With **290,000 samples**, *MultiMed-ST* represents:
- 🧩 the **largest medical MT dataset** to date
- 🌐 the **largest many-to-many multilingual ST dataset** across all domains

We also conduct, to the best of our knowledge, **the most comprehensive ST analysis in the field's history**, covering:
- ✅ Empirical baselines
- 🔄 Bilingual vs. multilingual study
- 🧩 End-to-end vs. cascaded models
- 🎯 Task-specific vs. multi-task seq2seq approaches
- 🗣️ Code-switching analysis
- 📊 Quantitative & qualitative error analysis

All **code, data, and models** are publicly available: 👉 [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)

<p align="center">
  <img src="./poster_MultiMed-ST_EMNLP2025.png" alt="poster_MultiMed-ST_EMNLP2025" width="85%">
</p>

---
 
 
 
 
 
 
 
## 🧰 Repository Overview

This repository provides scripts for:

- 🎙️ **Automatic Speech Recognition (ASR)**
- 🌍 **Machine Translation (MT)**
- 🔄 **Speech Translation (ST)**: both **cascaded** and **end-to-end** seq2seq models (see the sketch after this list)

It includes:

- ⚙️ Model preparation & fine-tuning
- 🚀 Training & inference scripts
- 📊 Evaluation & benchmarking utilities
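
To make the cascaded route concrete, here is a minimal sketch that chains a fine-tuned Whisper checkpoint into a fine-tuned M2M100 checkpoint. It is illustrative only: it assumes the subfolders from the model tables below load via the `subfolder` argument of `transformers`, and `cascaded_st` is a hypothetical helper, not one of the released scripts.

```python
# Hypothetical cascaded ST: Whisper ASR (English) -> M2M100 MT (en -> vi).
import torch
from transformers import (
    WhisperProcessor, WhisperForConditionalGeneration,
    M2M100Tokenizer, M2M100ForConditionalGeneration,
)

REPO = "leduckhai/MultiMed-ST"

# Stage 1: ASR with a fine-tuned Whisper checkpoint (subfolder name assumed).
asr_proc = WhisperProcessor.from_pretrained(REPO, subfolder="asr/whisper-small-english")
asr_model = WhisperForConditionalGeneration.from_pretrained(REPO, subfolder="asr/whisper-small-english")

# Stage 2: MT with a fine-tuned M2M100 checkpoint (subfolder name assumed).
mt_tok = M2M100Tokenizer.from_pretrained(REPO, subfolder="m2m100_418M-finetuned-en-to-vi")
mt_model = M2M100ForConditionalGeneration.from_pretrained(REPO, subfolder="m2m100_418M-finetuned-en-to-vi")

def cascaded_st(waveform, sampling_rate=16_000):
    """Transcribe English speech, then translate the transcript into Vietnamese."""
    feats = asr_proc(waveform, sampling_rate=sampling_rate, return_tensors="pt").input_features
    with torch.no_grad():
        transcript = asr_proc.batch_decode(asr_model.generate(feats), skip_special_tokens=True)[0]
    mt_tok.src_lang = "en"
    enc = mt_tok(transcript, return_tensors="pt")
    with torch.no_grad():
        out = mt_model.generate(**enc, forced_bos_token_id=mt_tok.get_lang_id("vi"))
    return transcript, mt_tok.batch_decode(out, skip_special_tokens=True)[0]
```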

---
 
 
 
 
 
## 📦 Dataset & Models

- **Dataset:** [🤗 Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)
- **Fine-tuned Models:** [🤗 Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)
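
As a quick start, the dataset should load with the `datasets` library. This is a minimal sketch; split and column names are assumptions, so check the dataset card first.

```python
# Minimal sketch: browse MultiMed-ST with Hugging Face `datasets`.
from datasets import load_dataset

ds = load_dataset("leduckhai/MultiMed-ST")  # default config; others may exist
print(ds)                   # available splits and their columns
first_split = next(iter(ds))
print(ds[first_split][0])   # inspect one example
```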

You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository:

<details>
<summary><b>🔹 Whisper ASR Fine-tuned Models (click to expand)</b></summary>

| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |

</details>

<details>
<summary><b>🔹 LLaMA-based MT Fine-tuned Models (click to expand)</b></summary>

| Source → Target | Model Link |
|------------------|------------|
| Chinese → English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese → French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese → German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese → Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English → Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English → French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English → German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English → Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French → Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French → English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French → German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French → Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German → Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German → English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German → French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German → Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese → Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese → English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese → French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese → German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |

</details>
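
For the LLaMA-based directions, inference would look roughly like the sketch below. This is heavily hedged: it assumes each subfolder holds a full causal-LM checkpoint loadable by `transformers` (if the release ships LoRA adapters instead, `peft` would be needed), and the instruction prompt shown is illustrative only, not necessarily the template used for fine-tuning.

```python
# Hedged sketch: text-to-text MT with a LLaMA-based checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "leduckhai/MultiMed-ST"
SUB = "llama_English_Vietnamese"  # English -> Vietnamese direction

tok = AutoTokenizer.from_pretrained(REPO, subfolder=SUB)
model = AutoModelForCausalLM.from_pretrained(REPO, subfolder=SUB)

# Hypothetical prompt format; the actual fine-tuning template may differ.
prompt = (
    "Translate the following English sentence into Vietnamese:\n"
    "The patient reports chest pain.\nTranslation:"
)
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated continuation.
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```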
157
+
158
+ <details>
159
+ <summary><b>πŸ”Ή m2m100_418M MT Fine-tuned Models (Click to expand) </b></summary>
160
+ | Source β†’ Target | Model Link |
161
+ |------------------|------------|
162
+ | de β†’ en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
163
+ | de β†’ fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
164
+ | de β†’ vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
165
+ | de β†’ zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
166
+ | en β†’ de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
167
+ | en β†’ fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
168
+ | en β†’ vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
169
+ | en β†’ zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
170
+ | fr β†’ de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
171
+ | fr β†’ en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
172
+ | fr β†’ vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
173
+ | fr β†’ zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
174
+ | vi β†’ de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
175
+ | vi β†’ en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
176
+ | vi β†’ fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
177
+ | vi β†’ zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
178
+ | zh β†’ de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
179
+ | zh β†’ en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
180
+ | zh β†’ fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
181
+ | zh β†’ vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |
182
+ </details>

---

## 👨‍💻 Core Developers

1. **Khai Le-Duc**
   University of Toronto, Canada
   📧 [[email protected]](mailto:[email protected])
   🔗 [https://github.com/leduckhai](https://github.com/leduckhai)
2. **Tuyen Tran**
   Hanoi University of Science and Technology, Vietnam
   📧 [tuyencbt@gmail.com](mailto:tuyencbt@gmail.com)
3. **Nguyen Kim Hai Bui**
   Eötvös Loránd University, Hungary
   📧 [htlulem185@gmail.com](mailto:htlulem185@gmail.com)

## 🧾 Citation

If you use our dataset or models, please cite:
📄 [arXiv:2504.03546](https://arxiv.org/abs/2504.03546)

```bibtex
@inproceedings{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={11838--11963},
  year={2025}
}
```