Update README.md
README.md

base_model:
- facebook/m2m100_418M
- meta-llama/Llama-3.1-8B
pipeline_tag: translation
license: mit
---

<p align="center">
  <img src="./MultiMedST_icon.png" alt="MultiMedST_icon" width="70">
</p>

<h1 align="center">MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation</h1>

<p align="center">
  <a href="https://arxiv.org/abs/2504.03546">
    <img src="https://img.shields.io/badge/Paper-arXiv%3A2504.03546-b31b1b?logo=arxiv&logoColor=white" alt="Paper">
  </a>
  <a href="https://huggingface.co/datasets/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Dataset-HuggingFace-blue?logo=huggingface&logoColor=white" alt="Dataset">
  </a>
  <a href="https://huggingface.co/leduckhai/MultiMed-ST">
    <img src="https://img.shields.io/badge/Models-HuggingFace-green?logo=huggingface&logoColor=white" alt="Models">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-MIT-yellow" alt="License">
  </a>
  <a href="https://github.com/leduckhai/MultiMed-ST/stargazers">
    <img src="https://img.shields.io/github/stars/leduckhai/MultiMed-ST?style=social" alt="Stars">
  </a>
</p>

<p align="center">
  <strong>EMNLP 2025</strong>
</p>

<p align="center">
  <b>Khai Le-Duc*</b>, <b>Tuyen Tran*</b>, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh**, Thanh Nguyen-Tang**
</p>

<p align="center">
  <sub>*Equal contribution | **Equal supervision</sub>
</p>

---

> ⭐ **If you find this work useful, please consider starring the repo and citing our paper!**

---

## 🧠 Abstract

Multilingual speech translation (ST) in the **medical domain** enhances patient care by enabling effective communication across language barriers, alleviating workforce shortages, and improving diagnosis and treatment, especially in global health emergencies.

In this work, we introduce **MultiMed-ST**, the *first large-scale multilingual medical speech translation dataset*, spanning **all translation directions** across **five languages**:
🇻🇳 Vietnamese, 🇬🇧 English, 🇩🇪 German, 🇫🇷 French, 🇨🇳 Traditional & Simplified Chinese.

With **290,000 samples**, *MultiMed-ST* represents:

- 🧩 the **largest medical MT dataset** to date
- 🌍 the **largest many-to-many multilingual ST dataset** across all domains

We also conduct, to the best of our knowledge, **the most comprehensive ST analysis in the field's history**, covering:

- ✅ Empirical baselines
- 🌐 Bilingual vs. multilingual study
- 🧩 End-to-end vs. cascaded models
- 🎯 Task-specific vs. multi-task seq2seq approaches
- 🗣️ Code-switching analysis
- 📊 Quantitative & qualitative error analysis

All **code, data, and models** are publicly available: 👉 [**GitHub Repository**](https://github.com/leduckhai/MultiMed-ST)

<p align="center">
  <img src="./poster_MultiMed-ST_EMNLP2025.png" alt="poster_MultiMed-ST_EMNLP2025" width="85%">
</p>

---

## 🧰 Repository Overview

This repository provides scripts for:

- 🎙️ **Automatic Speech Recognition (ASR)**
- 🌐 **Machine Translation (MT)**
- 🔊 **Speech Translation (ST)**: both **cascaded** and **end-to-end** seq2seq models (see the sketch below)

It includes:

- ⚙️ Model preparation & fine-tuning
- 🚀 Training & inference scripts
- 📊 Evaluation & benchmarking utilities
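
For orientation, here is a minimal cascaded-ST sketch built on Hugging Face `transformers`. It is an illustration under assumptions, not the repository's actual scripts: `openai/whisper-small` stands in for the ASR stage, the base `facebook/m2m100_418M` for the MT stage, and the audio file and English-to-Vietnamese direction are placeholders (swap in the fine-tuned checkpoints listed below).

```python
# Minimal cascaded ST sketch (ASR followed by MT) -- illustrative only.
# Checkpoints, audio path, and language pair are placeholders.
from transformers import pipeline

# Stage 1: speech -> source-language transcript with a Whisper ASR model.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
transcript = asr("consultation_en.wav")["text"]

# Stage 2: transcript -> target-language text with an M2M100 MT model.
mt = pipeline("translation", model="facebook/m2m100_418M", src_lang="en", tgt_lang="vi")
print(mt(transcript)[0]["translation_text"])
```

An end-to-end ST system would instead map the audio directly to target-language text with a single seq2seq model.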

---

## 📦 Dataset & Models

- **Dataset:** [🤗 Hugging Face Dataset](https://huggingface.co/datasets/leduckhai/MultiMed-ST)
- **Fine-tuned Models:** [🤗 Hugging Face Models](https://huggingface.co/leduckhai/MultiMed-ST)
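
As a starting point, the dataset can be pulled with the `datasets` library. This is a minimal sketch under assumptions: configuration and split names are not spelled out here, so check the dataset card for the exact identifiers.

```python
# Minimal sketch for loading MultiMed-ST with the `datasets` library.
# Config/split names are assumptions -- see the dataset card for the exact ones.
from datasets import load_dataset

ds = load_dataset("leduckhai/MultiMed-ST")  # add a config name if the dataset card requires one
print(ds)  # inspect available splits and columns
```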

You can explore and download all fine-tuned models for **MultiMed-ST** directly from our Hugging Face repository:

<details>
<summary><b>🔹 Whisper ASR Fine-tuned Models (Click to expand)</b></summary>

| Language | Model Link |
|-----------|------------|
| Chinese | [whisper-small-chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-chinese) |
| English | [whisper-small-english](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-english) |
| French | [whisper-small-french](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-french) |
| German | [whisper-small-german](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-german) |
| Multilingual | [whisper-small-multilingual](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-multilingual) |
| Vietnamese | [whisper-small-vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/asr/whisper-small-vietnamese) |

</details>
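
A minimal usage sketch for one of these checkpoints follows. It assumes each subfolder ships both model and processor files and that `librosa` is installed; the audio path is a placeholder, and if processor files are missing you can load the processor from `openai/whisper-small` instead.

```python
# Minimal sketch: transcribe 16 kHz audio with a fine-tuned Whisper checkpoint.
# Assumes the subfolder contains model + processor files; the audio path is a placeholder.
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

repo_id, subfolder = "leduckhai/MultiMed-ST", "asr/whisper-small-english"
processor = WhisperProcessor.from_pretrained(repo_id, subfolder=subfolder)
model = WhisperForConditionalGeneration.from_pretrained(repo_id, subfolder=subfolder)

speech, sr = librosa.load("sample_en.wav", sr=16000)  # Whisper expects 16 kHz input
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```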

<details>
<summary><b>🔹 LLaMA-based MT Fine-tuned Models (Click to expand)</b></summary>

| Source → Target | Model Link |
|------------------|------------|
| Chinese → English | [llama_Chinese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_English) |
| Chinese → French | [llama_Chinese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_French) |
| Chinese → German | [llama_Chinese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_German) |
| Chinese → Vietnamese | [llama_Chinese_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Chinese_Vietnamese) |
| English → Chinese | [llama_English_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Chinese) |
| English → French | [llama_English_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_French) |
| English → German | [llama_English_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_German) |
| English → Vietnamese | [llama_English_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_English_Vietnamese) |
| French → Chinese | [llama_French_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Chinese) |
| French → English | [llama_French_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_English) |
| French → German | [llama_French_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_German) |
| French → Vietnamese | [llama_French_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_French_Vietnamese) |
| German → Chinese | [llama_German_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Chinese) |
| German → English | [llama_German_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_English) |
| German → French | [llama_German_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_French) |
| German → Vietnamese | [llama_German_Vietnamese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_German_Vietnamese) |
| Vietnamese → Chinese | [llama_Vietnamese_Chinese](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_Chinese) |
| Vietnamese → English | [llama_Vietnamese_English](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_English) |
| Vietnamese → French | [llama_Vietnamese_French](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_French) |
| Vietnamese → German | [llama_Vietnamese_German](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/llama_Vietnamese_German) |

</details>

<details>
<summary><b>🔹 m2m100_418M MT Fine-tuned Models (Click to expand)</b></summary>

| Source → Target | Model Link |
|------------------|------------|
| de → en | [m2m100_418M-finetuned-de-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-en) |
| de → fr | [m2m100_418M-finetuned-de-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-fr) |
| de → vi | [m2m100_418M-finetuned-de-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-vi) |
| de → zh | [m2m100_418M-finetuned-de-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-de-to-zh) |
| en → de | [m2m100_418M-finetuned-en-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-de) |
| en → fr | [m2m100_418M-finetuned-en-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-fr) |
| en → vi | [m2m100_418M-finetuned-en-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-vi) |
| en → zh | [m2m100_418M-finetuned-en-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-en-to-zh) |
| fr → de | [m2m100_418M-finetuned-fr-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-de) |
| fr → en | [m2m100_418M-finetuned-fr-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-en) |
| fr → vi | [m2m100_418M-finetuned-fr-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-vi) |
| fr → zh | [m2m100_418M-finetuned-fr-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-fr-to-zh) |
| vi → de | [m2m100_418M-finetuned-vi-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-de) |
| vi → en | [m2m100_418M-finetuned-vi-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-en) |
| vi → fr | [m2m100_418M-finetuned-vi-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-fr) |
| vi → zh | [m2m100_418M-finetuned-vi-to-zh](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-vi-to-zh) |
| zh → de | [m2m100_418M-finetuned-zh-to-de](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-de) |
| zh → en | [m2m100_418M-finetuned-zh-to-en](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-en) |
| zh → fr | [m2m100_418M-finetuned-zh-to-fr](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-fr) |
| zh → vi | [m2m100_418M-finetuned-zh-to-vi](https://huggingface.co/leduckhai/MultiMed-ST/tree/main/m2m100_418M-finetuned-zh-to-vi) |

</details>
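
A minimal usage sketch for a direction-specific checkpoint follows. It assumes the subfolder holds the fine-tuned model weights; the tokenizer is taken from the base `facebook/m2m100_418M` checkpoint in case tokenizer files are not bundled, and the input sentence and en-to-vi direction are placeholders.

```python
# Minimal sketch: en -> vi translation with a fine-tuned M2M100 checkpoint.
# Assumes the subfolder holds full model weights; tokenizer falls back to the base checkpoint.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained(
    "leduckhai/MultiMed-ST", subfolder="m2m100_418M-finetuned-en-to-vi"
)
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"
inputs = tokenizer("The patient reports chest pain and shortness of breath.", return_tensors="pt")
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("vi"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```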

---

## 👨‍💻 Core Developers

1. **Khai Le-Duc**  
   University of Toronto, Canada  
   📧 [[email protected]](mailto:[email protected])  
   🔗 [https://github.com/leduckhai](https://github.com/leduckhai)

2. **Tuyen Tran**: 📧 [[email protected]](mailto:[email protected])  
   Hanoi University of Science and Technology, Vietnam

3. **Nguyen Kim Hai Bui**: 📧 [htlulem185@gmail.com](mailto:htlulem185@gmail.com)  
   Eötvös Loránd University, Hungary

## 🧾 Citation

If you use our dataset or models, please cite:
📄 [arXiv:2504.03546](https://arxiv.org/abs/2504.03546)

```bibtex
@inproceedings{le2025multimedst,
  title={MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation},
  author={Le-Duc, Khai and Tran, Tuyen and Tat, Bach Phan and Bui, Nguyen Kim Hai and Anh, Quan Dang and Tran, Hung-Phong and Nguyen, Thanh Thuy and Nguyen, Ly and Phan, Tuan Minh and Tran, Thi Thu Phuong and others},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={11838--11963},
  year={2025}
}
```