Pokqok committed on
Commit 90813f6 · verified · 1 Parent(s): 9437c69

Update README.md

Files changed (1)
  1. README.md +49 -25
README.md CHANGED
@@ -1,38 +1,42 @@
  ---
  language:
- - ko
- - ja
- - zh
  license: apache-2.0
  library_name: optimum
  tags:
- - translation
- - m2m100
- - korean
- - japanese
- - chinese
- - k-tourism
- - onnx
  pipeline_tag: translation
  base_model: facebook/m2m100_1.2B
  datasets:
- - custom-k-tourism-corpus
  ---

  # M2M100 Korean Tourism Translator (ONNX)

- This model is a translation model built by fine-tuning `facebook/m2m100_1.2B` on **Korean tourism** data and then converting it to **ONNX (Open Neural Network Exchange)** format for faster inference.

- Its main purpose is to translate Korean (ko) tourism-related text into Japanese (ja) and Chinese (zh).

- - **Base Model:** [facebook/m2m100_1.2B](https://huggingface.co/facebook/m2m100_1.2B)
- - **Specialization:** Korean Tourism Domain
- - **Target Languages:** Japanese (ja), Chinese (zh)
- - **Format:** ONNX (Optimized for fast CPU/GPU inference)

  ## Model Description

- M2M100 is a multilingual translation model that translates directly between any pair of its 100 languages, without pivoting through English. This model was further trained on a tourism dataset to improve its grasp of the terminology and style specific to Korean tourism.

  Because the ONNX conversion makes it lighter and faster than the original PyTorch model, it is well suited to deployment as an API server with FastAPI or similar frameworks.

@@ -83,24 +87,44 @@ result_zh = translator(
  )
  print(f"Korean to Chinese: {result_zh[0]['translation_text']}")

  # --- Example output ---
  # Korean to Japanese: 景福宮の夜間開場入場券はどこで購入できますか。
  # Korean to Chinese: 景福宫夜间开放门票在哪里购买?
  ```

  ## Model Details

  ### Fine-tuning
-
- - **Base Model:** `facebook/m2m100_1.2B`
- - **Training Data:** Fine-tuned on a Korean tourism sentence-pair dataset collected and cleaned in-house (K-Tourism Corpus). The dataset covers a range of categories, including tourist attraction information, transportation, accommodation, food, and shopping.
- - **Objective:** Improve translation quality for proper nouns and specific situations (buying tickets, asking for directions, etc.) that general-purpose translation models may render awkwardly.

  ### ONNX Conversion
-
- - **Performance:** The PyTorch model was converted to ONNX, then quantized and optimized. This gives faster inference even on CPU, which is a major advantage for a real-time translation API service.
- - **Compatibility:** ONNX Runtime supports a wide range of hardware and platforms, which makes the model more flexible to deploy.

  ## Deployment

  This ONNX model is designed to be easy to deploy as an API server with FastAPI and Docker. For detailed deployment steps, see the `Dockerfile` and `app.py` in the related project.
 
 
 
 
  ---
  language:
+ - ko
+ - ja
+ - zh
+ - en
  license: apache-2.0
  library_name: optimum
  tags:
+ - translation
+ - m2m100
+ - korean
+ - japanese
+ - chinese
+ - english
+ - k-tourism
+ - onnx
  pipeline_tag: translation
  base_model: facebook/m2m100_1.2B
  datasets:
+ - custom-k-tourism-corpus
  ---

  # M2M100 Korean Tourism Translator (ONNX)

+ This model is a translation model built by fine-tuning `facebook/m2m100_1.2B` on Korean tourism data and then converting it to ONNX (Open Neural Network Exchange) format for faster inference.

+ Its main purpose is bidirectional translation between Korean (ko) and each of English (en), Japanese (ja), and Chinese (zh).

+ - **Base Model**: `facebook/m2m100_1.2B`
+ - **Specialization**: Korean Tourism Domain
+ - **Target Languages**: English (en), Japanese (ja), Chinese (zh)
+ - **Format**: ONNX (Optimized for fast CPU/GPU inference)

  ## Model Description

+ M2M100 is a multilingual translation model that translates directly between any pair of its 100 languages, without pivoting through English. This model was further trained on a tourism dataset to improve its grasp of the terminology and style specific to Korean tourism.
+
+ In particular, the fine-tuning focused on improving translation accuracy into three languages (English, Japanese, Chinese) for proper nouns such as Korean place names (e.g., Gyeongbokgung 경복궁, Myeong-dong 명동) and food names (e.g., bibimbap 비빔밥, tteokbokki 떡볶이).

  Because the ONNX conversion makes it lighter and faster than the original PyTorch model, it is well suited to deployment as an API server with FastAPI or similar frameworks.

  )
  print(f"Korean to Chinese: {result_zh[0]['translation_text']}")

+ # Korean -> English translation
+ result_en = translator(
+     korean_text,
+     src_lang="ko",
+     tgt_lang="en"
+ )
+ print(f"Korean to English: {result_en[0]['translation_text']}")
+
+ # Japanese -> Korean translation
+ japanese_text = "景福宮の夜間開場入場券はどこで購入できますか。"
+ result_ko_from_ja = translator(
+     japanese_text,
+     src_lang="ja",
+     tgt_lang="ko"
+ )
+ print(f"Japanese to Korean: {result_ko_from_ja[0]['translation_text']}")
+
  # --- Example output ---
  # Korean to Japanese: 景福宮の夜間開場入場券はどこで購入できますか。
  # Korean to Chinese: 景福宫夜间开放门票在哪里购买?
+ # Korean to English: Where can I buy tickets for the Gyeongbok Palace night opening?
+ # Japanese to Korean: 경복궁 야간 개장 입장권은 어디에서 구입합니까?
  ```

  ## Model Details

  ### Fine-tuning
+ - **Base Model**: `facebook/m2m100_1.2B`
+ - **Training Data**: Fine-tuned on a Korean tourism sentence-pair dataset collected and cleaned in-house (K-Tourism Corpus). The dataset covers a range of categories, including tourist attraction information, food, and events.
+ - **Objective**: Improve translation quality, in all three languages, for Korean tourism proper nouns (place names, food names, etc.) that general-purpose translation models may render awkwardly.

  ### ONNX Conversion
+ - **Performance**: The PyTorch model was converted to ONNX, then quantized and optimized. This gives faster inference even on CPU, which is a major advantage for a real-time translation API service.
+ - **Compatibility**: ONNX Runtime supports a wide range of hardware and platforms, which makes the model more flexible to deploy.

  ## Deployment

  This ONNX model is designed to be easy to deploy as an API server with FastAPI and Docker. For detailed deployment steps, see the `Dockerfile` and `app.py` in the related project.
+
+ ### Docker Hub Image
+ - **[Repository](https://hub.docker.com/repository/docker/pokqok/m2m100-k-tourism-ko-ja-zh-onnx/general)**:
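
This commit widens the model card's stated scope from Korean to Japanese/Chinese only, to bidirectional translation between Korean and English, Japanese, or Chinese. A server such as the `app.py` the README mentions might check requested directions against that scope before calling the pipeline. The sketch below is a hypothetical helper, not code from the published project: the names `PIVOT_LANG`, `PARTNER_LANGS`, `supported_directions`, and `validate_request` are invented for illustration. Rejecting non-Korean pairs such as ja -> zh is a conservative assumption, because the README does not say whether those base-model directions still behave well after fine-tuning.

```python
# Hypothetical helper, not part of the published repo: it encodes the language
# scope stated in the updated README (Korean <-> English/Japanese/Chinese).
PIVOT_LANG = "ko"
PARTNER_LANGS = ("en", "ja", "zh")


def supported_directions():
    """Return every (src_lang, tgt_lang) pair the updated README advertises."""
    pairs = []
    for partner in PARTNER_LANGS:
        pairs.append((PIVOT_LANG, partner))  # Korean -> partner language
        pairs.append((partner, PIVOT_LANG))  # partner language -> Korean
    return pairs


def validate_request(src_lang, tgt_lang):
    """Return keyword arguments for the translation pipeline, or raise
    ValueError for a direction outside the assumed fine-tuned scope."""
    if (src_lang, tgt_lang) not in supported_directions():
        raise ValueError(f"unsupported direction: {src_lang} -> {tgt_lang}")
    return {"src_lang": src_lang, "tgt_lang": tgt_lang}


if __name__ == "__main__":
    print(supported_directions())
    print(validate_request("ja", "ko"))
```

With the `translator` object from the README's usage example, a validated call could then be written as `translator(japanese_text, **validate_request("ja", "ko"))`, which passes the same `src_lang`/`tgt_lang` keywords the README already uses.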