---
language: ko
license: other
tags:
- text-classification
- korean
- emotion-analysis
- klue
- roberta
pipeline_tag: text-classification
datasets:
- custom-korean-emotion-dataset
metrics:
- accuracy
- f1
model-index:
- name: 6-Class Korean Emotion Analysis
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Custom Test Set
      type: custom-korean-emotion-dataset
      config: default
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.7905
    - name: F1 (Weighted)
      type: f1
      value: 0.7910
    - name: Loss
      type: loss
      value: 0.6943
---

# 6-Class Korean Emotion Analysis Model (v2)

This model is built on [klue/roberta-base](https://huggingface.co/klue/roberta-base) and classifies the emotion expressed in Korean text into one of six classes (sequence classification).

**Key Features:**

* **6-class classification:** Classifies text into six emotions: `기쁨` (joy), `당황` (embarrassment), `분노` (anger), `불안` (anxiety), `상처` (hurt), and `슬픔` (sadness).
* **Imbalanced data handling:** Manually set **class weights** are applied to `CrossEntropyLoss` to mitigate data imbalance and improve detection of minority classes (e.g. `기쁨`, `당황`).

## Model Labels

The model predicts one of six emotion classes. The labels and their IDs are as follows.

| Label | English | ID |
| :--- | :--- | :--: |
| `기쁨` | Joy | 0 |
| `당황` | Embarrassment | 1 |
| `분노` | Anger | 2 |
| `불안` | Anxiety | 3 |
| `상처` | Hurt | 4 |
| `슬픔` | Sadness | 5 |

*(Note: the label order follows `['기쁨', '당황', '분노', '불안', '상처', '슬픔']`, which was generated automatically from the training dataset (`df_train`).)*
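
One plausible way to get that order is to sort the unique label strings in `df_train`: Korean strings sort by Unicode code point, which reproduces the table exactly. The sketch below assumes that is what the training script does; `build_label_maps` and the toy `train_labels` list are hypothetical stand-ins, not code from this card.

```python
def build_label_maps(labels):
    """Build label2id / id2label from the unique labels, sorted by code point."""
    # Hypothetical: assumes the training script sorts the unique label strings.
    unique_sorted = sorted(set(labels))
    label2id = {label: i for i, label in enumerate(unique_sorted)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


# Toy stand-in for df_train["label"]; the real column is not shown in this card.
train_labels = ["슬픔", "기쁨", "분노", "불안", "상처", "당황", "슬픔", "분노"]
label2id, id2label = build_label_maps(train_labels)
print(label2id)
# {'기쁨': 0, '당황': 1, '분노': 2, '불안': 3, '상처': 4, '슬픔': 5}
```

Passing these dicts as `id2label` and `label2id` to `AutoModelForSequenceClassification.from_pretrained(...)` stores them in the model config, which is what lets the pipeline return `분노` rather than a generic `LABEL_2`.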

## How to Use

You can quickly try the model with the `pipeline` API from the `transformers` library.

```python
from transformers import pipeline

# TODO: replace '[YOUR-USERNAME]/[YOUR-MODEL-NAME]' with your Hugging Face model path.
model_name = "[YOUR-USERNAME]/[YOUR-MODEL-NAME]"
classifier = pipeline("text-classification", model=model_name)

# Example sentences
texts = [
    "오늘 너무 기분 좋은 일이 생겼어!",  # "Something really good happened today!"
    "이걸 어떻게 해야 할지 모르겠네...",  # "I don't know what to do about this..."
    "진짜 화가 머리 끝까지 난다.",  # "I'm absolutely furious."
    "내일 발표인데 너무 떨리고 불안해.",  # "My presentation is tomorrow; I'm so nervous and anxious."
]

# Run prediction; with top_k=1, each result is a list holding the single best {label, score} dict
results = classifier(texts, top_k=1)

for text, result in zip(texts, results):
    print(f"Input: {text}")
    print(f"Emotion: {result[0]['label']} (Score: {result[0]['score']:.4f})")
    print("-" * 20)
```
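
For a single-label model like this one, the `text-classification` pipeline applies a softmax to the six logits and maps the highest-scoring ID through the config's `id2label`. The dependency-free sketch below reproduces that post-processing step; `softmax`, `decode`, and the logit values are made up for illustration and are not real model output.

```python
import math

# ID-to-label mapping from the table above (mirrors config.id2label).
ID2LABEL = {0: "기쁨", 1: "당황", 2: "분노", 3: "불안", 4: "상처", 5: "슬픔"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, top_k=1):
    """Return the top_k {label, score} dicts, highest score first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"label": ID2LABEL[i], "score": probs[i]} for i in ranked[:top_k]]


# Hypothetical logits for one sentence (not produced by the real model).
example_logits = [0.2, -1.0, 3.1, 0.5, -0.3, 0.0]
print(decode(example_logits, top_k=2))
```

The returned structure matches what the loop above indexes with `result[0]['label']` and `result[0]['score']`.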
## Training Details

This model was trained with the `train_final_v2.py` script.

### 1. Dataset

* `training-label.json`: original training data
* `test.json`: original test data
* **Data split (v2 strategy):**
    * **Train Set (90%):** 90% of `training-label.json` (stratified split)
    * **Validation Set (10%):** 10% of `training-label.json` (stratified split)
    * **Test Set (final evaluation):** `test.json` (separate data)
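
A stratified split keeps each emotion's share the same in the train and validation sets, which matters when classes are as imbalanced as the weights below suggest. A minimal dependency-free sketch, assuming each record is a dict with a `label` key and a fixed seed; `stratified_split`, the record format, and the seed are hypothetical. In practice, scikit-learn's `train_test_split(..., test_size=0.1, stratify=labels)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(records, val_ratio=0.1, seed=42):
    """Split records so each label keeps roughly the same proportion in both sets."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):  # sorted for a reproducible iteration order
        group = by_label[label][:]
        rng.shuffle(group)
        n_val = round(len(group) * val_ratio)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy data: 50 '기쁨' and 200 '분노' records (not the real dataset).
records = [{"text": f"t{i}", "label": "기쁨"} for i in range(50)]
records += [{"text": f"u{i}", "label": "분노"} for i in range(200)]
train, val = stratified_split(records)
print(len(train), len(val))  # 225 25
```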

### 2. Key Training Techniques

* **Class weights:** To address data imbalance, `CustomTrainer` passes the `weight` parameter to `CrossEntropyLoss`. Each class receives a manually chosen weight, giving minority classes more influence on the loss.
    * Weights applied: `[6.00, 4.50, 0.85, 1.80, 1.80, 0.92]`
    * Weight order (labels): `['기쁨', '당황', '분노', '불안', '상처', '슬픔']`
* **Scheduler:** A cosine learning-rate scheduler is used.
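
With `weight=w`, `torch.nn.CrossEntropyLoss` (default `reduction="mean"`) computes a weighted mean: each example's negative log-probability of its true class is multiplied by that class's weight, and the sum is divided by the sum of the true-class weights in the batch. The pure-Python sketch below reproduces that formula with the weights listed above; `weighted_cross_entropy` and the logits are hypothetical, and the card's actual `CustomTrainer` code is not reproduced here.

```python
import math

# Class weights from the card, in label-ID order: 기쁨, 당황, 분노, 불안, 상처, 슬픔
CLASS_WEIGHTS = [6.00, 4.50, 0.85, 1.80, 1.80, 0.92]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean NLL: sum(w[y] * nll) / sum(w[y]), as with reduction='mean'."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        nll = -log_softmax(logits)[y]
        num += weights[y] * nll
        den += weights[y]
    return num / den


# Made-up logits for a two-sample batch (not real model output):
batch_logits = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0],  # true label 기쁨 (ID 0); model leans toward 슬픔
    [0.0, 0.0, 2.0, 0.0, 0.0, 0.0],  # true label 분노 (ID 2); model is correct
]
targets = [0, 2]

unweighted = weighted_cross_entropy(batch_logits, targets, [1.0] * 6)
weighted = weighted_cross_entropy(batch_logits, targets, CLASS_WEIGHTS)
print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
```

The misclassified `기쁨` sample dominates the weighted loss. Note that with `reduction="mean"` the weights only change each class's relative share of the batch loss: a batch containing a single class yields the same loss as the unweighted version.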

### 3. Key Hyperparameters

| Hyperparameter | Value |
| :--- | :--- |
| `base_model_name` | `klue/roberta-base` |
| `num_train_epochs` | 10 |
| `learning_rate` | 1e-5 |
| `train_batch_size` | 16 |
| `eval_batch_size` | 64 |
| `weight_decay` | 0.01 |
| `max_length` | 128 |
| `warmup_ratio` | 0.1 |
| `lr_scheduler_type` | cosine |
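
With `lr_scheduler_type` set to cosine and `warmup_ratio=0.1`, the learning rate rises linearly from 0 to `1e-5` over the first 10% of optimizer steps, then follows a half cosine down to 0. The sketch below mirrors the multiplier used by `transformers.get_cosine_schedule_with_warmup` (default `num_cycles=0.5`) and derives the warmup step count as `ceil(total_steps * warmup_ratio)`. The training-set size of 1,000 examples is hypothetical (the card does not state it), and a single device without gradient accumulation is assumed.

```python
import math

# Values from the hyperparameter table above.
PEAK_LR = 1e-5
NUM_EPOCHS = 10
TRAIN_BATCH_SIZE = 16
WARMUP_RATIO = 0.1

# Hypothetical training-set size; the card does not state the real one.
NUM_TRAIN_EXAMPLES = 1000


def cosine_with_warmup(step, warmup_steps, total_steps, num_cycles=0.5):
    """LR multiplier: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


# Assumes one device and no gradient accumulation: one optimizer step per batch.
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    lr = PEAK_LR * cosine_with_warmup(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```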