---
# (Required) YAML metadata for the Hugging Face model card
# TODO: Adjust language, tags, datasets, and metrics to fit your own setup.
language: ko
license: other # (Choose a license: apache-2.0, mit, etc.)
tags:
- text-classification
- korean
- emotion-analysis
- klue
- roberta
pipeline_tag: text-classification
datasets:
- custom-korean-emotion-dataset # (Specify the dataset name)
metrics:
- accuracy
- f1
model-index:
- name: 6-Class Korean Emotion Analysis
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Custom Test Set
      type: custom-korean-emotion-dataset
      config: default
      split: test
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.7905
    - name: F1 (Weighted)
      type: f1
      value: 0.7910
    - name: Loss
      type: loss
      value: 0.6943
---

# 6-Class Korean Emotion Analysis Model (v2)

This model is a text classification (sequence classification) model built on [klue/roberta-base](https://huggingface.co/klue/roberta-base). It classifies the emotion expressed in Korean text into one of six classes.

**Key features:**
* **6-class classification:** Classifies text into six emotions: '기쁨' (joy), '당황' (embarrassment), '분노' (anger), '불안' (anxiety), '상처' (hurt), and '슬픔' (sadness).
* **Imbalanced data handling:** Manually chosen **class weights** are applied to `CrossEntropyLoss` to mitigate class imbalance and improve detection of minority classes (기쁨, 당황, etc.).

## 🗂 Model Labels

The model's output corresponds to six emotion classes. The labels and their IDs are as follows.

| Label (emotion) | ID |
| :--- | :--: |
| `기쁨` (joy) | 0 |
| `당황` (embarrassment) | 1 |
| `분노` (anger) | 2 |
| `불안` (anxiety) | 3 |
| `상처` (hurt) | 4 |
| `슬픔` (sadness) | 5 |

*(Note: the label order follows `['기쁨', '당황', '분노', '불안', '상처', '슬픔']`, which was generated automatically from the training dataset (df_train).)*
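The ID-to-label table can be expressed as a pair of lookup dictionaries, which is handy when post-processing raw class indices. This is a minimal sketch: the names `LABELS` and `decode` are chosen for illustration, and how the released checkpoint stores label names in its config is not stated in this card.

```python
# Label order as listed in the table above.
LABELS = ["기쁨", "당황", "분노", "불안", "상처", "슬픔"]

# Index -> label and label -> index lookups.
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}


def decode(class_id: int) -> str:
    """Return the emotion label for a predicted class ID."""
    return id2label[class_id]


print(decode(3))
```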

## 🚀 How to Use

You can quickly test the model with the `pipeline` helper from the `transformers` library.

```python
from transformers import pipeline

# TODO: Replace '[YOUR-USERNAME]/[YOUR-MODEL-NAME]' with your own Hugging Face model path.
model_name = "[YOUR-USERNAME]/[YOUR-MODEL-NAME]"
classifier = pipeline("text-classification", model=model_name)

# Example sentences (kept in Korean, since the model classifies Korean text)
texts = [
    "오늘 너무 기분 좋은 일이 생겼어!",   # "Something really great happened today!"
    "이걸 어떻게 해야 할지 모르겠네...",  # "I have no idea what to do about this..."
    "진짜 화가 머리 끝까지 난다.",        # "I am absolutely furious."
    "내일 발표인데 너무 떨리고 불안해."   # "My presentation is tomorrow and I'm so nervous and anxious."
]

# Run prediction; with top_k=1 each input yields a one-element list of {label, score}
results = classifier(texts, top_k=1)

for text, result in zip(texts, results):
    print(f"Input: {text}")
    print(f"Emotion: {result[0]['label']} (Score: {result[0]['score']:.4f})")
    print("-" * 20)
```

## ⚙️ Training Details
This model was trained with the `train_final_v2.py` script.

### 1. Dataset

* `training-label.json`: original training data
* `test.json`: original test data
* Data split (v2 strategy):
    * Train set (90%): 90% of `training-label.json` (stratified split)
    * Validation set (10%): 10% of `training-label.json` (stratified split)
    * Test set (final evaluation): `test.json` (separate data)
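A stratified 90/10 split keeps each emotion's share roughly the same in the train and validation sets. The card does not show how `train_final_v2.py` performs the split, so the following is only a dependency-free sketch; `stratified_split` and the toy labels are hypothetical. In practice, scikit-learn's `train_test_split` with its `stratify` argument is a common way to do the same thing.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_ratio=0.1, seed=42):
    """Return (train_idx, val_idx) with per-class proportions preserved."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for validation.
        n_val = round(len(indices) * val_ratio)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy, deliberately imbalanced labels (not the real dataset).
toy_labels = ["분노"] * 50 + ["슬픔"] * 40 + ["기쁨"] * 10
train_idx, val_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx))
```

Splitting per class, rather than sampling from the pooled data, is what guarantees that a rare class such as 기쁨 still appears in the validation set.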

### 2. Key Techniques

* **Class weights:** To address class imbalance, training used a `CustomTrainer` together with the `weight` parameter of `CrossEntropyLoss`. Each class was assigned a weight by hand to increase the importance of minority classes.
    * Applied weights: `[6.00, 4.50, 0.85, 1.80, 1.80, 0.92]`
    * Weight order (labels): `['기쁨', '당황', '분노', '불안', '상처', '슬픔']`
* **Scheduler:** A `cosine` learning-rate scheduler was applied.
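To see what the class weights do to the loss, the sketch below computes a weighted cross-entropy by hand using the weights listed above. It assumes the weighted-mean form, sum(w[y] * loss) / sum(w[y]), which is what PyTorch's `CrossEntropyLoss` uses with a `weight` tensor and the default mean reduction. Whether `CustomTrainer` kept that default is not stated in this card, and the logits here are toy values.

```python
import math

LABELS = ["기쁨", "당황", "분노", "불안", "상처", "슬픔"]
CLASS_WEIGHTS = [6.00, 4.50, 0.85, 1.80, 1.80, 0.92]  # values from this card


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean of per-sample losses: sum(w[y] * -log p[y]) / sum(w[y])."""
    total, norm = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        total += weights[y] * -log_softmax(logits)[y]
        norm += weights[y]
    return total / norm


# Toy batch: the model favors 분노 (index 2) for both samples.
favors_anger = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
batch_logits = [favors_anger, favors_anger]
targets = [0, 2]  # first sample is truly 기쁨 (missed), second is truly 분노 (correct)

weighted = weighted_cross_entropy(batch_logits, targets, CLASS_WEIGHTS)
unweighted = weighted_cross_entropy(batch_logits, targets, [1.0] * len(LABELS))
print(f"weighted={weighted:.4f} unweighted={unweighted:.4f}")
```

Because 기쁨 carries a weight of 6.00 and 분노 only 0.85, the missed 기쁨 sample dominates the weighted loss, which pushes training to pay more attention to the minority class.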
### 3. Hyperparameters

| Hyperparameter | Value |
| :--- | :--- |
| `base_model_name` | `klue/roberta-base` |
| `num_train_epochs` | 10 |
| `learning_rate` | 1e-5 |
| `train_batch_size` | 16 |
| `eval_batch_size` | 64 |
| `weight_decay` | 0.01 |
| `max_length` | 128 |
| `warmup_ratio` | 0.1 |
| `lr_scheduler_type` | `cosine` |
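The combination of `learning_rate`, `warmup_ratio`, and a `cosine` scheduler implies a learning-rate curve that ramps up linearly and then decays along a half cosine. The sketch below reproduces that shape in plain Python. `lr_at_step` and `TOTAL_STEPS` are hypothetical: the real number of training steps depends on the dataset size, which this card does not give.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 (end of warmup) to 1 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; depends on dataset size, batch size, and epochs
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, TOTAL_STEPS):.2e}")
```

With these settings the peak of 1e-5 is reached after the first 10% of steps, and the rate falls to half the peak midway through the decay phase.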