---
license: mit
pipeline_tag: text-generation
library_name: mlx
base_model: moonshotai/Kimi-Linear-48B-A3B-Instruct
tags:
  - mlx
  - quantization
  - mxfp4
  - mixture-of-experts
---
<div align="center">
  <h1><em><strong>Kimi-Linear-48B-A3B-Instruct · MXFP4 · 32-Group (MLX)</strong></em></h1>
  The <strong>moonshotai/Kimi-Linear-48B-A3B-Instruct</strong> base model, quantized with <code>mlx-lm 0.28.4</code><br>
  to MXFP4 with group size 32, bundled with the metadata needed for a Hugging Face upload.<br>
  All weights and auxiliary files work out of the box with the MLX runtime on Apple Silicon.
  
  [Korean](./README-kr.md) | [English](./README.md)

</div>

## ๋ชจ๋ธ ์š”์•ฝ

- **์•„ํ‚คํ…์ฒ˜**: `config.json`์— ์ •์˜๋œ KimiLinear MoE ๋””์ฝ”๋” ์ „์šฉ ํŠธ๋žœ์Šคํฌ๋จธ (27 ๋ ˆ์ด์–ด, ํžˆ๋“  ์‚ฌ์ด์ฆˆ 2304, ์–ดํ…์…˜ ํ—ค๋“œ 32๊ฐœ, ์ „๋ฌธ๊ฐ€ 256๋ช…, ํ† ํฐ๋‹น 8๋ช… ํ™œ์„ฑํ™”).
- **์ปจํ…์ŠคํŠธ ๊ธธ์ด**: ์„ ํ˜• ์–ดํ…์…˜ ๊ธฐ๋ฐ˜ ๋ธ”๋ก์œผ๋กœ ์•ฝ 100๋งŒ ํ† ํฐ ์ˆ˜์ค€๊นŒ์ง€ ํŠœ๋‹๋˜์–ด ์žˆ์œผ๋ฉฐ, ์‹ค์ œ ์œˆ๋„์šฐ๋Š” `--max-kv-size` ๋“ฑ ๋Ÿฐํƒ€์ž„ ๋ฉ”๋ชจ๋ฆฌ ์„ค์ •์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค.
- **ํ† ํฌ๋‚˜์ด์ €**: `tiktoken` ๊ธฐ๋ฐ˜ BPE (`tokenizer_config.json`, `tiktoken.model`)์ด๋ฉฐ ํŠน์ˆ˜ ํ† ํฐ ID๋Š” ํŒŒ์ผ ๋‚ด๋ถ€์— ์ •์˜๋˜์–ด ์žˆ์–ด ์นด๋“œ์—์„œ ํ•˜๋“œ์ฝ”๋”ฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
- **์ฑ„ํŒ… ํ…œํ”Œ๋ฆฟ**: ๊ณต์‹ Kimi ํˆด ํ˜ธ์ถœ ํ๋ฆ„์„ ๋ฐ˜์˜ํ•œ ๋‹ค์ค‘ ํ„ด ํ…œํ”Œ๋ฆฟ์ด `chat_template.jinja`์— ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
- **๋ผ์ด์„ ์Šค**: ์—…์ŠคํŠธ๋ฆผ `moonshotai/Kimi-Linear-48B-A3B-Instruct`์™€ ๋™์ผํ•˜๊ฒŒ MIT.

## ์–‘์žํ™” ์„ธ๋ถ€ ์ •๋ณด

- **ํˆด๋ง**: `python3 -m mlx_lm.convert -q` (mlx-lm 0.28.4 ์ด์ƒ)์œผ๋กœ MXFP4 ๊ฐ€์ค‘์น˜๋ฅผ ์ƒ์„ฑ.
- **ํฌ๋งท**: MXFP4 4๋น„ํŠธ / ๊ทธ๋ฃน์‚ฌ์ด์ฆˆ 32๊ฐ€ ๋ชจ๋“  ์ฃผ์š” ์„ ํ˜• ๊ณ„์ธต์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค.
- **์˜ˆ์™ธ**: Mixture-of-Experts ๊ฒŒ์ดํŠธ ํ”„๋กœ์ ์…˜์€ ๋ผ์šฐํŒ… ์•ˆ์ •์„ฑ์„ ์œ„ํ•ด 8๋น„ํŠธ / ๊ทธ๋ฃน์‚ฌ์ด์ฆˆ 64๋กœ ์œ ์ง€๋˜๋ฉฐ `quantization_config` ๋‚ด์— ์ „๋ถ€ ๋ช…์‹œ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
- **์ƒค๋“œ ๊ตฌ์„ฑ**: `model-0000n-of-00005.safetensors` 5๊ฐœ์™€ `model.safetensors.index.json`์œผ๋กœ ์ŠคํŠธ๋ฆฌ๋ฐ ๋กœ๋“œ๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.
- **๋ฉ”๋ชจ๋ฆฌ**: Apple Silicon ํ†ตํ•ฉ ๋ฉ”๋ชจ๋ฆฌ ์•ฝ 26~29โ€ฏGB ์ˆ˜์ค€์—์„œ ๊ฐ€์ค‘์น˜๋ฅผ ์ˆ˜์šฉํ•˜๋ฉฐ, KV ์บ์‹œ๋Š” ์ปจํ…์ŠคํŠธ ๊ธธ์ด์— ๋”ฐ๋ผ ์ถ”๊ฐ€ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์š”๊ตฌํ•ฉ๋‹ˆ๋‹ค.
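
MXFP4 (the OCP Microscaling format) stores each weight as a 4-bit E2M1 float and shares one 8-bit power-of-two scale across each block of 32 weights, i.e. 4 + 8/32 = 4.25 bits per weight. A back-of-the-envelope sketch, assuming all ~48B parameters are stored as MXFP4, lands slightly under the 26-29 GB quoted above; the 8-bit gates and any tensors kept at higher precision plausibly account for the difference.

```python
def mxfp4_weight_gb(num_params, group_size=32, element_bits=4, scale_bits=8):
    """Approximate weight storage in decimal gigabytes for MXFP4.

    Each group of `group_size` weights stores `element_bits`-bit values plus
    one shared `scale_bits`-bit scale, so the effective cost per weight is
    element_bits + scale_bits / group_size bits.
    """
    bits_per_weight = element_bits + scale_bits / group_size
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: every one of ~48B parameters is MXFP4 (ignores 8-bit gates, norms, etc.).
print(mxfp4_weight_gb(48e9))  # 25.5
```

The KV cache and activations come on top of this figure, which is why the card treats it separately.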

`config.json` preview:

```json
"quantization_config": {
  "group_size": 32,
  "bits": 4,
  "mode": "mxfp4",
  "model.layers.1.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.2.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.3.mlp.gate": {"group_size": 64, "bits": 8},
  "...": "same pattern through layer 26"
}
```
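
The `"..."` entry in the preview stands for one gate override per MoE layer. The sketch below rebuilds those entries for layers 1 through 26 and shows the lookup rule the preview implies: a module listed by its path gets its own settings, and every other quantized layer falls back to the top-level MXFP4 defaults. `build_gate_overrides`, `lookup`, and the `self_attn.q_proj` path are hypothetical names for illustration; mlx-lm's real loader may organize this differently.

```python
def build_gate_overrides(first_layer=1, last_layer=26):
    """Hypothetical helper: one 8-bit / group-64 entry per MoE gate, layers 1..26."""
    return {
        f"model.layers.{i}.mlp.gate": {"group_size": 64, "bits": 8}
        for i in range(first_layer, last_layer + 1)
    }


def lookup(quant_config, module_path):
    """Hypothetical helper: per-module entry if present, else the global defaults."""
    if module_path in quant_config:
        return quant_config[module_path]
    return {key: quant_config[key] for key in ("group_size", "bits", "mode")}


quant_config = {"group_size": 32, "bits": 4, "mode": "mxfp4", **build_gate_overrides()}

print(lookup(quant_config, "model.layers.5.mlp.gate"))
# {'group_size': 64, 'bits': 8}
print(lookup(quant_config, "model.layers.5.self_attn.q_proj"))  # hypothetical path
# {'group_size': 32, 'bits': 4, 'mode': 'mxfp4'}
```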

## ํฌํ•จ ํŒŒ์ผ

| ํŒŒ์ผ                                                                                         | ์šฉ๋„                             |
| -------------------------------------------------------------------------------------------- | -------------------------------- |
| `config.json`, `generation_config.json`, `configuration_kimi.py`                             | HF ์„ค์ • + ๋งž์ถค MLX Config ํด๋ž˜์Šค |
| `model-0000*-of-00005.safetensors`, `model.safetensors.index.json`                           | ์–‘์žํ™”๋œ MXFP4 ์ƒค๋“œ              |
| `modeling_kimi.py`                                                                           | `KimiLinearForCausalLM` ๊ตฌํ˜„     |
| `tokenizer_config.json`, `special_tokens_map.json`, `tiktoken.model`, `tokenization_kimi.py` | ํ† ํฌ๋‚˜์ด์ € ์ž์‚ฐ                  |
| `chat_template.jinja`                                                                        | `apply_chat_template`์šฉ ํ…œํ”Œ๋ฆฟ   |
| `README.md`, `README-kr.md`                                                                  | ์˜๋ฌธ/๊ตญ๋ฌธ ๋ชจ๋ธ ์นด๋“œ              |
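
Before loading, it can help to confirm that a local download actually contains every file in the table. A minimal standard-library sketch; `check_model_dir` is a hypothetical helper name, and it checks every non-README file from the table plus the expected count of five shards.

```python
from pathlib import Path

# Every file from the "Included Files" table except the two README files.
REQUIRED_FILES = [
    "config.json",
    "generation_config.json",
    "configuration_kimi.py",
    "model.safetensors.index.json",
    "modeling_kimi.py",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tiktoken.model",
    "tokenization_kimi.py",
    "chat_template.jinja",
]
SHARD_PATTERN = "model-*-of-00005.safetensors"
EXPECTED_SHARDS = 5


def check_model_dir(model_dir: str) -> list:
    """Return a list of problems; an empty list means the folder looks complete."""
    root = Path(model_dir)
    problems = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    shard_count = len(list(root.glob(SHARD_PATTERN)))
    if shard_count != EXPECTED_SHARDS:
        problems.append(f"expected {EXPECTED_SHARDS} shards, found {shard_count}")
    return problems


# Usage (path is a placeholder):
# print(check_model_dir("/path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX"))
```

This only checks presence; the checksum step under integrity verification is what detects corrupted contents.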

## ์‚ฌ์šฉ ์˜๋„ ๋ฐ ์ œํ•œ

- **๊ถŒ์žฅ ์‚ฌ์šฉ์ฒ˜**: Apple Silicon ํ™˜๊ฒฝ์—์„œ ๋‹ค๊ตญ์–ด ์–ด์‹œ์Šคํ„ดํŠธ, ํˆด ํ˜ธ์ถœ, ๋กฑ์ปจํ…์ŠคํŠธ RAG.
- **๋น„๊ถŒ์žฅ ์‚ฌ์šฉ์ฒ˜**: ์˜๋ฃŒยท๋ฒ•๋ฅ ยท๊ธˆ์œต ๋“ฑ ๊ฒ€์ฆ์ด ํ•„์š”ํ•œ ๊ฒฐ์ • ํ˜น์€ ํ•„ํ„ฐ๋ง๋˜์ง€ ์•Š์€ ์œ„ํ—˜ ์ง€์‹œ ์ฒ˜๋ฆฌ.
- **์•ˆ์ „**: ๊ธฐ๋ณธ ๋ชจ๋ธ์˜ ์•ˆ์ „ ํ”„๋กœํ•„์„ ๊ทธ๋Œ€๋กœ ๋”ฐ๋ฅด๋ฏ€๋กœ, ์ œํ’ˆ ๋ฐฐํฌ ์‹œ ์ถ”๊ฐ€ ํ•„ํ„ฐ๋ง๊ณผ RLHF ๊ณ„์ธต์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค.
- **๋ณด์•ˆ**: `modeling_kimi.py`์— ์ปค์Šคํ…€ ๋ชจ๋“ˆ์ด ์žˆ์œผ๋ฏ€๋กœ CLI ์‹คํ–‰ ์‹œ `--trust-remote-code`, ํŒŒ์ด์ฌ API์—์„œ๋Š” `trust_remote_code=True`๋ฅผ ๋ฐ˜๋“œ์‹œ ์‚ฌ์šฉํ•˜๊ณ  ๋ฏผ๊ฐ ๋ฐ์ดํ„ฐ๋Š” ์˜คํ”„๋ผ์ธ/๊ฒฉ๋ฆฌ ํ™˜๊ฒฝ์—์„œ ๋‹ค๋ฃจ์„ธ์š”.

## Using with MLX

1. Install the MLX tooling on macOS 13.6+ / Apple Silicon:
   ```bash
   pip install -U mlx-lm  # or from main: pip install -U "git+https://github.com/ml-explore/mlx-lm.git@main"
   # offline, local cache only:
   # HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 ...
   ```
2. CLI chat (the chat template and stop rules are applied automatically):
   ```bash
   mlx_lm.chat \
     --model /path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX \
     --trust-remote-code \
     --max-tokens 512 --temp 0.7 --top-p 0.9
   ```
   _256K ํ† ํฐ ์ด์ƒ ์‹คํ—˜ ์‹œ `--max-kv-size 262144`(๋˜๋Š” ํ•„์š”์— ๋”ฐ๋ผ ๋” ํฐ ๊ฐ’)์„ ์ถ”๊ฐ€ํ•˜์„ธ์š”._
3. ํŒŒ์ด์ฌ ์˜ˆ์‹œ:

   ```python
   from mlx_lm import load, generate
   from mlx_lm.sample_utils import make_sampler, make_logits_processors

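   model, tok = load(
       "/path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX",
       # mlx_lm.load takes trust_remote_code via tokenizer_config (needed for the custom tokenizer).
       tokenizer_config={"trust_remote_code": True},
   )

   messages = [{"role": "user", "content": "Briefly summarize the Kimi Linear architecture."}]
   # Render the chat template and append the assistant turn header.
   prompt = tok.apply_chat_template(messages, add_generation_prompt=True)

   # make_sampler names its temperature argument `temp`.
   sampler = make_sampler(temp=0.7, top_p=0.9)
   # Penalize tokens that already appeared in the last 64 generated tokens.
   procs = make_logits_processors(repetition_penalty=1.1, repetition_context_size=64)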

   print(
       generate(
           model,
           tok,
           prompt,
           max_tokens=512,
           sampler=sampler,
           logits_processors=procs,
       )
   )
   ```

   _To keep everything off the Hub, set `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` before running._
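
When launching from Python rather than the shell, the same variables can be set in-process; the catch is ordering. A minimal sketch, assuming (as in current `huggingface_hub` releases) that the offline flag is read once when the library's constants module is first imported, so the variables must be set before `mlx_lm` is imported:

```python
import os

# Set these before importing mlx_lm, which pulls in huggingface_hub;
# huggingface_hub reads HF_HUB_OFFLINE when its constants module loads.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Only import the loader after the variables are in place, e.g.:
# from mlx_lm import load
```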

## Conversion Notes

- Source checkpoint: `moonshotai/Kimi-Linear-48B-A3B-Instruct` as of 2025-11-07 UTC.
- Exact command used:
  ```bash
  python3 -m mlx_lm.convert \
    --hf-path moonshotai/Kimi-Linear-48B-A3B-Instruct \
    --q-bits 4 -q \
    --group-size 32 \
    -o Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX
  ```
- ์–‘์žํ™” ์ดํ›„ `mlx_lm.chat` ์ƒŒํ‹ฐํ‹ฐ ๊ฒ€์‚ฌ์™€ `safetensors` ์ฒดํฌ์„ฌ์œผ๋กœ ๋ฌด๊ฒฐ์„ฑ์„ ๊ฒ€์ฆํ–ˆ์Šต๋‹ˆ๋‹ค.

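## Integrity & Verification

Record SHA-256 hashes once from a known-good copy, then re-verify shard integrity locally after upload or download:

```bash
cd /path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX
# Record the hashes (run this on the known-good copy).
shasum -a 256 model-*.safetensors > SHA256SUMS
# Check a copy against the recorded hashes; every shard should report OK.
shasum -a 256 -c SHA256SUMS
```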
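
Where `shasum` is unavailable, or to script the check, the same two steps can be done with Python's standard library. This is a minimal sketch: `sha256_file`, `write_manifest`, and `verify_manifest` are hypothetical helper names, and the manifest uses the two-space `<hex>  <filename>` layout that `shasum -a 256` prints, so `shasum -a 256 -c` should accept it as well.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so multi-GB shards never load whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(model_dir, pattern="model-*.safetensors", name="SHA256SUMS"):
    """Write '<hex>  <filename>' lines for matching files; return how many were hashed."""
    root = Path(model_dir)
    lines = [f"{sha256_file(p)}  {p.name}" for p in sorted(root.glob(pattern))]
    (root / name).write_text("\n".join(lines) + "\n")
    return len(lines)


def verify_manifest(model_dir, name="SHA256SUMS"):
    """Return files that are missing or whose hash no longer matches the manifest."""
    root = Path(model_dir)
    failures = []
    for line in (root / name).read_text().splitlines():
        if not line.strip():
            continue
        expected, filename = line.split(maxsplit=1)
        filename = filename.lstrip("*")  # shasum marks binary-mode entries with '*'
        target = root / filename
        if not target.is_file() or sha256_file(target) != expected:
            failures.append(filename)
    return failures
```

As with the shell version, a manifest only proves something when it was written from a copy you already trust.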

## Additional Tips

- Always include `--trust-remote-code` so the custom layers in `modeling_kimi.py` are registered.
- Combining `mlx_lm.cache_prompt` with `--max-kv-size` keeps unified-memory usage under control even for prompts on the order of 1M tokens.

## ๋ฌดํ•œํ•œ ๊ฐ์‚ฌ

- **Moonshot AI** โ€” Kimi ํŒจ๋ฐ€๋ฆฌ์™€ Kimi Linear ์•„ํ‚คํ…์ฒ˜ ๊ณต๊ฐœ๋Š” ์–ธ์ œ๋‚˜ ๊ณ ๋ง™์Šต๋‹ˆ๋‹ค. [Moonshot AI GitHub](https://github.com/moonshotai), [Kimi Linear](https://github.com/MoonshotAI/Kimi-Linear), ๊ธฐ์ˆ  ๋ฆฌํฌํŠธ๋ฅผ ์ฐธ๊ณ ํ•˜์„ธ์š”.
- **Apple Machine Learning Research** โ€” ์ง€์†์ ์ธ ์—…๋ฐ์ดํŠธ๋ฅผ ํ†ตํ•œ ์ง€์› ๋•๋ถ„์— ์—ด์‹ฌํžˆ ํ•™์Šตํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ณ ๋ง™์Šต๋‹ˆ๋‹ค. [MLX](https://github.com/ml-explore/mlx), [MLX-LM](https://github.com/ml-explore/mlx-lm).
- **MLX Community** โ€” MLX ๊ฐ€์ค‘์น˜์™€ ์˜ˆ์ œ๋ฅผ ์–ธ์ œ๋‚˜ ๋น ๋ฅด๊ฒŒ ๊ณต์œ ํ•ด์ฃผ์–ด ๊ณ ๋ง™์Šต๋‹ˆ๋‹ค. ์–ธ์ œ๋‚˜ ์ฐธ๊ณ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. [mlx-community HF](https://huggingface.co/mlx-community).

_์ €์™€ ๊ฐ™์ด ๊ฐœ์ธ์œผ๋กœ์จ ์ƒˆ๋กœ์šด ๋„์ „์„ ์ง€์†ํ•˜์‹œ๋Š” ๋ชจ๋“  ํ•œ๊ตญ์ธ๋“ค์„ ์‘์›ํ•ฉ๋‹ˆ๋‹ค. ์ง์ง„ํ•ฉ์‹œ๋‹ค. (ํŽ„๋Ÿญ~)_

## ์ธ์šฉ ์•ˆ๋‚ด

์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด Moonshot AI์™€ ๋ณธ ์–‘์žํ™” ๋ฆด๋ฆฌ์Šค๋ฅผ ๋ฌธ์„œ๋‚˜ ์—ฐ๊ตฌ ๊ฒฐ๊ณผ์— ํ•จ๊ป˜ ์ธ์šฉํ•ด์ฃผ์„ธ์š”.