---
license: mit
pipeline_tag: text-generation
library_name: mlx
base_model: moonshotai/Kimi-Linear-48B-A3B-Instruct
tags:
- mlx
- quantization
- mxfp4
- mixture-of-experts
---
<div align="center">
<h1><em><strong>Kimi-Linear-48B-A3B-Instruct · MXFP4 · 32-Group (MLX)</strong></em></h1>
This repository contains <strong>moonshotai/Kimi-Linear-48B-A3B-Instruct</strong> quantized with <code>mlx-lm 0.28.4</code><br>
to MXFP4 with group size 32, together with the metadata needed for upload to Hugging Face.<br>
All weights and auxiliary files work out of the box with the MLX runtime on Apple Silicon.
[한국어](./README-kr.md) | [English](./README.md)
</div>
## ๋ชจ๋ธ ์š”์•ฝ
- **์•„ํ‚คํ…์ฒ˜**: `config.json`์— ์ •์˜๋œ KimiLinear MoE ๋””์ฝ”๋” ์ „์šฉ ํŠธ๋žœ์Šคํฌ๋จธ (27 ๋ ˆ์ด์–ด, ํžˆ๋“  ์‚ฌ์ด์ฆˆ 2304, ์–ดํ…์…˜ ํ—ค๋“œ 32๊ฐœ, ์ „๋ฌธ๊ฐ€ 256๋ช…, ํ† ํฐ๋‹น 8๋ช… ํ™œ์„ฑํ™”).
- **์ปจํ…์ŠคํŠธ ๊ธธ์ด**: ์„ ํ˜• ์–ดํ…์…˜ ๊ธฐ๋ฐ˜ ๋ธ”๋ก์œผ๋กœ ์•ฝ 100๋งŒ ํ† ํฐ ์ˆ˜์ค€๊นŒ์ง€ ํŠœ๋‹๋˜์–ด ์žˆ์œผ๋ฉฐ, ์‹ค์ œ ์œˆ๋„์šฐ๋Š” `--max-kv-size` ๋“ฑ ๋Ÿฐํƒ€์ž„ ๋ฉ”๋ชจ๋ฆฌ ์„ค์ •์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค.
- **ํ† ํฌ๋‚˜์ด์ €**: `tiktoken` ๊ธฐ๋ฐ˜ BPE (`tokenizer_config.json`, `tiktoken.model`)์ด๋ฉฐ ํŠน์ˆ˜ ํ† ํฐ ID๋Š” ํŒŒ์ผ ๋‚ด๋ถ€์— ์ •์˜๋˜์–ด ์žˆ์–ด ์นด๋“œ์—์„œ ํ•˜๋“œ์ฝ”๋”ฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
- **์ฑ„ํŒ… ํ…œํ”Œ๋ฆฟ**: ๊ณต์‹ Kimi ํˆด ํ˜ธ์ถœ ํ๋ฆ„์„ ๋ฐ˜์˜ํ•œ ๋‹ค์ค‘ ํ„ด ํ…œํ”Œ๋ฆฟ์ด `chat_template.jinja`์— ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
- **๋ผ์ด์„ ์Šค**: ์—…์ŠคํŠธ๋ฆผ `moonshotai/Kimi-Linear-48B-A3B-Instruct`์™€ ๋™์ผํ•˜๊ฒŒ MIT.
## ์–‘์žํ™” ์„ธ๋ถ€ ์ •๋ณด
- **ํˆด๋ง**: `python3 -m mlx_lm.convert -q` (mlx-lm 0.28.4 ์ด์ƒ)์œผ๋กœ MXFP4 ๊ฐ€์ค‘์น˜๋ฅผ ์ƒ์„ฑ.
- **ํฌ๋งท**: MXFP4 4๋น„ํŠธ / ๊ทธ๋ฃน์‚ฌ์ด์ฆˆ 32๊ฐ€ ๋ชจ๋“  ์ฃผ์š” ์„ ํ˜• ๊ณ„์ธต์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค.
- **์˜ˆ์™ธ**: Mixture-of-Experts ๊ฒŒ์ดํŠธ ํ”„๋กœ์ ์…˜์€ ๋ผ์šฐํŒ… ์•ˆ์ •์„ฑ์„ ์œ„ํ•ด 8๋น„ํŠธ / ๊ทธ๋ฃน์‚ฌ์ด์ฆˆ 64๋กœ ์œ ์ง€๋˜๋ฉฐ `quantization_config` ๋‚ด์— ์ „๋ถ€ ๋ช…์‹œ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
- **์ƒค๋“œ ๊ตฌ์„ฑ**: `model-0000n-of-00005.safetensors` 5๊ฐœ์™€ `model.safetensors.index.json`์œผ๋กœ ์ŠคํŠธ๋ฆฌ๋ฐ ๋กœ๋“œ๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.
- **๋ฉ”๋ชจ๋ฆฌ**: Apple Silicon ํ†ตํ•ฉ ๋ฉ”๋ชจ๋ฆฌ ์•ฝ 26~29โ€ฏGB ์ˆ˜์ค€์—์„œ ๊ฐ€์ค‘์น˜๋ฅผ ์ˆ˜์šฉํ•˜๋ฉฐ, KV ์บ์‹œ๋Š” ์ปจํ…์ŠคํŠธ ๊ธธ์ด์— ๋”ฐ๋ผ ์ถ”๊ฐ€ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์š”๊ตฌํ•ฉ๋‹ˆ๋‹ค.
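The weight footprint quoted above can be roughly cross-checked with back-of-the-envelope arithmetic. The sketch below assumes the MXFP4 layout of 4-bit elements sharing one 8-bit scale per group, and takes the nominal 48B parameter count from the model name. The function name is illustrative only. The result ignores the 8-bit gate projections and any tensors left unquantized, so treat it as a lower bound rather than a measurement.

```python
def quantized_weight_gb(n_params: float, bits: int, group_size: int, scale_bits: int = 8) -> float:
    """Approximate size in GB (10^9 bytes) of group-quantized weights."""
    # Each group of `group_size` weights shares one scale of `scale_bits` bits.
    bits_per_weight = bits + scale_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: 48e9 parameters (nominal, from the model name), all stored as MXFP4 / group size 32.
estimate = quantized_weight_gb(48e9, bits=4, group_size=32)
print(f"{estimate:.1f} GB")  # 4.25 bits per weight -> 25.5 GB
```

The estimate lands just under the 26-29 GB range, which is plausible once the 8-bit gates and non-quantized tensors are added on top.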
`config.json` preview:
```json
"quantization_config": {
  "group_size": 32,
  "bits": 4,
  "mode": "mxfp4",
  "model.layers.1.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.2.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.3.mlp.gate": {"group_size": 64, "bits": 8},
  "...": "same pattern through layer 26"
}
```
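To see at a glance which layers deviate from the global MXFP4 setting, the config can be split into defaults and per-layer overrides. This is a minimal sketch: `summarize_quantization` is a hypothetical helper, and the sample dictionary is a shortened excerpt that mirrors the preview above rather than the full file.

```python
import json
from collections import Counter


def summarize_quantization(config: dict) -> dict:
    """Split `quantization_config` into global defaults and per-layer overrides."""
    qcfg = config["quantization_config"]
    defaults = {k: v for k, v in qcfg.items() if not isinstance(v, dict)}
    overrides = {k: v for k, v in qcfg.items() if isinstance(v, dict)}
    # Count how many layers share each (bits, group_size) override.
    counts = Counter((o["bits"], o["group_size"]) for o in overrides.values())
    return {"defaults": defaults, "override_counts": dict(counts)}


# Hypothetical excerpt; a real run would load config.json from the model directory.
sample = json.loads("""
{"quantization_config": {
  "group_size": 32, "bits": 4, "mode": "mxfp4",
  "model.layers.1.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.2.mlp.gate": {"group_size": 64, "bits": 8}
}}
""")
summary = summarize_quantization(sample)
print(summary["defaults"])
print(summary["override_counts"])
```

On the full `config.json`, the override count should match the number of MoE gate layers listed there.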
## Included Files
| File | Purpose |
| -------------------------------------------------------------------------------------------- | -------------------------------- |
| `config.json`, `generation_config.json`, `configuration_kimi.py` | HF config + custom MLX Config class |
| `model-0000*-of-00005.safetensors`, `model.safetensors.index.json` | Quantized MXFP4 shards |
| `modeling_kimi.py` | `KimiLinearForCausalLM` implementation |
| `tokenizer_config.json`, `special_tokens_map.json`, `tiktoken.model`, `tokenization_kimi.py` | Tokenizer assets |
| `chat_template.jinja` | Template for `apply_chat_template` |
| `README.md`, `README-kr.md` | English / Korean model cards |
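After downloading or copying the repository, a quick completeness check against the table above can catch a missing shard before a long load attempt. This is a sketch only: `EXPECTED_FILES` and `missing_files` are hypothetical names, the shard names are expanded from the `model-0000n-of-00005.safetensors` pattern, and the two README files are left out because the model does not need them at load time.

```python
from pathlib import Path

# File names taken from the table above; shards expanded from the 0000n-of-00005 pattern.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "configuration_kimi.py",
    "model.safetensors.index.json",
    "modeling_kimi.py",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tiktoken.model",
    "tokenization_kimi.py",
    "chat_template.jinja",
] + [f"model-{i:05d}-of-00005.safetensors" for i in range(1, 6)]


def missing_files(model_dir) -> list:
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("/path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX")` returns an empty list when everything is in place.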
## Intended Use and Limitations
- **Recommended uses**: Multilingual assistants, tool calling, and long-context RAG on Apple Silicon.
- **Not recommended**: Decisions that require verification (medical, legal, financial, and similar), or handling unfiltered hazardous instructions.
- **Safety**: The model inherits the base model's safety profile unchanged, so additional filtering and an RLHF layer are recommended for production deployments.
- **Security**: `modeling_kimi.py` contains custom modules, so always use `--trust-remote-code` on the CLI and `trust_remote_code=True` in the Python API (via the tokenizer configuration), and handle sensitive data in an offline or isolated environment.
## Using with MLX
1. Install the MLX tools on macOS 13.6+ / Apple Silicon:
```bash
pip install -U mlx-lm # or from main: pip install -U "git+https://github.com/ml-explore/mlx-lm.git@main"
# To use only the offline cache:
# HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 ...
```
2. CLI chat (the chat template and stop rules are applied automatically):
```bash
mlx_lm.chat \
  --model /path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX \
  --trust-remote-code \
  --max-tokens 512 --temp 0.7 --top-p 0.9
```
_When experimenting with 256K tokens or more, add `--max-kv-size 262144` (or a larger value as needed)._
3. Python example:
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

# The checkpoint ships custom tokenizer code, so remote code must be trusted.
model, tok = load(
    "/path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX",
    tokenizer_config={"trust_remote_code": True},
)

# Build the prompt with the bundled chat template.
messages = [{"role": "user", "content": "Briefly summarize the Kimi Linear architecture."}]
prompt = tok.apply_chat_template(messages, add_generation_prompt=True)

# Sampling: temperature 0.7 with nucleus (top-p) 0.9, plus a mild repetition penalty.
sampler = make_sampler(temp=0.7, top_p=0.9)
procs = make_logits_processors(repetition_penalty=1.1, repetition_context_size=64)

print(
    generate(
        model,
        tok,
        prompt,
        max_tokens=512,
        sampler=sampler,
        logits_processors=procs,
    )
)
```
_If the Hub must not be contacted, set `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` before running._
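When the model is driven from a Python script rather than a shell, the same two variables can be set in-process. The sketch below assumes, as is generally the case for Hugging Face libraries, that the flags are read at import time, so they are set before `mlx_lm` is imported. `enable_hub_offline` is a hypothetical helper name.

```python
import os


def enable_hub_offline() -> None:
    """Ask Hugging Face libraries to use only the local cache."""
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


enable_hub_offline()
# Import mlx_lm only after this point so the flags are already in place, e.g.:
# from mlx_lm import load
```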
## Conversion Notes
- Source checkpoint: `moonshotai/Kimi-Linear-48B-A3B-Instruct` as of 2025-11-07 UTC.
- Command used:
```bash
python3 -m mlx_lm.convert \
  --hf-path moonshotai/Kimi-Linear-48B-A3B-Instruct \
  --q-bits 4 -q \
  --q-mode mxfp4 \
  --q-group-size 32 \
  --mlx-path Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX
```
- ์–‘์žํ™” ์ดํ›„ `mlx_lm.chat` ์ƒŒํ‹ฐํ‹ฐ ๊ฒ€์‚ฌ์™€ `safetensors` ์ฒดํฌ์„ฌ์œผ๋กœ ๋ฌด๊ฒฐ์„ฑ์„ ๊ฒ€์ฆํ–ˆ์Šต๋‹ˆ๋‹ค.
## ๋ฌด๊ฒฐ์„ฑ & ๊ฒ€์ฆ
์—…๋กœ๋“œ ํ›„์—๋„ ๋กœ์ปฌ์—์„œ ์ƒค๋“œ ๋ฌด๊ฒฐ์„ฑ์„ ์žฌํ™•์ธํ•˜์„ธ์š”:
```bash
cd /path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX
shasum -a 256 model-*.safetensors > SHA256SUMS
shasum -c SHA256SUMS
```
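Where `shasum` is not available, the same check can be approximated in Python with `hashlib`. This is a minimal sketch that assumes the manifest uses the usual `<hex digest>  <filename>` layout written by `shasum`. `sha256_of` and `verify_sums` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file) -> dict:
    """Return {filename: matches} for each entry in a shasum-style manifest."""
    sums_path = Path(sums_file)
    results = {}
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # a leading '*' marks binary mode in shasum output
        results[name] = sha256_of(sums_path.parent / name) == expected.lower()
    return results
```

Calling `verify_sums("SHA256SUMS")` inside the model directory returns one boolean per shard; any `False` entry points to a corrupted or incomplete file.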
## Additional Tips
- Always include `--trust-remote-code` so that the custom layers in `modeling_kimi.py` are registered.
- Combining `mlx_lm.cache_prompt` with `--max-kv-size` helps keep unified memory usage under control even with prompts in the 1M-token range.
## Endless Thanks
- **Moonshot AI**: Thank you, as always, for releasing the Kimi family and the Kimi Linear architecture. See [Moonshot AI GitHub](https://github.com/moonshotai), [Kimi Linear](https://github.com/MoonshotAI/Kimi-Linear), and the technical report.
- **Apple Machine Learning Research**: The steady stream of updates and support is what keeps me learning. Thank you. [MLX](https://github.com/ml-explore/mlx), [MLX-LM](https://github.com/ml-explore/mlx-lm).
- **MLX Community**: Thank you for always sharing MLX weights and examples so quickly. I refer to them all the time. [mlx-community HF](https://huggingface.co/mlx-community).
_Cheering on every Korean who, like me, keeps taking on new challenges as an individual. Let's keep moving straight ahead. (flag waving~)_
## Citation
If you use this model, please cite both Moonshot AI and this quantized release in your documentation or research results.