license: mit
pipeline_tag: text-generation
library_name: mlx
base_model: moonshotai/Kimi-Linear-48B-A3B-Instruct
tags:
  - mlx
  - quantization
  - mxfp4
  - mixture-of-experts

Kimi-Linear-48B-A3B-Instruct · MXFP4 · 32-Group (MLX)

This repository contains the moonshotai/Kimi-Linear-48B-A3B-Instruct base model quantized with mlx-lm 0.28.4
using the MXFP4, group-size-32 setting, together with the metadata needed for uploading to Hugging Face.
All weights and auxiliary files are directly compatible with the MLX runtime on Apple Silicon.

한국어 | English

๋ชจ๋ธ ์š”์•ฝ

  • ์•„ํ‚คํ…์ฒ˜: config.json์— ์ •์˜๋œ KimiLinear MoE ๋””์ฝ”๋” ์ „์šฉ ํŠธ๋žœ์Šคํฌ๋จธ (27 ๋ ˆ์ด์–ด, ํžˆ๋“  ์‚ฌ์ด์ฆˆ 2304, ์–ดํ…์…˜ ํ—ค๋“œ 32๊ฐœ, ์ „๋ฌธ๊ฐ€ 256๋ช…, ํ† ํฐ๋‹น 8๋ช… ํ™œ์„ฑํ™”).
  • ์ปจํ…์ŠคํŠธ ๊ธธ์ด: ์„ ํ˜• ์–ดํ…์…˜ ๊ธฐ๋ฐ˜ ๋ธ”๋ก์œผ๋กœ ์•ฝ 100๋งŒ ํ† ํฐ ์ˆ˜์ค€๊นŒ์ง€ ํŠœ๋‹๋˜์–ด ์žˆ์œผ๋ฉฐ, ์‹ค์ œ ์œˆ๋„์šฐ๋Š” --max-kv-size ๋“ฑ ๋Ÿฐํƒ€์ž„ ๋ฉ”๋ชจ๋ฆฌ ์„ค์ •์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค.
  • ํ† ํฌ๋‚˜์ด์ €: tiktoken ๊ธฐ๋ฐ˜ BPE (tokenizer_config.json, tiktoken.model)์ด๋ฉฐ ํŠน์ˆ˜ ํ† ํฐ ID๋Š” ํŒŒ์ผ ๋‚ด๋ถ€์— ์ •์˜๋˜์–ด ์žˆ์–ด ์นด๋“œ์—์„œ ํ•˜๋“œ์ฝ”๋”ฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
  • ์ฑ„ํŒ… ํ…œํ”Œ๋ฆฟ: ๊ณต์‹ Kimi ํˆด ํ˜ธ์ถœ ํ๋ฆ„์„ ๋ฐ˜์˜ํ•œ ๋‹ค์ค‘ ํ„ด ํ…œํ”Œ๋ฆฟ์ด chat_template.jinja์— ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋ผ์ด์„ ์Šค: ์—…์ŠคํŠธ๋ฆผ moonshotai/Kimi-Linear-48B-A3B-Instruct์™€ ๋™์ผํ•˜๊ฒŒ MIT.

์–‘์žํ™” ์„ธ๋ถ€ ์ •๋ณด

  • ํˆด๋ง: python3 -m mlx_lm.convert -q (mlx-lm 0.28.4 ์ด์ƒ)์œผ๋กœ MXFP4 ๊ฐ€์ค‘์น˜๋ฅผ ์ƒ์„ฑ.
  • ํฌ๋งท: MXFP4 4๋น„ํŠธ / ๊ทธ๋ฃน์‚ฌ์ด์ฆˆ 32๊ฐ€ ๋ชจ๋“  ์ฃผ์š” ์„ ํ˜• ๊ณ„์ธต์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค.
  • ์˜ˆ์™ธ: Mixture-of-Experts ๊ฒŒ์ดํŠธ ํ”„๋กœ์ ์…˜์€ ๋ผ์šฐํŒ… ์•ˆ์ •์„ฑ์„ ์œ„ํ•ด 8๋น„ํŠธ / ๊ทธ๋ฃน์‚ฌ์ด์ฆˆ 64๋กœ ์œ ์ง€๋˜๋ฉฐ quantization_config ๋‚ด์— ์ „๋ถ€ ๋ช…์‹œ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ƒค๋“œ ๊ตฌ์„ฑ: model-0000n-of-00005.safetensors 5๊ฐœ์™€ model.safetensors.index.json์œผ๋กœ ์ŠคํŠธ๋ฆฌ๋ฐ ๋กœ๋“œ๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฉ”๋ชจ๋ฆฌ: Apple Silicon ํ†ตํ•ฉ ๋ฉ”๋ชจ๋ฆฌ ์•ฝ 26~29โ€ฏGB ์ˆ˜์ค€์—์„œ ๊ฐ€์ค‘์น˜๋ฅผ ์ˆ˜์šฉํ•˜๋ฉฐ, KV ์บ์‹œ๋Š” ์ปจํ…์ŠคํŠธ ๊ธธ์ด์— ๋”ฐ๋ผ ์ถ”๊ฐ€ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์š”๊ตฌํ•ฉ๋‹ˆ๋‹ค.
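As a rough cross-check of the memory figure, the weight footprint can be estimated from the format alone. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes the nominal 48B parameter count from the model name, treats every weight as MXFP4 (4-bit elements plus one shared 8-bit scale per group of 32), and ignores the 8-bit gate projections, any non-quantized tensors, and runtime overhead. The function name is illustrative.

```python
def quantized_weight_gb(n_params: float, bits: int, group_size: int,
                        scale_bits: int = 8) -> float:
    """Approximate weight size in GB (10^9 bytes) for a block-scaled format.

    Each group of `group_size` weights shares one scale of `scale_bits` bits,
    so the effective cost is bits + scale_bits / group_size bits per weight.
    """
    bits_per_weight = bits + scale_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: 48e9 parameters, taken from the "48B" in the model name.
estimate = quantized_weight_gb(48e9, bits=4, group_size=32)
print(f"{estimate:.1f} GB")  # 4.25 bits per weight -> 25.5 GB
```

The estimate lands just under the quoted 26-29 GB range; the gap is plausibly the higher-precision gates and other tensors that this simplification leaves out.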

config.json preview:

"quantization_config": {
  "group_size": 32,
  "bits": 4,
  "mode": "mxfp4",
  "model.layers.1.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.2.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.3.mlp.gate": {"group_size": 64, "bits": 8},
  "...": "same pattern through layer 26"
}
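To see which layers deviate from the global setting, the config can be split into defaults and per-layer overrides. This is a minimal sketch that assumes the layout shown in the preview (scalar keys are global defaults, dict-valued keys are per-layer overrides); the helper name is hypothetical and the inline sample stands in for the real file.

```python
import json


def split_quantization_config(qcfg: dict) -> tuple[dict, dict]:
    """Separate global defaults from per-layer overrides (dict-valued entries)."""
    defaults = {k: v for k, v in qcfg.items() if not isinstance(v, dict)}
    overrides = {k: v for k, v in qcfg.items() if isinstance(v, dict)}
    return defaults, overrides


# Inline sample mirroring the preview. For the real model, read
# config.json and pass its "quantization_config" entry instead.
sample = json.loads("""
{
  "group_size": 32,
  "bits": 4,
  "mode": "mxfp4",
  "model.layers.1.mlp.gate": {"group_size": 64, "bits": 8},
  "model.layers.2.mlp.gate": {"group_size": 64, "bits": 8}
}
""")

defaults, overrides = split_quantization_config(sample)
print("defaults:", defaults)
for name, cfg in sorted(overrides.items()):
    print(name, "->", cfg)
```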

Included Files

ํŒŒ์ผ ์šฉ๋„
config.json, generation_config.json, configuration_kimi.py HF ์„ค์ • + ๋งž์ถค MLX Config ํด๋ž˜์Šค
model-0000*-of-00005.safetensors, model.safetensors.index.json ์–‘์žํ™”๋œ MXFP4 ์ƒค๋“œ
modeling_kimi.py KimiLinearForCausalLM ๊ตฌํ˜„
tokenizer_config.json, special_tokens_map.json, tiktoken.model, tokenization_kimi.py ํ† ํฌ๋‚˜์ด์ € ์ž์‚ฐ
chat_template.jinja apply_chat_template์šฉ ํ…œํ”Œ๋ฆฟ
README.md, README-kr.md ์˜๋ฌธ/๊ตญ๋ฌธ ๋ชจ๋ธ ์นด๋“œ
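Before loading a local copy, it can help to confirm that nothing from the file list is missing. The sketch below is illustrative only: the helper name is hypothetical, the shard names assume the usual numbering from 00001 to 00005, and the two README files are left out because they are not needed at load time.

```python
from pathlib import Path

# File names taken from the list in this card.
REQUIRED_FILES = [
    "config.json",
    "generation_config.json",
    "configuration_kimi.py",
    "modeling_kimi.py",
    "model.safetensors.index.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tiktoken.model",
    "tokenization_kimi.py",
    "chat_template.jinja",
]
N_SHARDS = 5  # assumption: shards are numbered 00001..00005


def missing_files(model_dir: str) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    shards = [
        f"model-{i:05d}-of-{N_SHARDS:05d}.safetensors"
        for i in range(1, N_SHARDS + 1)
    ]
    return [name for name in REQUIRED_FILES + shards
            if not (root / name).is_file()]
```

Calling `missing_files("/path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX")` returns an empty list when the directory is complete.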

Intended Use and Limitations

  • Recommended uses: multilingual assistants, tool calling, and long-context RAG on Apple Silicon.
  • Not recommended: decisions that require verification (medical, legal, financial, and similar), or handling unfiltered hazardous instructions.
  • Safety: the model inherits the base model's safety profile unchanged, so additional filtering and an RLHF layer are recommended for product deployment.
  • Security: modeling_kimi.py contains custom modules, so always pass --trust-remote-code on the CLI and trust_remote_code=True in the Python API, and handle sensitive data in an offline or isolated environment.

Using with MLX

  1. Install the MLX tools on macOS 13.6+ / Apple Silicon:

    pip install -U mlx-lm  # or from main: pip install -U "git+https://github.com/ml-explore/mlx-lm.git@main"
    # Offline, cache-only:
    # HF_HUB_OFFLINE=1 TRANSFORMERS_OFFLINE=1 ...
    
  2. CLI chat (chat template and stop rules are applied automatically):

    mlx_lm.chat \
      --model /path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX \
      --trust-remote-code \
      --max-tokens 512 --temp 0.7 --top-p 0.9
    

    When experimenting with 256K tokens or more, add --max-kv-size 262144 (or a larger value as needed).

  3. Python example:

    from mlx_lm import load, generate
    from mlx_lm.sample_utils import make_sampler, make_logits_processors
    
    # trust_remote_code is forwarded to the tokenizer loader via tokenizer_config
    model, tok = load(
        "/path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX",
        tokenizer_config={"trust_remote_code": True},
    )
    
    messages = [{"role": "user", "content": "Briefly summarize the Kimi Linear architecture."}]
    prompt = tok.apply_chat_template(messages, add_generation_prompt=True)
    
    # make_sampler names its temperature argument `temp`
    sampler = make_sampler(temp=0.7, top_p=0.9)
    procs = make_logits_processors(repetition_penalty=1.1, repetition_context_size=64)
    
    print(
        generate(
            model,
            tok,
            prompt,
            max_tokens=512,
            sampler=sampler,
            logits_processors=procs,
        )
    )
    

    If you must not contact the Hub, set HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 before running.
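When the environment cannot be configured from the shell, the same two variables can be set from Python. This is a small sketch with a hypothetical helper name; it assumes the variables are set before mlx_lm (and therefore huggingface_hub) is imported, since those libraries typically read them at import time.

```python
import os


def enable_hf_offline(env=os.environ) -> None:
    """Set the two offline switches named in this card on the given mapping."""
    env["HF_HUB_OFFLINE"] = "1"
    env["TRANSFORMERS_OFFLINE"] = "1"


enable_hf_offline()
# Import mlx_lm only after the variables are in place, for example:
# from mlx_lm import load, generate
```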

Conversion Notes

  • Source checkpoint: moonshotai/Kimi-Linear-48B-A3B-Instruct as of 2025-11-07 UTC.
  • Command actually used:
    python3 -m mlx_lm.convert \
      --hf-path moonshotai/Kimi-Linear-48B-A3B-Instruct \
      -q --q-mode mxfp4 --q-bits 4 \
      --q-group-size 32 \
      --mlx-path Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX
    
  • ์–‘์žํ™” ์ดํ›„ mlx_lm.chat ์ƒŒํ‹ฐํ‹ฐ ๊ฒ€์‚ฌ์™€ safetensors ์ฒดํฌ์„ฌ์œผ๋กœ ๋ฌด๊ฒฐ์„ฑ์„ ๊ฒ€์ฆํ–ˆ์Šต๋‹ˆ๋‹ค.

๋ฌด๊ฒฐ์„ฑ & ๊ฒ€์ฆ

์—…๋กœ๋“œ ํ›„์—๋„ ๋กœ์ปฌ์—์„œ ์ƒค๋“œ ๋ฌด๊ฒฐ์„ฑ์„ ์žฌํ™•์ธํ•˜์„ธ์š”:

cd /path/to/Kimi-Linear-48B-A3B-Instruct-MXFP4-GS32-MLX
shasum -a 256 model-*.safetensors > SHA256SUMS
shasum -c SHA256SUMS
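On systems without shasum, the same check can be done with the Python standard library. The sketch below assumes a SHA256SUMS file in the format shasum writes (hex digest, whitespace, file name, with an optional leading `*` for binary mode); both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large shards are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256sums(model_dir: str, sums_name: str = "SHA256SUMS") -> dict[str, bool]:
    """Map each file listed in the sums file to True if its digest matches."""
    root = Path(model_dir)
    results = {}
    for line in (root / sums_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # binary-mode marker, if present
        results[name] = sha256_file(root / name) == expected
    return results
```

Any entry mapped to False in the returned dict points at a shard that should be downloaded again.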

Additional Tips

  • Always include --trust-remote-code so that the custom layers in modeling_kimi.py are registered.
  • Combining mlx_lm.cache_prompt with --max-kv-size helps keep unified memory usage under control, even with prompts in the 1M-token class.

Endless Thanks

  • Moonshot AI: thank you, as always, for releasing the Kimi family and the Kimi Linear architecture. See Moonshot AI GitHub, Kimi Linear, and the technical report.
  • Apple Machine Learning Research: thanks to your support through continuous updates, I keep learning. Thank you. MLX, MLX-LM.
  • MLX Community: thank you for always sharing MLX weights and examples so quickly. I refer to them all the time. mlx-community HF.

I am cheering for every Korean who, like me, keeps taking on new challenges as an individual. Let's keep going straight ahead. (flag waving~)

Citation

If you use this model, please cite both Moonshot AI and this quantized release in your documentation or research results.