Kanana

🤗 Kanana-2 Models   |   📕 Kanana-2 Blog



Kanana-2 Highlights

Kanana-2, the latest open-source evolution of the Kanana model family, is designed specifically for Agentic AI, presenting substantial enhancements in tool calling, complex instruction following, and logical reasoning. This new version adopts a cutting-edge architecture featuring MLA (Multi-head Latent Attention) and MoE (Mixture of Experts). These innovations allow the model to utilize significantly fewer active parameters compared to the previous 32.5B model while delivering superior performance and ensuring high throughput. Furthermore, the model natively supports context lengths of up to 32,768 tokens, enabling it to maintain coherence when handling extensive documents or long-context interactions.

In addition, Kanana-2 now supports 6 languages, covering Korean, English, Japanese, Chinese, Thai, and Vietnamese. To support this expansion, Kanana-2 utilizes a newly trained tokenizer that demonstrates superior tokenization efficiency across these languages, including an improvement of over 30% specifically for Korean. Finally, to address advanced problem-solving needs, Kanana-2 introduces reasoning models capable of deliberate thinking and reasoning, achieving significantly enhanced performance in downstream tasks, especially when tackling hard problems.
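As a rough illustration of how tokenization efficiency can be checked, the sketch below counts tokens for a short Korean sentence with the released tokenizer; the sample sentence is arbitrary and the resulting count is illustrative, not an official figure.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kakaocorp/kanana-2-30b-a3b-instruct")

# Arbitrary Korean example sentence: "Kakao has released a new language model."
text = "카카오가 새로운 언어 모델을 공개했습니다."
token_ids = tokenizer.encode(text, add_special_tokens=False)
print(len(token_ids))                              # number of tokens used for the sentence
print(tokenizer.convert_ids_to_tokens(token_ids))  # the tokens themselves

Fewer tokens per sentence generally means more text fits into the 32,768-token context window and lower per-request cost.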

No Kakao user data was used for either pre-training or post-training.


Model Overview

The kanana-2-30b-a3b series has the following features (a config-inspection sketch follows the list):

  • Total Parameters: 30B
  • Activated Parameters: 3B
  • Number of Layers: 48
  • Number of Dense Layers: 1
  • Number of Experts: 128
  • Number of Selected Experts: 6
  • Number of Shared Experts: 2
  • Attention Mechanism: MLA
  • Vocabulary Size: 128,256
  • Context Length: 32,768
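
To verify these details programmatically, below is a minimal sketch that loads only the configuration with transformers. The standard attribute names (num_hidden_layers, vocab_size, max_position_embeddings) are assumptions based on typical Hugging Face configs; MoE- and MLA-specific fields appear under model-specific names in the full printout.

from transformers import AutoConfig

# Load the configuration only (no weights are downloaded).
# Depending on how the architecture is registered, trust_remote_code=True may be required.
config = AutoConfig.from_pretrained("kakaocorp/kanana-2-30b-a3b-instruct")

print(config.num_hidden_layers)        # expected: 48 (per the list above)
print(config.vocab_size)               # expected: 128256
print(config.max_position_embeddings)  # expected: 32768
print(config)                          # full dump, including MoE/MLA-specific fields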

Model Downloads

| Model | Download |
|---|---|
| kanana-2-30b-a3b-base | 🤗 HuggingFace |
| kanana-2-30b-a3b-instruct | 🤗 HuggingFace |
| kanana-2-30b-a3b-thinking | 🤗 HuggingFace |

Performance

Base model evaluation results

| Benchmark | Metric | Shot | kanana-2-30b-a3b-base | kanana-1.5-32.5b-base | Qwen3-30B-A3B-Base* |
|---|---|---|---|---|---|
| **General Tasks** | | | | | |
| MMLU | acc | 5 | 75.44 | 76.76 | 81.14 |
| MMLU-Pro | acc | 5 | 56.14 | 52.40 | 61.83 |
| BBH | acc | 3 | 79.76 | 81.54 | 79.97 |
| SimpleQA† | acc | 5 | 29.70 | 26.95 | 26.47 |
| **Mathematics Tasks** | | | | | |
| MATH | em | 4 | 54.40 | 47.68 | 62.58 |
| GSM8K | em | 8 | 82.71 | 85.14 | 88.10 |
| **Coding Tasks** | | | | | |
| HumanEval | pass@1 | 0 | 75.29 | 75.59 | 53.32 |
| MBPP | pass@1 | 3 | 62.39 | 65.96 | 72.58 |
| **Korean Tasks** | | | | | |
| KMMLU | acc | 5 | 62.15 | 61.56 | 62.25 |
| KoSimpleQA† | acc | 5 | 49.40 | 45.70 | 26.33 |
| HAE-RAE Bench (v1.0) | acc | 5 | 88.73 | 90.65 | 72.04 |
| MATH-Ko‡ | em | 4 | 54.07 | 47.42 | 58.20 |
| GSM8K-Ko‡ | em | 8 | 77.48 | 81.43 | 88.10 |
| MBPP-Ko§ | pass@1 | 3 | 61.55 | 65.41 | 66.84 |
| **Long Context Tasks** | | | | | |
| RULER-4K | acc | 0 | 93.09 | 86.39 | 94.32 |
| RULER-8K | acc | 0 | 92.29 | 90.16 | 92.16 |
| RULER-16K | acc | 0 | 90.73 | 85.88 | 91.28 |
| RULER-32K | acc | 0 | 88.63 | 81.62 | 88.32 |

* Evaluated using an internal evaluation toolkit.
† Evaluated in Multiple Choice Question Answering (MCQA) format with 10 options.
‡ Subsets from HRM8K (MATH, GSM8K).
§ Internally translated to Korean.

Instruct model evaluation results

| Benchmark | Metric | kanana-2-30b-a3b-instruct | kanana-1.5-32.5b-instruct | Qwen3-30B-A3B-Instruct-2507* | Qwen3-30B-A3B (non-thinking)* |
|---|---|---|---|---|---|
| **Chat** | | | | | |
| MT-Bench | judge† | 8.42 | 8.23 | 8.71 | 8.38 |
| KoMT-Bench | judge† | 8.24 | 7.94 | 8.49 | 7.89 |
| **Instruction Following** | | | | | |
| IFEval | prompt strict | 84.47 | 79.48 | 82.62 | 84.10 |
| IFBench | prompt strict | 41.84 | 38.78 | 30.27 | 29.25 |
| Multi-IF (EN) | acc | 75.81 | 68.51 | 77.93 | 81.03 |
| Multi-Challenge | acc | 34.80 | 19.05 | 41.76 | 27.84 |
| **Tool Calling** | | | | | |
| BFCL-v3 (Live‡) | pass@1 | 74.30 | 68.74 | 73.93 | 69.14 |
| BFCL-v3 (Multi-Turn‡) | pass@1 | 35.38 | 11.38 | 38.77 | 11.88 |
| **Code Generation** | | | | | |
| HumanEval+ | pass@1 | 79.88 | 79.88 | 86.59 | 87.20 |
| MBPP+ | pass@1 | 73.81 | 71.96 | 75.13 | 75.13 |
| **Mathematics** | | | | | |
| GSM8K | em | 91.89 | 91.58 | 93.56 | 93.33 |
| MATH | acc | 86.26 | 77.92 | 90.96 | 87.20 |
| **Reasoning & Knowledge** | | | | | |
| MMLU | em | 80.80 | 82.75 | 87.13 | 85.60 |
| KMMLU | em | 67.32 | 65.75 | 67.56 | 63.49 |
| GPQA Diamond | pass@1 | 42.93 | 42.42 | 54.55 | 50.51 |
| HAERAE-Bench (v1.0) | em | 75.57 | 65.34 | 53.41 | 57.39 |

* Evaluated using an internal evaluation toolkit.
† Evaluated using gpt-4o-2024-08-06 as the judge model.
‡ Live denotes the average score of 6 live benchmarks, and Multi-Turn denotes the average score of 4 multi-turn benchmarks.

Reasoning model evaluation results

| Benchmark | Metric | kanana-2-30b-a3b-thinking | Qwen3-30B-A3B-Thinking-2507* | Qwen3-30B-A3B (thinking)* |
|---|---|---|---|---|
| **Reasoning & Knowledge** | | | | |
| MMLU-Pro | pass@1 | 75.3 | 80.8 | 78.5 |
| GPQA Diamond | pass@1 | 61.3 | 70.6 | 62.6 |
| **Competition Math** | | | | |
| AIME 2025 | pass@1 | 72.7 | 82.3 | 70.7 |
| AIME 2024 | pass@1 | 78.3 | 91.0 | 82.7 |
| **Code Generation** | | | | |
| LiveCodeBench | pass@1 | 60.8 | 68.3 | 62.3 |
| **Instruction Following** | | | | |
| IFEval | prompt strict | 82.2 | 87.8 | 86.1 |
| IFBench | prompt strict | 42.3 | 47.6 | 36.7 |
| **Tool Calling** | | | | |
| BFCL-v3 (Live†) | pass@1 | 78.1 | 85.9 | 82.3 |
| BFCL-v3 (Multi-Turn†) | pass@1 | 33.4 | 47.8 | 28.2 |

* Evaluated using an internal evaluation toolkit.
† Live denotes the average score of 6 live benchmarks, and Multi-Turn denotes the average score of 4 multi-turn benchmarks.

Quickstart

Make sure the installed transformers library is version 4.51.0 or later.

The following code snippet illustrates how to use the model to generate content from given inputs.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "kakaocorp/kanana-2-30b-a3b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="bfloat16",
    device_map="auto"
)

prompt = "Explain the future of AI."

messages = [
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))

For deployment, you can use vLLM or SGLang to create an OpenAI-compatible API endpoint (a client-side usage sketch follows the commands below):

  • vLLM:
    vllm serve kakaocorp/kanana-2-30b-a3b-instruct [--enable-auto-tool-choice --tool-call-parser hermes]
    
  • SGLang:
    python3 -m sglang.launch_server --model-path kakaocorp/kanana-2-30b-a3b-instruct [--tool-call-parser qwen]
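
Once the server is running, any OpenAI-compatible client can call it. The sketch below uses the openai Python package with vLLM's default port; the base URL, API key, and the get_weather tool are illustrative placeholders, not part of the model release.

from openai import OpenAI

# Point the client at the locally served endpoint (adjust host/port to your deployment;
# SGLang's default port differs from vLLM's).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool schema, included only to illustrate tool calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="kakaocorp/kanana-2-30b-a3b-instruct",
    messages=[{"role": "user", "content": "What's the weather in Seoul right now?"}],
    tools=tools,
)
print(response.choices[0].message)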
    

Processing 32K+ Length

Currently, the config.json uploaded to HuggingFace is configured for token lengths of 32,768 or less. To process tokens beyond this length, YaRN must be applied. By updating the config.json with the following parameters, you can apply YaRN to handle token sequences up to 128K in length:

"rope_scaling": {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 4.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
},

Alternatively, you can pass the YaRN parameters as command-line arguments when deploying:

  • vLLM:

    vllm serve ... --rope-scaling '{"rope_type":"deepseek_yarn","factor":4.0,"beta_fast":32,"beta_slow":1,"mscale":1.0,"mscale_all_dim":1.0,"original_max_position_embeddings":32768}' --max-model-len 131072
    
  • SGLang:

    python3 -m sglang.launch_server ... --json-model-override-args '{"max_position_embeddings":131072, "rope_scaling":{"rope_type":"deepseek_yarn","factor":4.0,"beta_fast":32,"beta_slow":1,"mscale":1.0,"mscale_all_dim":1.0,"original_max_position_embeddings":32768}}'
    

Most leading open-source implementations of static YaRN apply a constant scaling factor, which can negatively impact performance on shorter texts. To ensure optimal performance:

  • Enable rope_scaling only when necessary for processing long contexts.
  • Adjust the factor based on your specific needs (e.g., set factor to 2.0 for a 65,536-token context), as in the sketch below.
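
For instance, below is a minimal sketch of applying a 2.0 scaling factor at load time with transformers for a roughly 65,536-token window, assuming the model honors a standard rope_scaling override in its configuration (the field names follow the config.json snippet above):

from transformers import AutoConfig, AutoModelForCausalLM

model_name = "kakaocorp/kanana-2-30b-a3b-instruct"

# Override RoPE scaling before loading the weights: factor=2.0 extends the
# native 32,768-token window to roughly 65,536 tokens.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 2.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
config.max_position_embeddings = 65536

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="bfloat16",
    device_map="auto",
)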

License

The model weights are released under the Kanana License.


Citation

@article{,
  title={Kanana-2 LLM},
  author={Kanana LLM},
  year={2025},
  url={https://huggingface.co/collections/kakaocorp/kanana-2}
}

Contact
