Model Card for shisa-ai/shisa-v2.1c-lfm2-350m

SOTA Japanese Shaberi Benchmarks @ <0.5B and <1B!

This model was made just for fun on a Saturday for the Liquid AI Hackathon, but it also serves as an early preview of our upcoming V2.1 models...

For full code and related evals see:

Presentation here:

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|---|---|---|---|---|---|
| google/gemma-3-4b-it | 6.44 | 7.34 | 6.78 | 5.68 | 5.97 |
| 045-llama3.2-1b-v2new-dpo405b | 5.40 | 5.44 | 5.22 | 6.35 | 4.61 |
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| augmxnt/shisa-gamma-7b-v1 | 4.80 | 5.86 | 4.07 | 4.55 | 4.72 |
| shisa-ai/shisa-v2.1c-lfm2-350m | 4.51 | 4.30 | 4.75 | 5.03 | 3.95 |
| meta-llama/Llama-3.2-3B-Instruct | 4.49 | 5.62 | 4.50 | 3.43 | 4.43 |
| Qwen/Qwen3-0.6B | 4.14 | 5.16 | 4.00 | 3.18 | 4.23 |
| augmxnt/shisa-7b-v1 | 3.95 | 4.36 | 3.75 | 3.88 | 3.83 |
| shisa-ai/shisa-v2.1c-lfm2-350m-sft3-tlonly | 3.87 | 3.78 | 3.70 | 4.50 | 3.51 |
| LiquidAI/LFM2-350M | 3.76 | 3.92 | 4.07 | 3.55 | 3.51 |
| meta-llama/Llama-3.2-1B-Instruct | 2.97 | 3.82 | 2.82 | 2.45 | 2.79 |
| google/gemma3-270m-it | 2.53 | 3.42 | 2.33 | 2.10 | 2.28 |
| LiquidAI/LFM2-350M-ENJP-MT | 1.69 | 2.98 | 1.37 | 1.00 | 1.42 |
| tiiuae/Falcon-H1-0.5B-Instruct | 1.30 | 2.32 | 1.47 | 1.00 | 0.41 |

Visualize in Weights & Biases

Framework versions

  • TRL: 0.23.0
  • Transformers: 4.56.1
  • PyTorch: 2.10.0.dev20251008+cu130
  • Datasets: 4.2.0
  • Tokenizers: 0.22.1
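
The model loads with the standard Transformers APIs at the versions listed above. The snippet below is a minimal usage sketch, not taken from the original card: the example prompt, chat-template call, and generation settings are assumptions.

```python
# Minimal sketch (assumptions noted above): load the model and run a short
# Japanese chat completion with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2.1c-lfm2-350m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# Example prompt (hypothetical): "What is the capital of Japan?"
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```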

Compute

This model was trained on an 8xMI300X node on the AMD Developer Cloud with compute generously sponsored by AMD.
