Instructions to use LiquidAI/LFM2.5-1.2B-JP-202606 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use LiquidAI/LFM2.5-1.2B-JP-202606 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LiquidAI/LFM2.5-1.2B-JP-202606")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-JP-202606")
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-1.2B-JP-202606")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use LiquidAI/LFM2.5-1.2B-JP-202606 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LiquidAI/LFM2.5-1.2B-JP-202606"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-1.2B-JP-202606",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/LiquidAI/LFM2.5-1.2B-JP-202606

SGLang

How to use LiquidAI/LFM2.5-1.2B-JP-202606 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2.5-1.2B-JP-202606" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-1.2B-JP-202606",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2.5-1.2B-JP-202606" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LiquidAI/LFM2.5-1.2B-JP-202606",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LiquidAI/LFM2.5-1.2B-JP-202606 with Docker Model Runner:
```
docker model run hf.co/LiquidAI/LFM2.5-1.2B-JP-202606
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Try LFM • Docs • LEAP • Discord

🇯🇵 LFM2.5-1.2B-JP-202606

LFM2.5-1.2B-JP-202606 is our latest general-purpose Japanese chat model. It delivers significant improvements in knowledge, instruction following, math, code, and tool use over both comparably sized models and LFM2.5-1.2B-JP, setting a new state of the art for Japanese language understanding in its size class. It is ideal for developers building Japanese-language applications where cultural and linguistic nuance matters.

LFM2.5-1.2B-JP-202606 は、当社の最新の汎用日本語チャットモデルです。知識、指示追従、数学、コード、ツール使用の各領域において、同規模の他モデルおよび LFM2.5-1.2B-JP の双方を大幅に上回る改善を実現しています。日本語全般における最高水準のベンチマーク性能を発揮します。文化的・言語的なニュアンスが重要となる日本語アプリケーションを構築する開発者に最適です。

Find more information about LFM2.5 in our blog post.

📊 Performance

We compared LFM2.5-1.2B-JP-202606 with comparable sub-2B models on Japanese benchmarks covering knowledge, instruction following, math, code, and tool use.

Model	Size	Knowledge					Instruction Following			Math			Code	Tool Use	Domain Avg
Model	Size	JMMLU‑ProX	JMMLU	JCulture	JGPQA	Avg	J‑MIFEval	JFBench¹	Avg	J‑GSM8K	J‑MATH500	Avg	JHumanEval+	J‑BFCLv3²	Domain Avg
LFM2.5‑1.2B‑JP‑202606	1.2B	36.23	54.19	35.77	28.69	38.72	79.08	54.77	66.93	62.20	62.80	62.50	49.39	48.00	53.11
LFM2.5‑1.2B‑Instruct	1.2B	31.42	47.61	28.42	31.72	34.79	40.44	36.67	38.56	50.20	50.00	50.10	28.66	46.29	39.68
Qwen3‑1.7B (Instruct)	1.7B	30.78	47.67	33.33	26.26	34.51	40.29	36.61	38.45	46.00	56.40	51.20	47.56	52.45	44.83
Granite‑4.0‑1B	1.5B	15.32	33.93	34.38	24.44	27.02	27.56	31.26	29.41	42.80	25.40	34.10	51.22	50.57	38.46
Llama‑3.2‑1B‑Instruct	1.2B	15.91	33.97	22.52	32.32	26.18	24.10	21.78	22.94	25.20	11.40	18.30	17.68	21.06	21.23
Gemma‑3‑1B‑it	1.0B	14.12	34.45	23.42	24.24	24.06	26.31	31.15	28.73	33.60	15.60	24.60	25.00	17.26	23.93
sarashina2.2‑1b‑instruct‑v0.1	1.4B	18.30	40.24	25.53	26.26	27.58	21.90	27.41	24.66	44.40	24.80	34.60	21.95	13.86	24.53
TinySwallow‑1.5B‑Instruct	1.5B	21.51	47.98	31.17	29.29	32.49	36.55	34.25	35.40	47.20	22.40	34.80	26.83	11.70	28.24
llm‑jp‑3.1‑1.8b‑instruct4	1.9B	17.44	43.05	27.42	17.68	26.40	33.77	30.92	32.35	52.80	17.00	34.90	35.37	11.76	28.16
RakutenAI‑2.0‑mini‑instruct	1.5B	11.46	31.84	29.67	22.22	23.80	28.06	24.66	26.36	24.80	11.40	18.10	28.60	11.85	21.74

¹ JFBench is evaluated using single-instruction prompts.
² quickTestingOSSHandler is used for models that do not support function calling (sarashina2.2‑1b‑instruct‑v0.1, TinySwallow‑1.5B‑Instruct, llm‑jp‑3.1‑1.8b‑instruct4, and RakutenAI‑2.0‑mini‑instruct).
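
The table does not state how the "Avg" and "Domain Avg" columns are derived. The sketch below assumes each domain average is the unweighted mean of its benchmarks and that Domain Avg is the unweighted mean of the five domain scores (Knowledge, Instruction Following, Math, Code, Tool Use). Under that assumption it recomputes the LFM2.5‑1.2B‑JP‑202606 row from the per-benchmark scores, which can be compared against the reported values.

```python
# Scores copied from the LFM2.5-1.2B-JP-202606 row of the table above.
knowledge = [36.23, 54.19, 35.77, 28.69]  # JMMLU-ProX, JMMLU, JCulture, JGPQA
instruction_following = [79.08, 54.77]    # J-MIFEval, JFBench
math_scores = [62.20, 62.80]              # J-GSM8K, J-MATH500
code = [49.39]                            # JHumanEval+
tool_use = [48.00]                        # J-BFCLv3


def mean(values):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(values) / len(values)


# Assumption: average within each domain first, then across the five domains.
domains = (knowledge, instruction_following, math_scores, code, tool_use)
domain_means = [mean(scores) for scores in domains]
domain_avg = mean(domain_means)

print([round(m, 2) for m in domain_means])
print(round(domain_avg, 2))
```

Because each domain counts once, the single Code and Tool Use benchmarks weigh as much as the four Knowledge benchmarks combined.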

🗒️ Model Details

Model	Parameters	Description
LFM2.5-1.2B-Base	1.2B	Pre-trained base model for fine-tuning
LFM2.5-1.2B-Instruct	1.2B	General-purpose instruction-tuned model
LFM2.5-1.2B-Thinking	1.2B	General-purpose reasoning model
LFM2.5-1.2B-JP-202606	1.2B	Japanese-capable chat model
LFM2.5-VL-1.6B	1.6B	Vision-language model with fast inference
LFM2.5-Audio-1.5B	1.5B	Audio-language model for speech and text I/O
LFM2.5-Audio-1.5B-JP	1.5B	Japanese-capable audio model for speech and text I/O

LFM2.5-1.2B-JP-202606 is a general-purpose text-only model with the following features:

Number of parameters: 1.17B
Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
Training budget: 31.5T tokens
Context length: 32,768 tokens
Vocabulary size: 65,536
Knowledge cutoff: Mid-2024
Languages: English, Japanese
Generation parameters:
- temperature: 0.1
- top_k: 50
- repetition_penalty: 1.05
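
As a rough illustration of how these generation parameters might be passed to an OpenAI-compatible server such as the vLLM one started earlier, the sketch below builds a chat request body using only the standard library. The helper names `build_chat_request` and `send_chat_request`, the `max_tokens` value, and the `localhost:8000` URL are assumptions for this example. Whether a given server accepts `top_k` and `repetition_penalty` as extra request fields should be checked against its documentation.

```python
import json
import urllib.request

# Sampling values taken from the "Generation parameters" list above.
RECOMMENDED_SAMPLING = {
    "temperature": 0.1,
    "top_k": 50,
    "repetition_penalty": 1.05,
}


def build_chat_request(prompt, max_tokens=512):
    """Build a JSON-serialisable body for an OpenAI-compatible chat endpoint."""
    body = {
        "model": "LiquidAI/LFM2.5-1.2B-JP-202606",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    # Assumption: the server accepts top_k and repetition_penalty as extra fields.
    body.update(RECOMMENDED_SAMPLING)
    return body


def send_chat_request(body, url="http://localhost:8000/v1/chat/completions"):
    """POST the body to a running server and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the payload; call send_chat_request(payload) once a server is up.
    payload = build_chat_request("日本の首都は？")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```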

Model	Description
LFM2.5-1.2B-JP-202606	Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM.
LFM2.5-1.2B-JP-202606-GGUF	Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage.
LFM2.5-1.2B-JP-202606-ONNX	ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile).
LFM2.5-1.2B-JP-202606-MLX	MLX format for Apple Silicon. Optimized for fast inference on Mac devices using the MLX framework.

We recommend LFM2.5-1.2B-JP-202606 for agentic workflows, tool use, structured outputs, bilingual English–Japanese assistants, and on-device personal-assistant applications. It is not recommended for knowledge-intensive tasks. It performs best when given clear, explicit instructions that define the task, expected behavior, and output format.

エージェント型ワークフロー、ツール使用、構造化出力、日英バイリンガルアシスタント、オンデバイスのパーソナルアシスタントでの利用を推奨します。一方で、詳細な知識を要するタスクには推奨されません。タスク内容、期待される動作、出力形式を明確かつ具体的に指示することで、最も高い性能を発揮します。

Chat Template

LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
日本の首都は？<|im_end|>
<|im_start|>assistant

You can use tokenizer.apply_chat_template() to format your messages automatically.
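
To make the layout above concrete, here is a minimal pure-Python sketch that reproduces the example string from a list of messages. It is only an illustration of the format as shown here: `render_chatml` is a hypothetical helper, not the model's actual Jinja template, and `tokenizer.apply_chat_template()` remains the authoritative way to build prompts.

```python
BOS = "<|startoftext|>"
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the ChatML-like layout shown above (illustrative only)."""
    parts = [BOS]
    for message in messages:
        # Each turn is: start token, role, newline, content, end token, newline.
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "日本の首都は？"},
]
print(render_chatml(messages))
```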

Tool Use

LFM2.5 supports function calling as follows:

Function definition: We recommend providing the list of tools as a JSON object in the system prompt. You can also use the tokenizer.apply_chat_template() function with tools.
Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model to output JSON function calls in the system prompt.
Function execution: The function call is executed, and the result is returned in a message with the "tool" role.
Final answer: LFM2.5 interprets the outcome of the function call to address the original user prompt in plain text.

See the Tool Use documentation for the full guide. Example:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "採用プロセスにおける候補者の現在のステータスを取得します", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "候補者の一意の識別子"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
候補者ID 12345 の現在のステータスは何ですか？<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>候補者ID 12345 の現在のステータスを確認しています。<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
ID 12345 の候補者は現在、Clinical Research Associate のポジションで「面接予定」の段階にあり、面接日は 2023年11月20日に設定されています。<|im_end|>
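
The function execution step in the exchange above could be handled on the application side roughly as follows. This is a sketch under stated assumptions: `parse_tool_calls`, `REGISTRY`, and the stub `get_candidate_status` are hypothetical names for this example, only keyword arguments with literal values are handled, and the Tool Use documentation remains the reference for the full protocol.

```python
import ast
import json
import re

TOOL_CALL_PATTERN = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(assistant_text):
    """Return (function_name, keyword_arguments) pairs from Pythonic tool calls."""
    calls = []
    for block in TOOL_CALL_PATTERN.findall(assistant_text):
        tree = ast.parse(block.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
                raise ValueError("expected plain function calls")
            # literal_eval only accepts literals, so no model-written code is run.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


def get_candidate_status(candidate_id):
    """Hypothetical stand-in for a real backend lookup."""
    return [{"candidate_id": candidate_id, "status": "Interview Scheduled"}]


# Only functions listed here can be invoked by the model.
REGISTRY = {"get_candidate_status": get_candidate_status}

assistant_text = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>'
    "候補者ID 12345 の現在のステータスを確認しています。"
)

tool_messages = []
for name, kwargs in parse_tool_calls(assistant_text):
    result = REGISTRY[name](**kwargs)
    tool_messages.append(
        {"role": "tool", "content": json.dumps(result, ensure_ascii=False)}
    )
print(tool_messages)
```

Parsing with `ast` instead of `eval` keeps the model's output as data, and the explicit registry limits which functions can be called.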

🏃 Inference

LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.

Name	Description	Docs	Notebook
Transformers	Simple inference with direct access to model internals.	Link
vLLM	High-throughput production deployments with GPU.	Link
llama.cpp	Cross-platform inference with CPU offloading.	Link
MLX	Apple's machine learning framework optimized for Apple Silicon.	Link	—
LM Studio	Desktop application for running LLMs locally.	Link	—

Here's a quick start example with Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
model_id = "LiquidAI/LFM2.5-1.2B-JP-202606"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
#   attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = "日本の首都は？"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)

🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

Name	Description	Docs
CPT (Unsloth)	Continued Pre-Training using Unsloth for text completion.	Link
CPT (Unsloth)	Continued Pre-Training using Unsloth for translation.	Link
SFT (Unsloth)	Supervised Fine-Tuning with LoRA using Unsloth.	Link
SFT (TRL)	Supervised Fine-Tuning with LoRA using TRL.	Link
DPO (TRL)	Direct Preference Optimization with LoRA using TRL.	Link
GRPO (Unsloth)	GRPO with LoRA using Unsloth.	Link
GRPO (TRL)	GRPO with LoRA using TRL.	Link

📬 Contact

Got questions or want to connect? Join our Discord community.
If you are interested in custom solutions with edge deployment, please contact our sales team.

Citation

@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
