🇯🇵 LFM2.5-1.2B-JP-202606
LFM2.5-1.2B-JP-202606 is our latest general-purpose Japanese chat model. It delivers significant improvements in knowledge, instruction following, math, code, and tool use over both models of comparable size and LFM2.5-1.2B-JP, and it sets a new bar for state-of-the-art performance in Japanese language understanding. It is ideal for developers building Japanese-language applications where cultural and linguistic nuance matters.
Find more information about LFM2.5 in our blog post.
📊 Performance
We compared LFM2.5-1.2B-JP-202606 with relevant sub-2B models on a diverse suite of benchmarks.
| Model | Size | Knowledge: JMMLU‑ProX | Knowledge: JMMLU | Knowledge: JCulture | Knowledge: JGPQA | Knowledge Avg | Instruction Following: J‑MIFEval | Instruction Following: JFBench¹ | Instruction Following Avg | Math: J‑GSM8K | Math: J‑MATH500 | Math Avg | Code: JHumanEval+ | Tool Use: J‑BFCLv3² | Domain Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LFM2.5‑1.2B‑JP‑202606 | 1.2B | 36.23 | 54.19 | 35.77 | 28.69 | 38.72 | 79.08 | 54.77 | 66.93 | 62.20 | 62.80 | 62.50 | 49.39 | 48.00 | 53.11 |
| LFM2.5‑1.2B‑Instruct | 1.2B | 31.42 | 47.61 | 28.42 | 31.72 | 34.79 | 40.44 | 36.67 | 38.56 | 50.20 | 50.00 | 50.10 | 28.66 | 46.29 | 39.68 |
| Qwen3‑1.7B (Instruct) | 1.7B | 30.78 | 47.67 | 33.33 | 26.26 | 34.51 | 40.29 | 36.61 | 38.45 | 46.00 | 56.40 | 51.20 | 47.56 | 52.45 | 44.83 |
| Granite‑4.0‑1B | 1.5B | 15.32 | 33.93 | 34.38 | 24.44 | 27.02 | 27.56 | 31.26 | 29.41 | 42.80 | 25.40 | 34.10 | 51.22 | 50.57 | 38.46 |
| Llama‑3.2‑1B‑Instruct | 1.2B | 15.91 | 33.97 | 22.52 | 32.32 | 26.18 | 24.10 | 21.78 | 22.94 | 25.20 | 11.40 | 18.30 | 17.68 | 21.06 | 21.23 |
| Gemma‑3‑1B‑it | 1.0B | 14.12 | 34.45 | 23.42 | 24.24 | 24.06 | 26.31 | 31.15 | 28.73 | 33.60 | 15.60 | 24.60 | 25.00 | 17.26 | 23.93 |
| sarashina2.2‑1b‑instruct‑v0.1 | 1.4B | 18.3 | 40.24 | 25.53 | 26.26 | 27.58 | 21.9 | 27.41 | 24.66 | 44.4 | 24.8 | 34.60 | 21.95 | 13.86 | 24.53 |
| TinySwallow‑1.5B‑Instruct | 1.5B | 21.51 | 47.98 | 31.17 | 29.29 | 32.49 | 36.55 | 34.25 | 35.40 | 47.2 | 22.4 | 34.80 | 26.83 | 11.7 | 28.24 |
| llm‑jp‑3.1‑1.8b‑instruct4 | 1.9B | 17.44 | 43.05 | 27.42 | 17.68 | 26.40 | 33.77 | 30.92 | 32.35 | 52.8 | 17.0 | 34.90 | 35.37 | 11.76 | 28.16 |
| RakutenAI‑2.0‑mini‑instruct | 1.5B | 11.46 | 31.84 | 29.67 | 22.22 | 23.80 | 28.06 | 24.66 | 26.36 | 24.8 | 11.4 | 18.10 | 28.6 | 11.85 | 21.74 |
¹ JFBench is evaluated using single-instruction prompts.
² quickTestingOSSHandler is used for models that do not support function calling (sarashina2.2‑1b‑instruct‑v0.1, TinySwallow‑1.5B‑Instruct, llm‑jp‑3.1‑1.8b‑instruct4, and RakutenAI‑2.0‑mini‑instruct).
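The aggregate columns can be reproduced from the per-benchmark scores. The sketch below assumes, based on the numbers in the table rather than on a stated methodology, that each domain score is the unweighted mean of its benchmarks and that Domain Avg is the unweighted mean of the five domain scores. It checks this for the LFM2.5‑1.2B‑JP‑202606 row.

```python
from statistics import mean

# Per-benchmark scores for LFM2.5-1.2B-JP-202606, copied from the table above.
scores = {
    "Knowledge": [36.23, 54.19, 35.77, 28.69],
    "Instruction Following": [79.08, 54.77],
    "Math": [62.20, 62.80],
    "Code": [49.39],
    "Tool Use": [48.00],
}

# Assumption: a domain score is the plain mean of its benchmarks.
domain_scores = {name: mean(values) for name, values in scores.items()}

# Assumption: Domain Avg is the plain mean of the five domain scores.
domain_avg = mean(domain_scores.values())

for name, value in domain_scores.items():
    print(f"{name}: {value:.2f}")
print(f"Domain Avg: {domain_avg:.2f}")
```

Under these assumptions the result agrees with the reported Domain Avg of 53.11 to within rounding.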
🗒️ Model Details
| Model | Parameters | Description |
|---|---|---|
| LFM2.5-1.2B-Base | 1.2B | Pre-trained base model for fine-tuning |
| LFM2.5-1.2B-Instruct | 1.2B | General-purpose instruction-tuned model |
| LFM2.5-1.2B-Thinking | 1.2B | General-purpose reasoning model |
| LFM2.5-1.2B-JP-202606 | 1.2B | Japanese-capable chat model |
| LFM2.5-VL-1.6B | 1.6B | Vision-language model with fast inference |
| LFM2.5-Audio-1.5B | 1.5B | Audio-language model for speech and text I/O |
| LFM2.5-Audio-1.5B-JP | 1.5B | Japanese-capable audio model for speech and text I/O |
LFM2.5-1.2B-JP-202606 is a general-purpose text-only model with the following features:
- Number of parameters: 1.17B
- Number of layers: 16 (10 double-gated LIV convolution blocks + 6 GQA blocks)
- Training budget: 31.5T tokens
- Context length: 32,768 tokens
- Vocabulary size: 65,536
- Knowledge cutoff: Mid-2024
- Languages: English, Japanese
- Generation parameters:
  - temperature: 0.1
  - top_k: 50
  - repetition_penalty: 1.05
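To make the three generation parameters concrete, here is a minimal pure-Python sketch of how such settings are commonly applied when picking the next token. It is an illustration under assumptions, not the inference code used for this model: the function name `sample_next_token`, the toy logits, and the order of the steps are hypothetical, and the repetition penalty follows the widely used convention of dividing positive logits and multiplying negative ones.

```python
import math
import random


def sample_next_token(logits, previous_tokens, temperature=0.1, top_k=50,
                      repetition_penalty=1.05, rng=random):
    """Pick one token id from raw logits using the three settings above."""
    logits = list(logits)

    # Repetition penalty (assumed convention): tokens that already appeared
    # become less likely. Positive logits shrink, negative logits grow more negative.
    for tok in set(previous_tokens):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution.
    logits = [x / temperature for x in logits]

    # Top-k: keep only the k highest-scoring token ids.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]

    # Softmax weights over the kept tokens (subtract the max for stability).
    peak = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - peak) for i in kept]

    return rng.choices(kept, weights=weights, k=1)[0]


# Toy vocabulary of four tokens; token 0 was already generated once.
toy_logits = [2.0, 1.5, 0.3, -1.0]
token = sample_next_token(toy_logits, previous_tokens=[0], rng=random.Random(0))
print(token)
```

With a temperature as low as 0.1, sampling is close to greedy decoding: the highest-scoring token is chosen almost every time, and the mild repetition penalty only matters when two candidates are nearly tied.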
| Model | Description |
|---|---|
| LFM2.5-1.2B-JP-202606 | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
| LFM2.5-1.2B-JP-202606-GGUF | Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage. |
| LFM2.5-1.2B-JP-202606-ONNX | ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile). |
| LFM2.5-1.2B-JP-202606-MLX | MLX format for Apple Silicon. Optimized for fast inference on Mac devices using the MLX framework. |
We recommend using it for agentic workflows, tool use, structured outputs, bilingual English–Japanese assistants, and on-device personal-assistant applications. It is not recommended for knowledge-intensive tasks. It performs best when given clear, explicit instructions that define the task, expected behavior, and output format.
Chat Template
LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details. Example:
<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
日本の首都は?<|im_end|>
<|im_start|>assistant
You can use tokenizer.apply_chat_template() to format your messages automatically.
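For illustration, the layout in the example above can be reproduced with a few lines of plain Python. This is a simplified sketch inferred from that example only; the helper name `render_chat` is hypothetical, and the template shipped with the tokenizer remains the authoritative source (it may handle tools and other cases not covered here).

```python
def render_chat(messages, add_generation_prompt=True):
    """Render messages in the ChatML-like layout shown above (simplified)."""
    parts = ["<|startoftext|>"]
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "日本の首都は?"},
]
prompt_text = render_chat(messages)
print(prompt_text)
```

In practice, prefer `tokenizer.apply_chat_template()` so that the prompt always matches the template the model was trained with.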
Tool Use
LFM2.5 supports function calling as follows:
- Function definition: We recommend providing the list of tools as a JSON object in the system prompt. You can also use the tokenizer.apply_chat_template() function with tools.
- Function call: By default, LFM2.5 writes Pythonic function calls (a Python list between the <|tool_call_start|> and <|tool_call_end|> special tokens) as the assistant answer. You can override this behavior by asking the model to output JSON function calls in the system prompt.
- Function execution: The function call is executed, and the result is returned as a "tool" role.
- Final answer: LFM2.5 interprets the outcome of the function call to address the original user prompt in plain text.
See the Tool Use documentation for the full guide. Example:
<|startoftext|><|im_start|>system
List of tools: [{"name": "get_candidate_status", "description": "採用プロセスにおける候補者の現在のステータスを取得します", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "候補者の一意の識別子"}}, "required": ["candidate_id"]}}]<|im_end|>
<|im_start|>user
候補者ID 12345 の現在のステータスは何ですか?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>候補者ID 12345 の現在のステータスを確認しています。<|im_end|>
<|im_start|>tool
[{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}]<|im_end|>
<|im_start|>assistant
ID 12345 の候補者は現在、Clinical Research Associate のポジションで「面接予定」の段階にあり、面接日は 2023年11月20日に設定されています。<|im_end|>
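On the application side, the Pythonic call in the assistant turn has to be extracted before it can be executed. The following is a minimal sketch of one way to do that with the standard library, assuming the default Pythonic format and keyword arguments with literal values as in the example above. The helper name `parse_tool_calls` is hypothetical and is not part of any Liquid AI or Transformers API.

```python
import ast
import re

TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(assistant_text):
    """Return [(function_name, kwargs), ...] from a Pythonic tool-call block."""
    match = TOOL_CALL_RE.search(assistant_text)
    if match is None:
        return []
    # The payload is a Python list of calls; parse it without executing it.
    tree = ast.parse(match.group(1).strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of function calls")
    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected simple calls like name(arg=value)")
        # literal_eval only accepts constants, so arbitrary code is rejected.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((node.func.id, kwargs))
    return calls


reply = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>候補者ID 12345 の現在のステータスを確認しています。"
)
print(parse_tool_calls(reply))
# → [('get_candidate_status', {'candidate_id': '12345'})]
```

Parsing with `ast` instead of `eval` keeps model output from running as code. The sketch ignores positional arguments, so extend it if your tools use them.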
🏃 Inference
LFM2.5 is supported by many inference frameworks. See the Inference documentation for the full list.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| Transformers | Simple inference with direct access to model internals. | Link | ![]() |
| vLLM | High-throughput production deployments with GPU. | Link | ![]() |
| llama.cpp | Cross-platform inference with CPU offloading. | Link | ![]() |
| MLX | Apple's machine learning framework optimized for Apple Silicon. | Link | — |
| LM Studio | Desktop application for running LLMs locally. | Link | — |
Here's a quick start example with Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Full Hugging Face repo id, including the organization prefix
model_id = "LiquidAI/LFM2.5-1.2B-JP-202606"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2"  <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Print tokens as they are generated, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "日本の首都は?"
# return_dict=True returns input_ids and attention_mask together
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)
# Sample with the recommended generation parameters
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
    streamer=streamer,
)
🔧 Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|---|---|---|---|
| CPT (Unsloth) | Continued Pre-Training using Unsloth for text completion. | Link | ![]() |
| CPT (Unsloth) | Continued Pre-Training using Unsloth for translation. | Link | ![]() |
| SFT (Unsloth) | Supervised Fine-Tuning with LoRA using Unsloth. | Link | ![]() |
| SFT (TRL) | Supervised Fine-Tuning with LoRA using TRL. | Link | ![]() |
| DPO (TRL) | Direct Preference Optimization with LoRA using TRL. | Link | ![]() |
| GRPO (Unsloth) | GRPO with LoRA using Unsloth. | Link | ![]() |
| GRPO (TRL) | GRPO with LoRA using TRL. | Link | ![]() |
📬 Contact
- Got questions or want to connect? Join our Discord community
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation
@article{liquidai2025lfm2,
title={LFM2 Technical Report},
author={Liquid AI},
journal={arXiv preprint arXiv:2511.23404},
year={2025}
}