---
language:
- ja
- en
license: other
library_name: mlx
tags:
- mlx
- lfm2
- liquidai
base_model: LiquidAI/LFM2.5-1.2B-JP-202606
pipeline_tag: text-generation
---

# LFM2.5-1.2B-JP-202606-MLX-4bit

This is an [MLX](https://github.com/ml-explore/mlx) conversion of
[`LiquidAI/LFM2.5-1.2B-JP-202606`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-202606). The converted weights are about 0.64 GB.

## Precision

Quantized to **4-bit** (group size 64). The export protects two kinds of precision-sensitive tensors, mirroring llama.cpp's policy; only the first applies to this model:

- **Tied embeddings (= output head)** are kept at **6-bit**. `LiquidAI/LFM2.5-1.2B-JP-202606` ties `embed_tokens` with the LM head, so uniform 4-bit quantization would degrade both the input lookup and the output logits.
- **MoE router gates** are kept in fp32. This rule has no effect here, since this is a dense model with no experts.
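
The per-tensor policy above can be illustrated with a minimal sketch in plain Python. It is not the exporter's actual code: the function names and module paths are hypothetical, the group size of 64 is assumed to apply to the embeddings as well, and the storage estimate assumes one 16-bit scale and one 16-bit bias per quantization group.

```python
# Illustrative sketch only; names and module paths are hypothetical.
DEFAULT_BITS = 4   # from this card: 4-bit quantization
EMBED_BITS = 6     # from this card: tied embeddings kept at 6-bit
GROUP_SIZE = 64    # from this card; assumed to apply to embeddings too


def quant_params(path: str) -> dict:
    """Return quantization settings for the tensor at a dotted module path."""
    if path.endswith("embed_tokens"):
        # Tied with the LM head, so it gets the higher precision.
        return {"bits": EMBED_BITS, "group_size": GROUP_SIZE}
    return {"bits": DEFAULT_BITS, "group_size": GROUP_SIZE}


def effective_bits(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits stored per weight, assuming a 16-bit scale and 16-bit bias per group."""
    return bits + overhead_bits / group_size


for path in ("model.embed_tokens", "model.layers.0.feed_forward.w1"):
    params = quant_params(path)
    print(path, params, effective_bits(**params))
```

Under these assumptions, the per-group scale and bias add half a bit per weight at group size 64, so the stored cost is somewhat higher than the nominal bit width.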

## Use with mlx-lm

```bash
pip install mlx-lm
```

```python
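from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer.
model, tokenizer = load("LiquidAI/LFM2.5-1.2B-JP-202606-MLX-4bit")

# Prompt: "What is the capital of Japan?"
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True already prints the generated text, so the return value is not printed again.
text = generate(model, tokenizer, prompt, max_tokens=128, verbose=True)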
```

## Conversion

Exported with [liquidmlx](https://github.com/Liquid4All/mlx-export). See the base
model card for license, training, and intended-use details.