---
language:
- ja
- en
license: other
library_name: mlx
tags:
- mlx
- lfm2
- liquidai
base_model: LiquidAI/LFM2.5-1.2B-JP-202606
pipeline_tag: text-generation
---

# LFM2.5-1.2B-JP-202606-MLX-4bit

This is an [MLX](https://github.com/ml-explore/mlx) conversion of [`LiquidAI/LFM2.5-1.2B-JP-202606`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP-202606) (~0.64 GB).

## Precision

Quantized to **4-bit** (group size 64). The export policy, which mirrors llama.cpp's, protects two kinds of precision-sensitive tensors:

- **Tied embeddings (= output head)** are kept at **6-bit**. `LiquidAI/LFM2.5-1.2B-JP-202606` ties `embed_tokens` with the LM head, so a uniform low-bit quant would degrade both the input lookup and the output logits.
- The MoE router gate is kept in fp32. This does not apply here, since this is a dense model with no experts.

## Use with mlx-lm

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Download (or reuse from the local cache) the quantized weights and tokenizer.
model, tokenizer = load("LiquidAI/LFM2.5-1.2B-JP-202606-MLX-4bit")

# Prompt: "What is the capital of Japan?"
messages = [{"role": "user", "content": "日本の首都はどこですか?"}]

# Apply the model's chat template and append the assistant turn marker.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams the output as it is generated.
print(generate(model, tokenizer, prompt, max_tokens=128, verbose=True))
```

## Conversion

Exported with [liquidmlx](https://github.com/Liquid4All/mlx-export). See the base model card for license, training, and intended-use details.
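
As a rough illustration of the bit widths listed under Precision, the sketch below estimates the effective storage cost per weight for group-wise quantization. It assumes each group of weights carries one 16-bit scale and one 16-bit bias (32 bits of overhead per group). That layout is an assumption made for illustration, and the function name `effective_bits_per_weight` is hypothetical, not part of mlx-lm or the export tool.

```python
def effective_bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Estimate storage cost per weight for group-wise quantization.

    Assumption: every group stores `overhead_bits` of metadata
    (here one 16-bit scale plus one 16-bit bias) alongside its weights.
    """
    return bits + overhead_bits / group_size


# Bit widths taken from the Precision section, group size 64 in both cases.
body_bpw = effective_bits_per_weight(bits=4, group_size=64)
embed_bpw = effective_bits_per_weight(bits=6, group_size=64)

print(f"4-bit layers:      {body_bpw} bits per weight")
print(f"6-bit embeddings:  {embed_bpw} bits per weight")
```

Under these assumptions the per-group metadata adds half a bit per weight at group size 64, so keeping the tied embeddings at 6-bit costs two extra bits per embedding weight compared with the rest of the model.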