Ministral 3 14B Instruct 2512

The largest model in the Ministral 3 family, Ministral 3 14B is a powerful and efficient language model with vision capabilities, offering frontier-level performance comparable to its larger Mistral Small 3.2 24B counterpart.

This model is the instruct post-trained version in FP8, fine-tuned to follow instructions, making it ideal for chat and instruction-based use cases.

The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, fitting in 24GB of VRAM in FP8, and in less if further quantized.
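
As a rough back-of-the-envelope check (our estimate, not a figure from the card): the 13.5B language-model parameters at one byte per parameter in FP8 come to about 13.5 GB of weights, plus on the order of a gigabyte for the 0.4B vision encoder, leaving the remainder of a 24GB card for the KV cache and activations.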

Learn more in our blog post here.

Key Features

Ministral 3 14B consists of two main architectural components:

  • 13.5B Language Model
  • 0.4B Vision Encoder

The Ministral 3 14B Instruct model offers the following capabilities:

  • Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
  • System Prompt: Maintains strong adherence and support for system prompts.
  • Agentic: Offers best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list).
  • Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
  • Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
  • Large Context Window: Supports a 256k context window.
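
As a quick illustration of the JSON-output capability mentioned above, here is a minimal sketch against an OpenAI-compatible endpoint. The localhost URL, model ID, and support for the response_format field are assumptions about a local vLLM deployment like the one described in the Usage section below, not guarantees from this feature list.

from openai import OpenAI

# Assumes a local OpenAI-compatible server (e.g. vLLM, see the Usage section)
# that honors the response_format field for JSON output.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="mistralai/Ministral-3-14B-Instruct-2512",
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Extract the city and the country from: 'Paris, France'."},
    ],
    response_format={"type": "json_object"},
    temperature=0.1,
)
print(response.choices[0].message.content)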

Use Cases

Private AI deployments where advanced capabilities meet practical hardware constraints:

  • Private/custom chat and AI assistant deployments in constrained environments
  • Advanced local agentic use cases
  • Fine-tuning and specialization
  • And more...

Bringing advanced AI capabilities to most environments.

Recommended Settings

We recommend deploying with the following best practices:

  • System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
  • Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments; higher temperatures may be explored for creative use cases, and developers are encouraged to experiment with alternative settings.
  • Tools: Keep the set of tools well defined and limit their number to the minimum required for the use case; avoid overloading the model with an excessive number of tools.
  • Vision: When deploying with vision capabilities, we recommend keeping image aspect ratios close to 1:1 (width to height). Avoid overly thin or wide images; crop them as needed to ensure optimal performance (see the sketch after this list).
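
As one way to apply the vision guideline above, here is a minimal sketch (ours, using Pillow; the 2:1 cutoff is an arbitrary illustrative threshold, not a documented limit) that center-crops overly elongated images before sending them to the model:

from PIL import Image


def crop_towards_square(path: str, max_ratio: float = 2.0) -> Image.Image:
    """Center-crop an image whose long side exceeds max_ratio times its short side."""
    img = Image.open(path)
    w, h = img.size
    if max(w, h) / min(w, h) <= max_ratio:
        return img  # already close enough to 1:1
    if w > h:
        new_w = int(h * max_ratio)
        left = (w - new_w) // 2
        return img.crop((left, 0, left + new_w, h))
    new_h = int(w * max_ratio)
    top = (h - new_h) // 2
    return img.crop((0, top, w, top + new_h))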

Ministral 3 Family

| Model Name | Type | Precision | Link |
|---|---|---|---|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |

Other formats available here.

Benchmark Results

We compare Ministral 3 to similarly sized models.

Reasoning

| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.850 | 0.898 | 0.712 | 0.646 |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
| Ministral 3 8B | 0.787 | 0.860 | 0.668 | 0.616 |
| Qwen3-VL-8B-Thinking | 0.798 | 0.860 | 0.671 | 0.580 |
| Ministral 3 3B | 0.721 | 0.775 | 0.534 | 0.548 |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | 0.601 | 0.513 |

Instruct

| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.551 | 68.5 | 0.904 | 8.49 |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
| Ministral 3 8B | 0.509 | 66.8 | 0.876 | 8.08 |
| Qwen3-VL-8B-Instruct | 0.528 | 66.3 | 0.946 | 8.00 |
| Ministral 3 3B | 0.305 | 56.8 | 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | 0.438 | 56.8 | 0.900 | 8.01 |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |

Base

| Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---|---|---|---|---|---|---|
| Ministral 3 14B | 0.742 | 0.676 | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | 0.754 | 0.620 | 0.661 | 0.837 | 0.804 | 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | 0.788 |
| Ministral 3 8B | 0.706 | 0.626 | 0.591 | 0.793 | 0.761 | 0.681 |
| Qwen 3 8B Base | 0.700 | 0.576 | 0.596 | 0.794 | 0.760 | 0.639 |
| Ministral 3 3B | 0.652 | 0.601 | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | 0.677 | 0.405 | 0.570 | 0.759 | 0.713 | 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | 0.640 |

Usage

The model can be used with the following frameworks:

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 0.12.0:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.8.6.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also make use of a ready-to-go Docker image available on Docker Hub.
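
For instance, a generic invocation of the upstream vllm/vllm-openai image might look like the following; this is an illustrative command, not one from this card, so adjust the tag, GPU flags, and cache mount to your environment:

docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model mistralai/Ministral-3-14B-Instruct-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral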

Serve

Due to their size and the FP8 format of their weights, Ministral-3-3B-Instruct-2512, Ministral-3-8B-Instruct-2512 and Ministral-3-14B-Instruct-2512 can each run on a single H200 GPU.

A simple launch command is:

vllm serve mistralai/Ministral-3-14B-Instruct-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral

Key parameter notes:

  • enable-auto-tool-choice: Required when enabling tool usage.
  • tool-call-parser mistral: Required when enabling tool usage.

Additional flags:

  • You can set --max-model-len to save memory. By default it is set to 262144, which is quite large and not necessary for most scenarios.
  • You can set --max-num-batched-tokens to balance throughput and latency: higher values mean higher throughput but also higher latency (see the example below).
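
For example, a launch that trades context length for memory might look like the following; the specific values are illustrative, not recommendations from this card:

vllm serve mistralai/Ministral-3-14B-Instruct-2512 \
  --tokenizer_mode mistral --config_format mistral --load_format mistral \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 32768 \
  --max-num-batched-tokens 8192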

Usage of the model

Here we assume that the model mistralai/Ministral-3-14B-Instruct-2512 is served and reachable at localhost on port 8000, the default for vLLM.

Vision Reasoning

Let's see if Ministral 3 knows when to pick a fight!

from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]


response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)

Function Calling

Let's solve some equations thanks to our simple Python calculator tool.

import json
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"


def my_calculator(expression: str) -> str:
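    # Note: eval is used here for demonstration only; never evaluate untrusted input.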
    return str(eval(expression))


tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
        results.append(result)
    else:
        # Keep results aligned with tool_calls so the zip below stays consistent.
        results.append("Tool not implemented in this example.")

messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )


response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)

Text-Only Request

Ministral 3 can follow your instructions to the letter.

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)

Transformers

You can also use Ministral 3 14B Instruct 2512 with Transformers!

Transformers recently added support for FP8, so make sure to install from main:

uv pip install git+https://github.com/huggingface/transformers

To make the best use of our model with Transformers, make sure to have mistral-common >= 1.8.6 installed to use our tokenizer.

pip install mistral-common --upgrade

Try it out by running the following snippet.

On the latest main as of 05/12/2025, an FP8 Triton kernel for fast matmuls (w8a8_block_fp8_matmul_triton) is used by default without any degradation in accuracy. However, if you want to run the model in BF16, see the Transformers BF16 section below.

Python snippet
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-14B-Instruct-2512"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)

tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
image_sizes = [tokenized["pixel_values"].shape[-2:]]

output = model.generate(
    **tokenized,
    image_sizes=image_sizes,
    max_new_tokens=512,
)[0]

decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)

Transformers BF16

Transformers allows you to automatically convert the checkpoint to bfloat16. To do so, simply load the model as follows:

from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config

model_id = "mistralai/Ministral-3-14B-Instruct-2512"
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=FineGrainedFP8Config(dequantize=True)
)

License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.
