---
library_name: rkllm
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Math-7B-Instruct
tags:
  - rkllm
  - rk3588
  - rockchip
  - edge-ai
  - llm
  - math
  - chat
---
# Qwen2.5-Math-7B-Instruct: RKLLM build for RK3588 boards

**Author:** @jamescallander  
**Source model:** [Qwen/Qwen2.5-Math-7B-Instruct · Hugging Face](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)

> This repository hosts a **conversion** of `Qwen2.5-Math-7B-Instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

## Conversion details

- RKLLM-Toolkit version: v1.2.1
  
- NPU driver: v0.9.8
  
- Python: 3.12
  
- Quantization: `w8a8_g128`
  
- Output: single-file `.rkllm` artifact
  
- Tokenizer: not required at runtime (UI handles prompt I/O)
  

## ⚠️ Math reasoning disclaimer

🛑 **This model may make calculation or reasoning errors.**

- It is intended for **educational and experimental purposes only**.
  
- Always **double-check results** with trusted methods, calculators, or domain experts.
  
- Outputs should not be used as the sole basis for academic, financial, or scientific decisions.
  
- Use responsibly and verify correctness before relying on results.
  

## Intended use

- On-device math reasoning and step-by-step problem solving.
  
- Qwen2.5-Math-7B-Instruct is tuned for **mathematics and quantitative reasoning tasks** (problem solving, proofs, step-by-step derivations).
  

## Limitations

- Requires 8GB free memory
  
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
  
- Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
  
- Generated code should always be reviewed before use in production systems.
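
The memory requirement above can be checked before launching the server. The following is a minimal sketch, assuming a Linux board where `/proc/meminfo` reports `MemAvailable`; the helper names and the reading of "8GB" as 8 GiB are assumptions for illustration, not part of the toolkit.

```python
import re


def mem_available_gib(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo-style text, in GiB."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable entry not found")
    # /proc/meminfo reports values in kB (1024-byte units).
    return int(match.group(1)) / (1024 * 1024)


def has_enough_memory(meminfo_text: str, required_gib: float = 8.0) -> bool:
    """Check available memory against the threshold (8 GiB is an assumed reading of '8GB')."""
    return mem_available_gib(meminfo_text) >= required_gib


# Usage on the board (hypothetical):
#   with open("/proc/meminfo") as f:
#       print(has_enough_memory(f.read()))
```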
  

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and its instructions are available from your development board manufacturer's website or from [airockchip's GitHub page](https://github.com/airockchip).

Download and install the required packages as per the toolkit's instructions.

### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is to use the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
  --rkllm_model_path <MODEL_PATH>/Qwen2.5-Math-7B-Instruct_w8a8_g128_rk3588.rkllm \
  --target_platform rk3588
```

### 3) Sending a request

A basic request body has the following format:

```json
{
    "model":"Qwen2.5-Math-7B-Instruct",
    "messages":[{
        "role":"user",
        "content":"<YOUR_PROMPT_HERE>"}],
    "stream":false
}
```

Example request using `curl`:

```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"Qwen2.5-Math-7B-Instruct","messages":[{"role":"user","content":"How is sample standard deviation calculated?"}],"stream":false}'
```
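
The same request can be sent from Python. This is a minimal sketch using only the standard library; it assumes the server listens on port 8080 at `/rkllm_chat` as in the `curl` example, and the function names and the 300-second timeout are illustrative choices rather than anything defined by the toolkit.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "Qwen2.5-Math-7B-Instruct",
                       stream: bool = False) -> dict:
    """Build the JSON body in the request format shown above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def send_chat(base_url: str, prompt: str, timeout: float = 300.0) -> dict:
    """POST a prompt to <base_url>/rkllm_chat and return the decoded JSON reply."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/rkllm_chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # On-device generation can be slow, so the timeout is generous (an assumption).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (replace the placeholder with your server's address):
#   reply = send_chat("http://<SERVER_IP_ADDRESS>:8080", "What is 12 * 13?")
```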

The response is formatted as follows:

```json
{
    "choices":[{
        "finish_reason":"stop",
        "index":0,
        "logprobs":null,
        "message":{
            "content":"<MODEL_REPLY_HERE>",
            "role":"assistant"}}],
    "created":null,
    "id":"rkllm_chat",
    "object":"rkllm_chat",
    "usage":{
        "completion_tokens":null,
        "prompt_tokens":null,
        "total_tokens":null}
}
```
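
Given that structure, the model's reply sits at `choices[0].message.content`. A small sketch of pulling it out of a decoded response follows; `extract_reply` is a hypothetical helper name, and the sample dictionary simply mirrors the format above.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from a decoded /rkllm_chat response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# A response shaped like the format above, with a made-up reply.
sample_response = {
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "logprobs": None,
        "message": {"content": "2 + 2 = 4", "role": "assistant"},
    }],
    "created": None,
    "id": "rkllm_chat",
    "object": "rkllm_chat",
    "usage": {"completion_tokens": None, "prompt_tokens": None, "total_tokens": None},
}

print(extract_reply(sample_response))
```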

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"To calculate the sample standard deviation, follow these steps: 1. **Calculate the mean (average) of the sample:** \\[ \\bar{x} = \\frac{\\sum_{i=1}^{n} x_i}{n} \\] where \\( x_i \\) are the individual data points and \\( n \\) is the number of data points. 2. **Calculate the squared differences from the mean for each data point:** \\[ (x_i - \\bar{x})^2 \\] 3. **Sum the squared differences:** \\[ \\sum_{i=1}^{n} (x_i - \\bar{x})^2 \\] 4. **Divide the sum of the squared differences by \\( n-1 \\) (this is called the Bessel's correction):** \\[ s^2 = \\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})^2}{n-1} \\] where \\( s^2 \\) is the sample variance. 5. **Take the square root of the sample variance to get the sample standard deviation:** \\[ s = \\sqrt{s^2} = \\sqrt{\\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})^2}{n-1}} \\] So, the formula for the sample standard deviation is: \\[ \\boxed{s = \\sqrt{\\frac{\\sum_{i=1}^{n} (x_i - \\bar{x})^2}{n-1}}} \\]","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
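
In keeping with the disclaimer above, a reply like this one can be cross-checked independently. The sketch below implements the formula from the example response step by step and compares it with Python's `statistics.stdev`; the function name and the data set are made up for illustration.

```python
import math
import statistics


def sample_std(data: list[float]) -> float:
    """Sample standard deviation with Bessel's correction (divide by n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean = sum(data) / n                               # step 1: mean
    squared_sum = sum((x - mean) ** 2 for x in data)   # steps 2-3: sum of squared differences
    variance = squared_sum / (n - 1)                   # step 4: sample variance
    return math.sqrt(variance)                         # step 5: square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
# The hand-written formula should agree with the standard library.
print(math.isclose(sample_std(data), statistics.stdev(data)))
```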

### 4) UI compatibility

This server exposes an **OpenAI-compatible Chat Completions API**.

You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)).

- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
- Make sure the `model` field matches the converted model’s name, for example:

```json
{
 "model": "Qwen2.5-Math-7B-Instruct",
 "messages": [{"role":"user","content":"Hello!"}],
 "stream": false
}
```

## License

This conversion follows the license of the source model: [LICENSE · Qwen/Qwen2.5-Math-7B-Instruct at main](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct/blob/main/LICENSE)