Qwen2.5-Math-7B-Instruct — RKLLM build for RK3588 boards

Author: @jamescallander
Source model: Qwen/Qwen2.5-Math-7B-Instruct (Hugging Face)

This repository hosts a conversion of Qwen2.5-Math-7B-Instruct for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the RKNN-LLM toolkit.

Conversion details

RKLLM-Toolkit version: v1.2.1
NPU driver: v0.9.8
Python: 3.12
Quantization: w8a8_g128
Output: single-file .rkllm artifact
Tokenizer: not required at runtime (UI handles prompt I/O)
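
To give a feel for what a grouped 8-bit scheme does to weights, here is a minimal sketch of symmetric per-group int8 quantization. It assumes that the `g128` suffix refers to a group size of 128 values sharing one scale; the function names are hypothetical and this is not the toolkit's actual algorithm, only a generic illustration of the idea.

```python
def quantize_groups(values, group_size=128):
    """Symmetric int8 quantization with one scale per group (illustrative only)."""
    quantized, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group to 127; avoid a zero scale.
        scale = max_abs / 127 if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.extend(max(-127, min(127, round(v / scale))) for v in group)
    return quantized, scales


def dequantize_groups(quantized, scales, group_size=128):
    """Recover approximate float values from int8 codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]
```

Round-tripping a list of floats through these two functions leaves each value within half a quantization step of the original, which is the kind of small difference a quantized build can show relative to full precision.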

⚠️ Math reasoning disclaimer

🛑 This model may make calculation or reasoning errors.

It is intended for educational and experimental purposes only.
Always double-check results with trusted methods, calculators, or domain experts.
Outputs should not be used as the sole basis for academic, financial, or scientific decisions.
Use responsibly and verify correctness before relying on results.

Intended use

On-device math reasoning and step-by-step problem solving.
Qwen2.5-Math-7B-Instruct is tuned for mathematics and quantitative reasoning tasks (problem solving, proofs, step-by-step derivations).

Limitations

Requires 8 GB of free memory.
Quantized build (w8a8_g128) may show small quality differences vs. full-precision upstream.
Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
Generated code should always be reviewed before use in production systems.
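
Before loading the model, it can help to confirm that the board has enough memory available. The sketch below parses the `MemAvailable` field of Linux's `/proc/meminfo`. The function name is hypothetical, and treating the 8 GB figure as 8 GiB is an assumption.

```python
REQUIRED_GIB = 8  # from the limitation above; reading "GB" as GiB is an assumption


def mem_available_gib(meminfo_text):
    """Return MemAvailable from /proc/meminfo contents, converted from kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in meminfo text")


# Usage on the board (not executed here):
#   with open("/proc/meminfo") as f:
#       ok = mem_available_gib(f.read()) >= REQUIRED_GIB
```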

Quick start (RK3588)

1) Install runtime

The RKNN-LLM toolkit and its installation instructions are available from your development board manufacturer's website or from airockchip's GitHub page.

Download and install the required packages as per the toolkit's instructions.

2) Simple Flask server deployment

The simplest way to deploy the converted .rkllm model is to use the example script provided in the toolkit, in this directory: rknn-llm/examples/rkllm_server_demo

python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
  --rkllm_model_path <MODEL_PATH>/Qwen2.5-Math-7B-Instruct_w8a8_g128_rk3588.rkllm \
  --target_platform rk3588

3) Sending a request

A basic message request has the following format:

{
    "model":"Qwen2.5-Math-7B-Instruct",
    "messages":[{
        "role":"user",
        "content":"<YOUR_PROMPT_HERE>"}],
    "stream":false
}
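
If you are scripting against the server, the same request body can be built programmatically. This is a minimal sketch; the helper name `build_chat_request` is hypothetical, and the fields simply mirror the format shown above.

```python
import json


def build_chat_request(prompt, model="Qwen2.5-Math-7B-Instruct", stream=False):
    """Return the request body as a dict with a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def to_json(body):
    """Serialize the request body; json.dumps handles quoting and escaping."""
    return json.dumps(body)
```

Building the body with `json.dumps` rather than by string concatenation avoids quoting mistakes when the prompt itself contains quotes or backslashes, which is common with LaTeX-heavy math prompts.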

Example request using curl:

curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"Qwen2.5-Math-7B-Instruct","messages":[{"role":"user","content":"How is sample standard deviation calculated?"}],"stream":false}'
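
The same request can be sent from Python using only the standard library. This is a sketch rather than part of the toolkit: the function names are hypothetical, the `http://` scheme is assumed (curl applies it by default), and the 300-second timeout is an arbitrary choice to allow for slow on-device generation.

```python
import json
import urllib.request


def make_chat_request(server_ip, prompt, port=8080):
    """Build (but do not send) a POST request for the /rkllm_chat endpoint."""
    payload = {
        "model": "Qwen2.5-Math-7B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"http://{server_ip}:{port}/rkllm_chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=300):
    """Send the request and return the decoded JSON response as a dict."""
    # timeout=300 is an assumed value, not something the toolkit specifies
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `send_chat_request(make_chat_request("<SERVER_IP_ADDRESS>", "How is sample standard deviation calculated?"))`.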

The response is formatted as follows:

{
    "choices":[{
        "finish_reason":"stop",
        "index":0,
        "logprobs":null,
        "message":{
            "content":"<MODEL_REPLY_HERE>",
            "role":"assistant"}}],
    "created":null,
    "id":"rkllm_chat",
    "object":"rkllm_chat",
    "usage":{
        "completion_tokens":null,
        "prompt_tokens":null,
        "total_tokens":null}
}
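
Given that layout, the model's reply sits at `choices[0].message.content`. A minimal sketch of extracting it, with a hypothetical helper name, could look like this:

```python
import json


def extract_reply(response_text):
    """Return the assistant's reply text from a response shaped like the one above."""
    data = json.loads(response_text)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]
```

Note that the `usage` counters and `created` are `null` in the format above, so client code should not rely on them for token accounting.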

Example response:

{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"To calculate the sample standard deviation, follow these steps: 1. **Calculate the mean (average) of the sample:** \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \] where \( x_i \) are the individual data points and \( n \) is the number of data points. 2. **Calculate the squared differences from the mean for each data point:** \[ (x_i - \bar{x})^2 \] 3. **Sum the squared differences:** \[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \] 4. **Divide the sum of the squared differences by \( n-1 \) (this is called the Bessel's correction):** \[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \] where \( s^2 \) is the sample variance. 5. **Take the square root of the sample variance to get the sample standard deviation:** \[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \] So, the formula for the sample standard deviation is: \[ \boxed{s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}} \]","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
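
In the spirit of the disclaimer above, a reply like this one can be checked against a trusted implementation. The sketch below codes the formula from the example response directly and compares it with Python's standard `statistics.stdev`; the function name and the sample data are illustrative only.

```python
import math
import statistics


def sample_std(data):
    """Sample standard deviation using the n-1 (Bessel's correction) formula."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean = sum(data) / n
    squared_diffs = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_diffs / (n - 1))


def matches_reference(data):
    """True if the hand-written formula agrees with statistics.stdev."""
    return math.isclose(sample_std(data), statistics.stdev(data))
```

For example, `matches_reference([2, 4, 4, 4, 5, 5, 7, 9])` compares the two results for a small data set with mean 5.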

4) UI compatibility

This server exposes an OpenAI-compatible Chat Completions API.

You can connect it to any OpenAI-compatible client or UI (for example, Open WebUI).

Configure your client with the API base: http://<SERVER_IP_ADDRESS>:8080 and use the endpoint: /rkllm_chat
Make sure the model field matches the converted model’s name, for example:

{
 "model": "Qwen2.5-Math-7B-Instruct",
 "messages": [{"role":"user","content":"Hello!"}],
 "stream": false
}

License

This conversion follows the license of the source model, Qwen/Qwen2.5-Math-7B-Instruct (see the LICENSE file in that repository).


