jamescallander committed on
Commit
5419c93
·
verified ·
1 Parent(s): fcbb68e

Update README.md

  1. README.md +138 -3
README.md CHANGED
@@ -1,3 +1,138 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ library_name: rkllm
3
+ pipeline_tag: text-generation
4
+ license: apache-2.0
5
+ language:
6
+ - en
7
+ base_model:
8
+ - openbmb/MiniCPM3-4B
9
+ tags:
10
+ - rkllm
11
+ - rk3588
12
+ - rockchip
13
+ - edge-ai
14
+ - llm
15
+ - MiniCPM3
16
+ - text-generation-inference
17
+ ---
18
+ # MiniCPM3-4B — RKLLM build for RK3588 boards
19
+
20
+ **Author:** @jamescallander
21
+ **Source model:** [openbmb/MiniCPM3-4B · Hugging Face](https://huggingface.co/openbmb/MiniCPM3-4B)
22
+
23
+ **Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**
24
+
25
+ > This repository hosts a **conversion** of `MiniCPM3-4B` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).
26
+
27
+ #### Conversion details
28
+
29
+ - **RKLLM-Toolkit version:** v1.2.1
30
+ - **NPU driver:** v0.9.8
31
+ - **Python:** 3.12
32
+ - **Quantization:** `w8a8_g128`
33
+ - **Output:** single-file `.rkllm` artifact
34
+ - **Modifications:** quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs
35
+ - **Tokenizer:** not required at runtime (UI handles prompt I/O)
36
+
37
+ ## Intended use
38
+
39
+ - On-device lightweight inference on RK3588 SBCs.
40
+ - MiniCPM3-4B is a **compact general-purpose model** designed for efficiency, testing, and resource-constrained scenarios. It is best suited to experimentation where low memory usage and fast responses matter more than deep reasoning.
41
+
42
+ ## Limitations
43
+
44
+ - Requires 8 GB of free memory.
45
+ - Tested on a Radxa Rock 5B+; other devices may require different driver or toolkit versions.
46
+ - Quantization (`w8a8_g128`) may reduce output fidelity relative to the original model.
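
The memory requirement above can be checked on the board before loading the model. The following is a minimal sketch, assuming a Linux system that reports `MemAvailable` in `/proc/meminfo` and reading "8 GB" as 8 GiB; the function names are hypothetical and not part of the RKLLM toolkit.

```python
def mem_available_gib(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in meminfo text")


def has_enough_memory(meminfo_text: str, required_gib: float = 8.0) -> bool:
    """Check available memory against the threshold (8 GiB is an assumed reading of '8 GB')."""
    return mem_available_gib(meminfo_text) >= required_gib


def read_meminfo(path: str = "/proc/meminfo") -> str:
    """Read the kernel's memory report; the default path is Linux-specific."""
    with open(path, encoding="ascii") as f:
        return f.read()
```

On the board, `has_enough_memory(read_meminfo())` gives a quick yes/no answer before the server is started.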
47
+
48
+ ## Quick start (RK3588)
49
+
50
+ ### 1) Install runtime
51
+
52
+ The RKNN-LLM toolkit and its installation instructions are available from your development board manufacturer's website or from [airockchip's GitHub page](https://github.com/airockchip).
53
+
54
+ Download and install the required packages as per the toolkit's instructions.
55
+
56
+ ### 2) Simple Flask server deployment
57
+
58
+ The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:
59
+
60
+ ```bash
61
+ python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
62
+ --rkllm_model_path <MODEL_PATH>/MiniCPM3-4B_w8a8_g128_rk3588.rkllm \
63
+ --target_platform rk3588
64
+ ```
65
+
66
+ ### 3) Sending a request
67
+
68
+ A basic request body looks like this:
69
+
70
+ ```json
71
+ {
72
+ "model":"MiniCPM3-4B",
73
+ "messages":[{
74
+ "role":"user",
75
+ "content":"<YOUR_PROMPT_HERE>"}],
76
+ "stream":false
77
+ }
78
+ ```
79
+
80
+ Example request using `curl`:
81
+
82
+ ```bash
83
+ curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
84
+ -H 'Content-Type: application/json' \
85
+ -d '{"model":"MiniCPM3-4B","messages":[{"role":"user","content":"In 2 or 3 sentences, who was Napoleon Bonaparte?"}],"stream":false}'
86
+ ```
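
The same request can be sent from Python using only the standard library. This is a hedged sketch rather than part of the toolkit: it assumes the server is reachable over plain `http` on port 8080 at the `/rkllm_chat` path used in the `curl` example, and the helper names `build_chat_request` and `send_chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str,
                       model: str = "MiniCPM3-4B") -> urllib.request.Request:
    """Build a POST request whose body matches the JSON format shown above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rkllm_chat",  # endpoint taken from the curl example
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(base_url: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_chat("http://<SERVER_IP_ADDRESS>:8080", "Hello!")` would post a single user message; the generous default timeout is an assumption to allow for slow on-device generation.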
87
+
88
+ The response is formatted as follows:
89
+
90
+ ```json
91
+ {
92
+ "choices":[{
93
+ "finish_reason":"stop",
94
+ "index":0,
95
+ "logprobs":null,
96
+ "message":{
97
+ "content":"<MODEL_REPLY_HERE>",
98
+ "role":"assistant"}}],
99
+ "created":null,
100
+ "id":"rkllm_chat",
101
+ "object":"rkllm_chat",
102
+ "usage":{
103
+ "completion_tokens":null,
104
+ "prompt_tokens":null,
105
+ "total_tokens":null}
106
+ }
107
+ ```
108
+
109
+ Example response:
110
+
111
+ ```json
112
+ {"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte (1769-1821) was a French military leader and emperor who rose to prominence during the French Revolution and went on to conquer much of Europe. His strategic brilliance and charisma earned him many followers, but his ambitions ultimately led to his downfall. After being exiled to the island of Saint Helena in the South Atlantic, he died in 1821 at the age of 51.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
113
+ ```
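
To pull just the model's reply out of a response shaped like the one above, a small parser is enough. The sketch below assumes the response always follows that layout (a `choices` list whose first entry holds `message.content`); `extract_reply` is a hypothetical helper name.

```python
import json


def extract_reply(response_text: str) -> str:
    """Return the assistant's text from a JSON response in the format shown above."""
    data = json.loads(response_text)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # The reply lives under the first choice's message, as in the example response.
    return choices[0]["message"]["content"]
```

Note that the `usage` and `created` fields are `null` in the example response, so a client should not rely on them for token accounting.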
114
+
115
+ ### 4) UI compatibility
116
+
117
+ This server exposes an **OpenAI-compatible Chat Completions API**.
118
+
119
+ You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)).
120
+
121
+ - Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
122
+ - Make sure the `model` field matches the converted model’s name, for example:
123
+
124
+ ```json
125
+ {
126
+ "model": "MiniCPM3-4B",
127
+ "messages": [{"role":"user","content":"Hello!"}],
128
+ "stream": false
129
+ }
130
+ ```
131
+
132
+ ## License
133
+
134
+ This conversion follows the license of the source model: [apache-2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
135
+
136
+ - Attribution: **Built with MiniCPM3-4B (OpenBMB)**
137
+ - Required notice: see [`NOTICE`](NOTICE)
138
+ - Modifications: quantization (w8a8_g128), export to `.rkllm` format for RK3588 SBCs