# TinyLlama-1.1B-Chat-v1.0 — RKLLM build for RK3588 boards
Built with TinyLlama (Apache-2.0)
Author: @jamescallander
Source model: TinyLlama/TinyLlama-1.1B-Chat-v1.0 · Hugging Face
Target: Rockchip RK3588 NPU via RKNN-LLM Runtime
This repository hosts a conversion of TinyLlama-1.1B-Chat-v1.0 for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the RKNN-LLM toolkit.
## Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Modifications: quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs
- Tokenizer: not required at runtime (UI handles prompt I/O)
## Intended use
- Lightweight, on-device chat and instruction-following inference on RK3588 SBCs.
- TinyLlama-1.1B-Chat is tuned for conversational tasks and optimized to be lightweight — a good choice for fast, low-resource inference and testing pipelines where efficiency matters more than deep reasoning power.
## Limitations
- Requires roughly 1.5 GB of free memory.
- At 1.1B parameters, its reasoning and accuracy are limited compared to larger models (e.g., 7B/8B).
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Quantization (`w8a8_g128`) may further reduce output fidelity.
- Best suited for lightweight experimentation, demos, and basic Q&A rather than production-grade tasks.
## Quick start (RK3588)

### 1) Install runtime
The RKNN-LLM toolkit and instructions can be found on the development board manufacturer's website or on airockchip's GitHub page.
Download and install the required packages as per the toolkit's instructions.
### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit at `rknn-llm/examples/rkllm_server_demo`:
```shell
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/TinyLlama-1.1B-Chat-v1.0_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```
### 3) Sending a request

A basic format for a message request is:
```json
{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{
        "role": "user",
        "content": "<YOUR_PROMPT_HERE>"}],
    "stream": false
}
```
Example request using curl:
```shell
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"TinyLlama-1.1B-Chat-v1.0","messages":[{"role":"user","content":"Explain who Napoleon Bonaparte is in two or three sentences."}],"stream":false}'
```
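The same request can also be sent from Python. A minimal sketch using only the standard library (the server address is a placeholder; substitute your board's IP):

```python
import json
from urllib import request

SERVER = "http://192.168.1.50:8080"  # placeholder: your board's IP address

def build_payload(prompt: str) -> dict:
    """Build a request body in the format the Flask demo server expects."""
    return {
        "model": "TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str) -> str:
    """POST the prompt to /rkllm_chat and return the assistant's reply text."""
    req = request.Request(
        f"{SERVER}/rkllm_chat",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A generous timeout is used because first-token latency on a 1.1B model running on the NPU can be noticeable.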
The response is formatted in the following way:
```json
{
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
            "content": "<MODEL_REPLY_HERE>",
            "role": "assistant"}}],
    "created": null,
    "id": "rkllm_chat",
    "object": "rkllm_chat",
    "usage": {
        "completion_tokens": null,
        "prompt_tokens": null,
        "total_tokens": null}
}
```
Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte was a French military leader and statesman who rose to prominence during the French Revolution. He played a pivotal role in shaping modern Europe through his military campaigns, administrative reforms, and the establishment of new political institutions.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
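When scripting against the server, the reply text can be pulled out of the response with a few lines of Python. A sketch against the response shape documented above (the `RAW` string is an abbreviated stand-in for a real server response):

```python
import json

# Abbreviated stand-in for a real /rkllm_chat response body.
RAW = (
    '{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,'
    '"message":{"content":"Napoleon Bonaparte was a French military leader.",'
    '"role":"assistant"}}],"created":null,"id":"rkllm_chat",'
    '"object":"rkllm_chat","usage":{"completion_tokens":null,'
    '"prompt_tokens":null,"total_tokens":null}}'
)

def extract_reply(response_body: str) -> str:
    """Return the assistant's text from an rkllm_chat response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

print(extract_reply(RAW))  # the assistant's reply text
```

Note that the `usage` token counts are `null` in this server demo, so clients should not rely on them.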
### 4) UI compatibility
This server exposes an OpenAI-compatible Chat Completions API.
You can connect it to any OpenAI-compatible client or UI (for example, Open WebUI):
- Configure your client with the API base `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model's name, for example:
```json
{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role":"user","content":"Hello!"}],
    "stream": false
}
```
## License
This conversion follows the license of the source model: Apache License 2.0
- Attribution: Built with TinyLlama (Apache-2.0 License).
- Required notice: see `NOTICE`
- Modifications: quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs.
Copyright (c) 2024 TinyLlama Contributors Licensed under the Apache License, Version 2.0