---
library_name: rkllm
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- rkllm
- rk3588
- rockchip
- edge-ai
- llm
- tinyllama
- llama
- text-generation-inference
---

# TinyLlama-1.1B-Chat-v1.0 — RKLLM build for RK3588 boards

### Built with TinyLlama

**Author:** @jamescallander
**Source model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0 · Hugging Face](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**

> This repository hosts a **conversion** of `TinyLlama-1.1B-Chat-v1.0` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

#### Conversion details

- **RKLLM-Toolkit version:** v1.2.1
- **NPU driver:** v0.9.8
- **Python:** 3.12
- **Quantization:** `w8a8_g128`
- **Output:** single-file `.rkllm` artifact
- **Modifications:** quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs
- **Tokenizer:** not required at runtime (UI handles prompt I/O)

## Intended use

- On-device chat, instruction following, and lightweight inference on RK3588 SBCs.
- TinyLlama-1.1B-Chat is tuned for conversational tasks and optimized to be lightweight — a good choice for **fast, low-resource inference** and testing pipelines where efficiency matters more than deep reasoning power.

## Limitations

- Requires 1.5 GB of free memory.
- At 1.1B parameters, its reasoning and accuracy are **limited compared to larger models** (e.g., 7B/8B).
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Quantization (`w8a8_g128`) may further reduce output fidelity.
- Best suited for **lightweight experimentation, demos, and basic Q&A** rather than production-grade tasks.

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and instructions can be found on the specific development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip). Download and install the required packages as per the toolkit's instructions.
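Before moving on, it can help to confirm the runtime is actually visible on the board. Below is a minimal sanity-check sketch, assuming the runtime library is named `librkllmrt.so` and sits on the default library path, and that the kernel exposes the driver version via debugfs; both assumptions may vary by board image and kernel build:

```python
# Hedged sanity check for an RK3588 board: verify the RKLLM runtime library
# loads and report the NPU driver version. Paths/names are assumptions and
# may differ depending on how the toolkit was installed.
import ctypes
from pathlib import Path

try:
    # The RKNN-LLM runtime is distributed as librkllmrt.so (aarch64).
    ctypes.CDLL("librkllmrt.so")
    print("librkllmrt.so loaded OK")
except OSError as err:
    print(f"RKLLM runtime library not found: {err}")

# Rockchip kernels typically expose the NPU driver version via debugfs;
# reading it usually requires root and a mounted debugfs.
version_file = Path("/sys/kernel/debug/rknpu/version")
if version_file.exists():
    print("NPU driver:", version_file.read_text().strip())
else:
    print("NPU driver version not readable (debugfs unmounted or no root?)")
```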
### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 <path_to>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <path_to>/TinyLlama-1.1B-Chat-v1.0_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```

### 3) Sending a request

A basic format for a chat request is:

```json
{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "<your_message_here>"}],
    "stream": false
}
```

Example request using `curl`:

```bash
curl -s -X POST http://<server_ip>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"TinyLlama-1.1B-Chat-v1.0","messages":[{"role":"user","content":"Explain who Napoleon Bonaparte is in two or three sentences."}],"stream":false}'
```

The response is formatted in the following way:

```json
{
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
            "content": "<response_text>",
            "role": "assistant"}}],
    "created": null,
    "id": "rkllm_chat",
    "object": "rkllm_chat",
    "usage": {
        "completion_tokens": null,
        "prompt_tokens": null,
        "total_tokens": null}
}
```

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte was a French military leader and statesman who rose to prominence during the French Revolution. He played a pivotal role in shaping modern Europe through his military campaigns, administrative reforms, and the establishment of new political institutions.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```

### 4) UI compatibility

This server exposes an **OpenAI-compatible Chat Completions API**. You can connect it to any OpenAI-compatible client or UI (for example: [Open WebUI](https://github.com/open-webui/open-webui)). A scripted Python client example is sketched at the end of this card.

- Configure your client with the API base `http://<server_ip>:8080` and use the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model's name, for example:

```json
{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
}
```

# License

This conversion follows the license of the source model: [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

- **Attribution:** Built with TinyLlama (Apache-2.0 License).
- **Required notice:** see [`NOTICE`](NOTICE)
- **Modifications:** quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs.

Copyright (c) 2024 TinyLlama Contributors
Licensed under the Apache License, Version 2.0
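For scripted access without a UI, here is a minimal Python client sketch for the server from step 2. It assumes the Flask demo is reachable at `http://<server_ip>:8080` (replace the placeholder with your board's address) and requires the third-party `requests` package (`pip install requests`):

```python
# Minimal sketch of a Python client for the /rkllm_chat endpoint.
# <server_ip> is a placeholder; the payload mirrors the request format above.
import requests

SERVER = "http://<server_ip>:8080"  # replace with your board's address

def chat(prompt: str) -> str:
    payload = {
        "model": "TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    resp = requests.post(f"{SERVER}/rkllm_chat", json=payload, timeout=120)
    resp.raise_for_status()
    # Extract the assistant text from the OpenAI-style response shown above.
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Explain who Napoleon Bonaparte is in two or three sentences."))
```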