---
library_name: rkllm
pipeline_tag: text-generation
license: apache-2.0
language:
- en
base_model:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- rkllm
- rk3588
- rockchip
- edge-ai
- llm
- tinyllama
- llama
- text-generation-inference
---

# TinyLlama-1.1B-Chat-v1.0 — RKLLM build for RK3588 boards

### Built with TinyLlama

**Author:** @jamescallander
**Source model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0 · Hugging Face](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
**Target:** Rockchip RK3588 NPU via **RKNN-LLM Runtime**

> This repository hosts a **conversion** of `TinyLlama-1.1B-Chat-v1.0` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

#### Conversion details

- **RKLLM-Toolkit version:** v1.2.1
- **NPU driver:** v0.9.8
- **Python:** 3.12
- **Quantization:** `w8a8_g128`
- **Output:** single-file `.rkllm` artifact
- **Modifications:** quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs
- **Tokenizer:** not required at runtime (UI handles prompt I/O)

## Intended use

- On-device chat, instruction following, and lightweight inference on RK3588 SBCs.
- TinyLlama-1.1B-Chat is tuned for conversational tasks and optimized to be lightweight — a good choice for **fast, low-resource inference** and testing pipelines where efficiency matters more than deep reasoning power.

## Limitations

- Requires 1.5 GB of free memory.
- At 1.1B parameters, its reasoning and accuracy are **limited compared to larger models** (e.g., 7B/8B).
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Quantization (`w8a8_g128`) may further reduce output fidelity.
- Best suited for **lightweight experimentation, demos, and basic Q&A** rather than production-grade tasks.

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and instructions can be found on the specific development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip). Download and install the required packages as per the toolkit's instructions.
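Before moving on, it can help to confirm the runtime is actually visible on the board. Below is a minimal sanity-check sketch, assuming the runtime library is named `librkllmrt.so` and sits on the default library path, and that the kernel exposes the driver version via debugfs; both assumptions may vary by board image and kernel build:

```python
# Hedged sanity check for an RK3588 board: verify the RKLLM runtime library
# loads and report the NPU driver version. Paths/names are assumptions and
# may differ depending on how the toolkit was installed.
import ctypes
from pathlib import Path

try:
    # The RKNN-LLM runtime is distributed as librkllmrt.so (aarch64).
    ctypes.CDLL("librkllmrt.so")
    print("librkllmrt.so loaded OK")
except OSError as err:
    print(f"RKLLM runtime library not found: {err}")

# Rockchip kernels typically expose the NPU driver version via debugfs;
# reading it usually requires root and a mounted debugfs.
version_file = Path("/sys/kernel/debug/rknpu/version")
if version_file.exists():
    print("NPU driver:", version_file.read_text().strip())
else:
    print("NPU driver version not readable (debugfs unmounted or no root?)")
```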
### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 <path_to>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <path_to>/TinyLlama-1.1B-Chat-v1.0_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```

### 3) Sending a request

A basic format for a chat request is:

```json
{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "<your_message_here>"}],
    "stream": false
}
```

Example request using `curl`:

```bash
curl -s -X POST http://<server_ip>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"TinyLlama-1.1B-Chat-v1.0","messages":[{"role":"user","content":"Explain who Napoleon Bonaparte is in two or three sentences."}],"stream":false}'
```

The response is formatted in the following way:

```json
{
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "logprobs": null,
        "message": {
            "content": "<response_text>",
            "role": "assistant"}}],
    "created": null,
    "id": "rkllm_chat",
    "object": "rkllm_chat",
    "usage": {
        "completion_tokens": null,
        "prompt_tokens": null,
        "total_tokens": null}
}
```

Example response:

```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Napoleon Bonaparte was a French military leader and statesman who rose to prominence during the French Revolution. He played a pivotal role in shaping modern Europe through his military campaigns, administrative reforms, and the establishment of new political institutions.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```

### 4) UI compatibility

This server exposes an **OpenAI-compatible Chat Completions API**. You can connect it to any OpenAI-compatible client or UI (for example: [Open WebUI](https://github.com/open-webui/open-webui)). A scripted Python client example is sketched at the end of this card.

- Configure your client with the API base `http://<server_ip>:8080` and use the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model's name, for example:

```json
{
    "model": "TinyLlama-1.1B-Chat-v1.0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
}
```

# License

This conversion follows the license of the source model: [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)

- **Attribution:** Built with TinyLlama (Apache-2.0 License).
- **Required notice:** see [`NOTICE`](NOTICE)
- **Modifications:** quantization (`w8a8_g128`), export to `.rkllm` format for RK3588 SBCs.

Copyright (c) 2024 TinyLlama Contributors
Licensed under the Apache License, Version 2.0
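For scripted access without a UI, here is a minimal Python client sketch for the server from step 2. It assumes the Flask demo is reachable at `http://<server_ip>:8080` (replace the placeholder with your board's address) and requires the third-party `requests` package (`pip install requests`):

```python
# Minimal sketch of a Python client for the /rkllm_chat endpoint.
# <server_ip> is a placeholder; the payload mirrors the request format above.
import requests

SERVER = "http://<server_ip>:8080"  # replace with your board's address

def chat(prompt: str) -> str:
    payload = {
        "model": "TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    resp = requests.post(f"{SERVER}/rkllm_chat", json=payload, timeout=120)
    resp.raise_for_status()
    # Extract the assistant text from the OpenAI-style response shown above.
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Explain who Napoleon Bonaparte is in two or three sentences."))
```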