---
pipeline_tag: image-text-to-text
tags:
- NPU
---
# Qwen3-VL-4B-Instruct
Run **Qwen3-VL-4B-Instruct** optimized for **Qualcomm NPUs** with [nexaSDK](https://sdk.nexa.ai).
## Quickstart
## Model Description
**Qwen3-VL-4B-Instruct** is a 4-billion-parameter instruction-tuned multimodal large language model from Alibaba Cloud’s Qwen team.

As part of the **Qwen3-VL** series, it fuses powerful vision-language understanding with conversational fine-tuning, optimized for real-world applications such as chat-based reasoning, document analysis, and visual dialogue.

The *Instruct* variant is tuned for following user prompts naturally and safely, producing concise, relevant, and user-aligned responses across text, image, and video contexts.
## Features

- **Instruction-Following**: Optimized for dialogue, explanation, and user-friendly task completion.
- **Vision-Language Fusion**: Understands and reasons across text, images, and video frames.
- **Multilingual Capability**: Handles multiple languages for diverse global use cases.
- **Contextual Coherence**: Balances reasoning ability with a natural, grounded conversational tone.
- **Lightweight & Deployable**: 4B parameters make it efficient for edge and device-level inference.
## Use Cases
- Visual chatbots and assistants
- Image captioning and scene understanding
- Chart, document, or screenshot analysis
- Educational or tutoring systems with visual inputs
- Multilingual, multimodal question answering
## Inputs and Outputs
**Input:**

- Text prompts, image(s), or mixed multimodal instructions.

**Output:**

- Natural-language responses or visual reasoning explanations.
- Structured text (summaries, captions, answers, etc.), depending on the prompt.
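Mixed multimodal instructions are commonly expressed as a structured message list that interleaves text and image parts, as in OpenAI-style chat APIs. The sketch below is illustrative only: the helper function, model ID, and image URL are assumptions for demonstration, not part of this model card or the nexaSDK API.

```python
# Illustrative sketch (not taken from this card): builds a request in the
# widely used OpenAI-style chat format, where one user turn carries both
# a text part and an image part. Field names may differ for a real endpoint.
import json


def build_multimodal_request(model_id, prompt, image_url):
    """Combine a text prompt and an image reference into one chat request."""
    return {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


request = build_multimodal_request(
    "Qwen3-VL-4B-Instruct",              # hypothetical model ID
    "Describe the chart in this image.",
    "https://example.com/chart.png",     # placeholder image URL
)
print(json.dumps(request, indent=2))
```

The same structure extends to multiple images by appending additional `image_url` parts to the `content` list.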
## License
Refer to the [official Qwen license](https://huggingface.co/Qwen) for terms of use and redistribution.