# Llama3.2-3B

Run Llama3.2-3B optimized for Qualcomm NPUs with nexaSDK.

## Quickstart
- Install nexaSDK and create a free account at sdk.nexa.ai
- Activate your device with your access token:

  ```bash
  nexa config set license '<access_token>'
  ```

- Run the model on Qualcomm NPU in one line:

  ```bash
  nexa infer NexaAI/Llama3.2-3B-NPU-Turbo
  ```

## Model Description
	
Llama3.2-3B is a 3-billion-parameter language model from Meta’s Llama 3.2 series.
It is designed to provide a balance of efficiency and capability, making it suitable for deployment on a wide range of devices while maintaining strong performance on core language understanding and generation tasks.
Trained on diverse, high-quality datasets, Llama3.2-3B supports multiple languages and is optimized for scalability, fine-tuning, and real-world applications.

## Features
- Lightweight yet capable: delivers strong performance with a smaller memory footprint.
- Conversational AI: context-aware dialogue for assistants and agents.
- Content generation: text completion, summarization, code comments, and more.
- Reasoning & analysis: step-by-step problem solving and explanation.
- Multilingual: supports understanding and generation in multiple languages.
- Customizable: can be fine-tuned for domain-specific or enterprise use.

## Use Cases
- Personal and enterprise chatbots
- On-device AI applications
- Document and report summarization
- Education and tutoring tools
- Specialized models for vertical domains such as healthcare, finance, and legal

## Inputs and Outputs
**Input:**
- Text prompts or conversation history (tokenized input sequences).

**Output:**
- Generated text: responses, explanations, or creative content.
- Optionally: raw logits/probabilities for advanced downstream tasks.
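As a rough, model-agnostic sketch of the input and output shapes above, the example below flattens a conversation history into a single prompt string and converts raw logits into a probability distribution with a softmax. The `format_chat` helper and its `role: content` layout are illustrative only, not the nexaSDK API or Llama 3.2's actual chat template.

```python
import math

def format_chat(messages):
    """Flatten a list of {role, content} turns into one prompt string.

    Illustrative formatting only; real chat templates are model-specific.
    """
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to generate the next turn
    return "\n".join(lines)

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this report in one sentence."},
]
prompt = format_chat(history)          # text input sequence for the model
probs = softmax([2.0, 1.0, 0.1])       # logits -> probabilities over tokens
```

In practice the SDK handles tokenization and decoding for you; working with raw logits as shown is only needed for advanced downstream tasks such as custom sampling or scoring.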
	
		
	
	
## License
	
	
		
	
	
## References