# Llama3.2-3B

Run Llama3.2-3B optimized for Qualcomm NPUs with nexaSDK.

## Quickstart
- Install nexaSDK and create a free account at sdk.nexa.ai
- Activate your device with your access token:

  ```bash
  nexa config set license '<access_token>'
  ```

- Run the model on Qualcomm NPU in one line:

  ```bash
  nexa infer NexaAI/Llama3.2-3B-NPU-Turbo
  ```

## Model Description
	
Llama3.2-3B is a 3-billion-parameter language model from Meta’s Llama 3.2 series.
It is designed to provide a balance of efficiency and capability, making it suitable for deployment on a wide range of devices while maintaining strong performance on core language understanding and generation tasks.
Trained on diverse, high-quality datasets, Llama3.2-3B supports multiple languages and is optimized for scalability, fine-tuning, and real-world applications.

## Features
- Lightweight yet capable: delivers strong performance with a smaller memory footprint.
- Conversational AI: context-aware dialogue for assistants and agents.
- Content generation: text completion, summarization, code comments, and more.
- Reasoning & analysis: step-by-step problem solving and explanation.
- Multilingual: supports understanding and generation in multiple languages.
- Customizable: can be fine-tuned for domain-specific or enterprise use.

## Use Cases
- Personal and enterprise chatbots
- On-device AI applications
- Document and report summarization
- Education and tutoring tools
- Specialized models for vertical domains such as healthcare, finance, and legal

## Inputs and Outputs
**Input:**
- Text prompts or conversation history (tokenized input sequences).

**Output:**
- Generated text: responses, explanations, or creative content.
- Optionally: raw logits/probabilities for advanced downstream tasks.
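As a rough, model-agnostic sketch of the input and output shapes above, the example below flattens a conversation history into a single prompt string and converts raw logits into a probability distribution with a softmax. The `format_chat` helper and its `role: content` layout are illustrative only, not the nexaSDK API or Llama 3.2's actual chat template.

```python
import math

def format_chat(messages):
    """Flatten a list of {role, content} turns into one prompt string.

    Illustrative formatting only; real chat templates are model-specific.
    """
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to generate the next turn
    return "\n".join(lines)

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this report in one sentence."},
]
prompt = format_chat(history)          # text input sequence for the model
probs = softmax([2.0, 1.0, 0.1])       # logits -> probabilities over tokens
```

In practice the SDK handles tokenization and decoding for you; working with raw logits as shown is only needed for advanced downstream tasks such as custom sampling or scoring.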
	
		
	
	
## License
	
	
		
	
	
## References