Peach

This model is designed for easy, local inference on CPUs and GPUs using llama.cpp-based software like LM Studio and Ollama.

The model embodies a dominant, assertive, and creative persona for role-playing and storytelling. It was fine-tuned on a multi-turn conversational dataset to enhance its coherence and memory.

Model Details

  • Original LoRA Model: samunder12/llama-3.1-8b-roleplay-v3-lora
  • Quantization: Q4_K_M. This method provides an excellent balance between model size, performance, and VRAM/RAM usage.
  • Context Length: 4096 tokens.

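As a rough sanity check before downloading (an estimate, not a measured figure), Q4_K_M stores weights at roughly 4.5–5 bits per parameter, so an 8B model needs on the order of 5 GB of disk space and RAM/VRAM for the weights alone, plus context/KV-cache overhead at runtime:

```python
# Back-of-the-envelope size estimate for an 8B model at Q4_K_M.
# ~4.85 bits per weight is an assumed typical effective rate for this
# quantization mix; the real file size varies slightly per model.
params = 8e9
bits_per_weight = 4.85
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB for weights")
```

Budget a bit more than this for the 4096-token context and runtime buffers.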
Usage Instructions

LM Studio (Recommended)

  1. Download and install LM Studio.
  2. In the app, search for this model repo: samunder12/llama-3.1-8b-roleplay-v3-gguf.
  3. Download the GGUF file listed in the "Files" tab.
  4. Go to the Chat tab (💬 icon) and load the model you just downloaded.
  5. CRITICAL: On the right-hand panel, under "Prompt Format", select the Llama 3 preset.
  6. Set the Context Length (n_ctx) to 4096.
  7. Use the "Role-Play" sampler settings below for best results.

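For Ollama (mentioned above as an alternative runner), the same setup can be sketched as a Modelfile. This is a hedged example: the GGUF filename below is a placeholder for whichever file you downloaded from the "Files" tab, and the parameter values mirror the preset in this card.

```
# Hypothetical Modelfile; replace the FROM path with your downloaded GGUF.
FROM ./llama-3.1-8b-roleplay.gguf

# Match the settings recommended in this card.
PARAMETER num_ctx 4096
PARAMETER temperature 0.75
PARAMETER repeat_penalty 1.06
PARAMETER mirostat 2
PARAMETER top_p 0.92
PARAMETER top_k 40
```

Then `ollama create roleplay -f Modelfile` followed by `ollama run roleplay`. Recent Ollama builds typically pick up the Llama 3 chat template from the GGUF metadata; if yours does not, an explicit TEMPLATE directive may be needed.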
Recommended Sampler Settings (Role-Play Preset)

Setting          Value
---------------  -------------
Temperature      0.75
Repeat Penalty   1.06
Mirostat         Mirostat 2.0
top_p            0.92
top_k            40 or 100
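If you drive the model from Python with llama-cpp-python (a binding for the llama.cpp runtime this card targets) instead of LM Studio, the preset above maps onto sampling keyword arguments. This is a sketch under that assumption; the model path is a placeholder, and mirostat_mode=2 corresponds to the Mirostat 2.0 setting:

```python
# The "Role-Play" preset from the table above, expressed as keyword
# arguments in the style of llama-cpp-python's chat completion API.
ROLEPLAY_SAMPLER = {
    "temperature": 0.75,
    "repeat_penalty": 1.06,
    "mirostat_mode": 2,   # Mirostat 2.0
    "top_p": 0.92,
    "top_k": 40,          # or 100 for more variety
}

# Usage (requires `pip install llama-cpp-python` and the downloaded GGUF):
# from llama_cpp import Llama
# llm = Llama(model_path="path/to/model.gguf", n_ctx=4096)
# out = llm.create_chat_completion(messages=[...], **ROLEPLAY_SAMPLER)
```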