Peach

This model is designed for easy, local inference on CPUs and GPUs using llama.cpp-based software like LM Studio and Ollama.

The model embodies a dominant, assertive, and creative persona for role-playing and storytelling. It was fine-tuned on a multi-turn conversational dataset to enhance its coherence and memory.

Model Details

  • Original LoRA Model: samunder12/llama-3.1-8b-roleplay-v3-lora
  • Quantization: Q4_K_M. This method provides an excellent balance between model size, performance, and VRAM/RAM usage.
  • Context Length: 4096 tokens.

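As a rough sanity check before downloading (an estimate, not a measured figure), Q4_K_M stores weights at roughly 4.5–5 bits per parameter, so an 8B model needs on the order of 5 GB of disk space and RAM/VRAM for the weights alone, plus context/KV-cache overhead at runtime:

```python
# Back-of-the-envelope size estimate for an 8B model at Q4_K_M.
# ~4.85 bits per weight is an assumed typical effective rate for this
# quantization mix; the real file size varies slightly per model.
params = 8e9
bits_per_weight = 4.85
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB for weights")
```

Budget a bit more than this for the 4096-token context and runtime buffers.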
Usage Instructions

LM Studio (Recommended)

  1. Download and install LM Studio.
  2. In the app, search for this model repo: samunder12/llama-3.1-8b-roleplay-v3-gguf.
  3. Download the GGUF file listed in the "Files" tab.
  4. Go to the Chat tab (💬 icon) and load the model you just downloaded.
  5. CRITICAL: On the right-hand panel, under "Prompt Format", select the Llama 3 preset.
  6. Set the Context Length (n_ctx) to 4096.
  7. Use the "Role-Play" sampler settings below for best results.

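For Ollama (mentioned above as an alternative runner), the same setup can be sketched as a Modelfile. This is a hedged example: the GGUF filename below is a placeholder for whichever file you downloaded from the "Files" tab, and the parameter values mirror the preset in this card.

```
# Hypothetical Modelfile; replace the FROM path with your downloaded GGUF.
FROM ./llama-3.1-8b-roleplay.gguf

# Match the settings recommended in this card.
PARAMETER num_ctx 4096
PARAMETER temperature 0.75
PARAMETER repeat_penalty 1.06
PARAMETER mirostat 2
PARAMETER top_p 0.92
PARAMETER top_k 40
```

Then `ollama create roleplay -f Modelfile` followed by `ollama run roleplay`. Recent Ollama builds typically pick up the Llama 3 chat template from the GGUF metadata; if yours does not, an explicit TEMPLATE directive may be needed.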
Recommended Sampler Settings (Role-Play Preset)

Setting          Value
---------------  -------------
Temperature      0.75
Repeat Penalty   1.06
Mirostat         Mirostat 2.0
top_p            0.92
top_k            40 or 100
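If you drive the model from Python with llama-cpp-python (a binding for the llama.cpp runtime this card targets) instead of LM Studio, the preset above maps onto sampling keyword arguments. This is a sketch under that assumption; the model path is a placeholder, and mirostat_mode=2 corresponds to the Mirostat 2.0 setting:

```python
# The "Role-Play" preset from the table above, expressed as keyword
# arguments in the style of llama-cpp-python's chat completion API.
ROLEPLAY_SAMPLER = {
    "temperature": 0.75,
    "repeat_penalty": 1.06,
    "mirostat_mode": 2,   # Mirostat 2.0
    "top_p": 0.92,
    "top_k": 40,          # or 100 for more variety
}

# Usage (requires `pip install llama-cpp-python` and the downloaded GGUF):
# from llama_cpp import Llama
# llm = Llama(model_path="path/to/model.gguf", n_ctx=4096)
# out = llm.create_chat_completion(messages=[...], **ROLEPLAY_SAMPLER)
```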