Qwen2.5-14B Fine-tuned (4-bit Quantized)

This is a 4-bit quantized version of a fine-tuned Qwen2.5-14B model.

Model Details

  • Base Model: Qwen/Qwen2.5-14B-Instruct
  • Adapter: Kholod03/qwen2.5-14b-128finetuned128LoRA (see the load sketch below)
  • Quantization: 4-bit (nf4) with bitsandbytes
  • Size: ~8 GB (down from ~28 GB at 16-bit precision)
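
For reference, an equivalent model can in principle be assembled by loading the base model in 4-bit and attaching the published adapter with peft (requires pip install peft). This is a sketch; whether outputs match this repo exactly depends on how the adapter was applied before quantization:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model in 4-bit, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Kholod03/qwen2.5-14b-128finetuned128LoRA")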

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configure 4-bit loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Kholod03/qwen2.5-14b-128LoRA-4bit",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained("Kholod03/qwen2.5-14b-128LoRA-4bit")

# Generate
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
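
Because the base model is instruction-tuned, prompts formatted with the tokenizer's chat template generally give better results than raw strings. A minimal sketch continuing from the snippet above, assuming the tokenizer ships the standard Qwen2.5 chat template:

# Build a chat-formatted prompt and generate only the assistant's reply
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your prompt here"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# Strip the prompt tokens so only the new completion is printed
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))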

Requirements

pip install torch transformers bitsandbytes accelerate
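
Note that bitsandbytes' 4-bit kernels require a CUDA GPU. A quick sanity check before loading (a minimal sketch; roughly 8 GB of free VRAM is needed for this model):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB total VRAM")
else:
    print("No CUDA GPU found; 4-bit loading with bitsandbytes will not work")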

Performance

  • Original model: ~28 GB VRAM (16-bit weights)
  • Quantized model: ~8 GB VRAM (see the footprint check below)
  • Quality: ~98% of the original; nf4 quantization typically costs only a small amount of accuracy
  • Speed: faster inference, since the smaller weights reduce memory traffic
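
To check the footprint on your own hardware, transformers exposes a helper on every loaded model. Continuing from the Usage snippet (exact numbers vary slightly with setup):

# Reports the in-memory size of the weights in bytes; expect roughly 8 GB here
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")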

Training Details

  • Method: GRPO with LoRA (rank 128); see the configuration sketch below
  • Learning rate: 5e-06
  • Batch size: 8
  • Max steps: 250
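
For context, here is a minimal sketch of how these hyperparameters could map onto trl's GRPOTrainer with a rank-128 LoRA adapter. The dataset and reward function below are placeholders, not the author's actual setup, and lora_alpha is an assumed value:

from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; GRPO expects a "prompt" column
train_dataset = Dataset.from_dict({"prompt": ["Explain LoRA in one sentence."]})

# Placeholder reward: prefers shorter completions (stand-in for the real reward)
def reward_len(completions, **kwargs):
    return [-float(len(c)) for c in completions]

# Rank 128 as reported above; lora_alpha is an assumption
peft_config = LoraConfig(r=128, lora_alpha=128, task_type="CAUSAL_LM")

args = GRPOConfig(
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    max_steps=250,
    output_dir="qwen2.5-14b-grpo",
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",
    reward_funcs=reward_len,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()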