# Qwen2.5-14B Fine-tuned (4-bit Quantized)
This is a 4-bit quantized version of a fine-tuned Qwen2.5-14B model.
## Model Details
- Base Model: Qwen/Qwen2.5-14B-Instruct
- Adapter: Kholod03/qwen2.5-14b-128finetuned128LoRA
- Quantization: 4-bit (nf4) with BitsAndBytes
- Size: ~8 GB (compressed from ~28 GB)
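
For reference, a checkpoint like this is typically produced by merging the LoRA adapter into the base model and saving the result for quantized loading. The exact merge script used here is not published, so treat the following peft sketch as illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the full-precision base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the fine-tuned LoRA adapter and fold its weights into the base
model = PeftModel.from_pretrained(base, "Kholod03/qwen2.5-14b-128finetuned128LoRA")
merged = model.merge_and_unload()

# Save the merged checkpoint; it can then be loaded with a 4-bit config
merged.save_pretrained("qwen2.5-14b-merged")
```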
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configure 4-bit loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Kholod03/qwen2.5-14b-128LoRA-4bit",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Kholod03/qwen2.5-14b-128LoRA-4bit")

# Generate
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
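
Since the base model is instruction-tuned, prompts usually work better when wrapped in the model's chat template rather than passed as raw text. A short example using the standard transformers API:

```python
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```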
## Requirements
```bash
pip install transformers bitsandbytes accelerate
```
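
Note that bitsandbytes 4-bit kernels generally require a CUDA GPU. A quick sanity check before loading the model:

```python
import torch

# 4-bit loading with bitsandbytes needs a CUDA device
assert torch.cuda.is_available(), "A CUDA GPU is required for 4-bit inference"
print(torch.cuda.get_device_name(0))
```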
## Performance
- Original model: ~28 GB VRAM
- Quantized model: ~8 GB VRAM
- Quality: ~98% of original
- Speed: Faster inference than the full-precision model, due to reduced memory traffic
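
To verify the memory figure on your own hardware, check peak allocated VRAM after loading and generating (numbers vary slightly with driver, batch size, and context length):

```python
import torch

# Peak VRAM allocated by PyTorch on the current device, in GB
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.1f} GB")
```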
## Training Details
- Method: GRPO with LoRA (rank 128)
- Learning rate: 5e-06
- Batch size: 8
- Max steps: 250
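
For readers who want to set up a comparable run, the hyperparameters above map onto trl's `GRPOTrainer` with a peft `LoraConfig` roughly as follows. This is a hedged sketch, not the actual training script: the dataset, reward function, `lora_alpha`, and target modules below are placeholders.

```python
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# LoRA rank 128 as listed above; alpha and target modules are assumptions
peft_config = LoraConfig(
    r=128,
    lora_alpha=128,                       # assumption, not confirmed
    target_modules=["q_proj", "v_proj"],  # assumption, not confirmed
    task_type="CAUSAL_LM",
)

# Hyperparameters taken from the list above
training_args = GRPOConfig(
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    max_steps=250,
    output_dir="grpo-qwen2.5-14b",
)

def reward_len(completions, **kwargs):
    # Placeholder reward: the real reward function is not published
    return [-abs(len(completion) - 200) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=my_dataset,  # placeholder: a dataset of prompts
    peft_config=peft_config,
)
trainer.train()
```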