# Qwen2.5-14B Fine-tuned (4-bit Quantized)
This is a 4-bit quantized version of a fine-tuned Qwen2.5-14B model.
## Model Details
- Base Model: Qwen/Qwen2.5-14B-Instruct
- Adapter: Kholod03/qwen2.5-14b-128finetuned128LoRA
- Quantization: 4-bit (nf4) with BitsAndBytes
- Size: ~8 GB (compressed from ~28 GB)
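
For reference, a checkpoint like this is typically produced by merging the LoRA adapter into the base model and saving the result for quantized loading. The exact merge script used here is not published, so treat the following peft sketch as illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the full-precision base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the fine-tuned LoRA adapter and fold its weights into the base
model = PeftModel.from_pretrained(base, "Kholod03/qwen2.5-14b-128finetuned128LoRA")
merged = model.merge_and_unload()

# Save the merged checkpoint; it can then be loaded with a 4-bit config
merged.save_pretrained("qwen2.5-14b-merged")
```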
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Configure 4-bit loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "Kholod03/qwen2.5-14b-128LoRA-4bit",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Kholod03/qwen2.5-14b-128LoRA-4bit")

# Generate
prompt = "Your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
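
Since the base model is instruction-tuned, prompts usually work better when wrapped in the model's chat template rather than passed as raw text. A short example using the standard transformers API:

```python
messages = [{"role": "user", "content": "Explain 4-bit quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```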
## Requirements
```bash
pip install transformers bitsandbytes accelerate
```
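
Note that bitsandbytes 4-bit kernels generally require a CUDA GPU. A quick sanity check before loading the model:

```python
import torch

# 4-bit loading with bitsandbytes needs a CUDA device
assert torch.cuda.is_available(), "A CUDA GPU is required for 4-bit inference"
print(torch.cuda.get_device_name(0))
```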
## Performance
- Original model: ~28 GB VRAM
- Quantized model: ~8 GB VRAM
- Quality: ~98% of original
- Speed: Faster inference than the full-precision model, due to reduced memory traffic
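
To verify the memory figure on your own hardware, check peak allocated VRAM after loading and generating (numbers vary slightly with driver, batch size, and context length):

```python
import torch

# Peak VRAM allocated by PyTorch on the current device, in GB
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.1f} GB")
```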
## Training Details
- Method: GRPO with LoRA (rank 128)
- Learning rate: 5e-06
- Batch size: 8
- Max steps: 250
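
For readers who want to set up a comparable run, the hyperparameters above map onto trl's `GRPOTrainer` with a peft `LoraConfig` roughly as follows. This is a hedged sketch, not the actual training script: the dataset, reward function, `lora_alpha`, and target modules below are placeholders.

```python
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# LoRA rank 128 as listed above; alpha and target modules are assumptions
peft_config = LoraConfig(
    r=128,
    lora_alpha=128,                       # assumption, not confirmed
    target_modules=["q_proj", "v_proj"],  # assumption, not confirmed
    task_type="CAUSAL_LM",
)

# Hyperparameters taken from the list above
training_args = GRPOConfig(
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    max_steps=250,
    output_dir="grpo-qwen2.5-14b",
)

def reward_len(completions, **kwargs):
    # Placeholder reward: the real reward function is not published
    return [-abs(len(completion) - 200) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-14B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=my_dataset,  # placeholder: a dataset of prompts
    peft_config=peft_config,
)
trainer.train()
```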