Qwen3-4B DPO Fine-tuned - Checkpoint 2/4
This model is checkpoint 2 of 4 from DPO (Direct Preference Optimization) fine-tuning of Qwen/Qwen3-4B-Base.
Format: Safetensors only (~8 GB) - optimized for vLLM inference.
Training Details
- Base Model: Qwen/Qwen3-4B-Base
- Training Method: DPO
- Checkpoint: 2/4
- Sequence Length: 2048
- Learning Rate: 5e-6
- Batch Size: 1 (micro-batch) × 16 (gradient accumulation steps), i.e. an effective batch size of 16 per device
- Format: Safetensors (no optimizer states)
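For readers unfamiliar with the training method listed above, the following is a minimal sketch of the standard per-example DPO loss, computed from summed sequence log-probabilities under the policy and a frozen reference model. The `beta` value and the example log-probabilities are illustrative assumptions; the card does not state the beta used for this run.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss.

    beta=0.1 is an assumed, commonly used default, not a value from this card.
    """
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# When the policy matches the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen log-probability and lowering the rejected one reduces the loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
print(baseline, improved)
```

The loss only depends on how the policy's preference between the two responses differs from the reference model's, which is why a frozen copy of the starting model is needed during training.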
Usage with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True
)

# Generate
messages = [
    {"role": "user", "content": "Your question here"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
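For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so decoding `outputs[0]` prints the prompt as well as the answer. If only the answer is wanted, one option is to slice off the prompt before decoding. The helper name `completion_ids` is hypothetical and not part of Transformers.

```python
def completion_ids(output_ids, prompt_length):
    """Return only the tokens that follow the prompt.

    Works on any sliceable sequence, including a 1-D tensor of token ids.
    """
    return output_ids[prompt_length:]


# Intended use with the objects from the example above (not executed here):
#   prompt_length = inputs["input_ids"].shape[-1]
#   answer = completion_ids(outputs[0], prompt_length)
#   print(tokenizer.decode(answer, skip_special_tokens=True))
```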
Usage with vLLM (Recommended)
from vllm import LLM, SamplingParams

# Initialize vLLM
llm = LLM(
    model="Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=2048
)

# Create sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Generate
prompts = ["Your question here"]
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
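The vLLM example passes the raw question string, whereas the Transformers example applies the chat template first. If chat-formatted prompts are wanted with vLLM as well, one possible approach is to render them with the tokenizer and pass the resulting strings to `llm.generate`. This is a sketch that assumes the repository ships a chat template; `build_chat_prompts` is a hypothetical helper name.

```python
def build_chat_prompts(tokenizer, questions):
    """Render each question as a single-turn chat prompt string."""
    prompts = []
    for question in questions:
        messages = [{"role": "user", "content": question}]
        prompts.append(
            tokenizer.apply_chat_template(
                messages, tokenize=False, add_generation_prompt=True
            )
        )
    return prompts


# Intended use with the objects from the examples above (not executed here):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("Daksh1/qwen3-4b-dpo-ckpt-2")
#   prompts = build_chat_prompts(tokenizer, ["Your question here"])
#   outputs = llm.generate(prompts, sampling_params)
```

Keeping prompt construction in a separate function makes it easy to check the rendered text before sending a large batch to the engine.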
Usage with vLLM (Local Path)
from vllm import LLM, SamplingParams

# Use local path
llm = LLM(
    model="./saved_models/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=2048
)
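The card does not say how the files end up under `./saved_models`. One possible way, assuming the `huggingface_hub` package is installed, is to download a snapshot of the repository into that directory. The helper names `local_model_dir` and `download_checkpoint` are hypothetical.

```python
def local_model_dir(repo_id, root="./saved_models"):
    """Map 'owner/name' to '<root>/name', matching the local path used above."""
    return f"{root.rstrip('/')}/{repo_id.split('/')[-1]}"


def download_checkpoint(repo_id, root="./saved_models"):
    """Download all repository files into the local directory and return its path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target


# Intended use (requires network access, not executed here):
#   path = download_checkpoint("Daksh1/qwen3-4b-dpo-ckpt-2")
#   llm = LLM(model=path, dtype="bfloat16", max_model_len=2048)
```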
Model Size
- Safetensors: ~8 GB (inference-ready)
- No optimizer states: about 75% smaller than the full training checkpoints
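The figures above can be sanity-checked with simple arithmetic. The sketch below assumes roughly 4 billion parameters (rounded from the model name; the exact count is not given on this card) stored in bfloat16 at 2 bytes each, and derives the training-checkpoint size implied by the 75% figure. The function names are hypothetical.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each parameter in 2 bytes


def weights_size_gb(num_params, bytes_per_param=BYTES_PER_BF16):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def implied_full_size_gb(small_gb, fraction_smaller):
    """Size of the larger checkpoint if the small one is `fraction_smaller` smaller."""
    return small_gb / (1 - fraction_smaller)


# Assumed parameter count: about 4e9.
print(weights_size_gb(4e9))            # weights-only estimate in GB
print(implied_full_size_gb(8, 0.75))   # training checkpoint size implied by "75% smaller"
```

Under these assumptions the weights come to about 8 GB, and a checkpoint that is 75% larger in the sense stated above would be about 32 GB.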
Related Checkpoints
- Checkpoint 1: Daksh1/qwen3-4b-dpo-ckpt-1
- Checkpoint 2: Daksh1/qwen3-4b-dpo-ckpt-2 (this model)
- Checkpoint 3: Daksh1/qwen3-4b-dpo-ckpt-3
- Checkpoint 4: Daksh1/qwen3-4b-dpo-ckpt-4