Qwen3-4B DPO Fine-tuned - Checkpoint 2/4

This model is checkpoint 2 of 4 from DPO (Direct Preference Optimization) fine-tuning of Qwen/Qwen3-4B-Base.

Format: Safetensors only (~8 GB), with no optimizer states - ready for inference with vLLM or Transformers.

Training Details

Base Model: Qwen/Qwen3-4B-Base
Training Method: DPO
Checkpoint: 2/4
Sequence Length: 2048
Learning Rate: 5e-6
Batch Size: 1 (micro) × 16 (gradient accumulation)
Format: Safetensors (no optimizer states)
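
The batch-size line above implies 16 examples per optimizer step on each device. The sketch below works through that arithmetic. `num_devices` is an assumption, since the card does not state how many GPUs were used, and the token figure is only an upper bound that assumes one full 2048-token sequence per example (DPO preference pairs may change that count).

```python
# Values taken from the Training Details list above.
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
SEQUENCE_LENGTH = 2048


def effective_batch_size(num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step.

    num_devices is an assumption: the card does not state the GPU count.
    """
    return MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def max_tokens_per_step(num_devices: int = 1) -> int:
    """Upper bound on tokens per step, assuming one full-length sequence per example."""
    return effective_batch_size(num_devices) * SEQUENCE_LENGTH


print(effective_batch_size())  # 16
print(max_tokens_per_step())   # 32768
```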

Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True
)

# Generate
messages = [
    {"role": "user", "content": "Your question here"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

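outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (outputs[0] also contains the prompt)
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))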

Usage with vLLM (Recommended)

from vllm import LLM, SamplingParams

# Initialize vLLM
llm = LLM(
    model="Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=2048
)

# Create sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

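# Generate (llm.generate sends the prompt as raw text, without a chat template;
# llm.chat(messages, sampling_params) applies the tokenizer's chat template instead)
prompts = ["Your question here"]
outputs = llm.generate(prompts, sampling_params)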

for output in outputs:
    print(output.outputs[0].text)
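
The vLLM example above caps the context at `max_model_len=2048` and requests up to `max_tokens=512`, so a prompt needs to stay at or under 1536 tokens for the full completion to fit. A minimal sketch of that budget check follows. `prompt_token_budget` and `fits_in_context` are hypothetical helper names, and the prompt token count is assumed to come from the model's tokenizer.

```python
MAX_MODEL_LEN = 2048  # max_model_len passed to LLM(...)
MAX_TOKENS = 512      # max_tokens passed to SamplingParams(...)


def prompt_token_budget(max_model_len: int = MAX_MODEL_LEN,
                        max_tokens: int = MAX_TOKENS) -> int:
    """Prompt tokens that still leave room for a full-length completion."""
    return max_model_len - max_tokens


def fits_in_context(num_prompt_tokens: int) -> bool:
    """True if the prompt leaves room for max_tokens generated tokens."""
    return num_prompt_tokens <= prompt_token_budget()


print(prompt_token_budget())  # 1536
print(fits_in_context(1200))  # True
print(fits_in_context(1800))  # False
```

The prompt length can be measured with `len(tokenizer(text)["input_ids"])`, using a tokenizer loaded as in the Transformers section.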

Usage with vLLM (Local Path)

from vllm import LLM, SamplingParams

# Point `model` at a local directory containing the downloaded checkpoint
llm = LLM(
    model="./saved_models/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=2048
)

Model Size

Safetensors: ~8 GB (inference-ready)
No optimizer states - 75% smaller than training checkpoints
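
The two figures above can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 4e9 parameters stored in bfloat16 (2 bytes each), then derives the training-checkpoint size implied by the stated 75% reduction. It does not assume any particular optimizer or checkpoint layout, and the function names are hypothetical.

```python
NUM_PARAMS = 4e9          # nominal "4B" parameter count (exact count not given)
BYTES_PER_PARAM_BF16 = 2  # bfloat16 uses 16 bits per value


def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def implied_training_checkpoint_gb(inference_gb: float, reduction: float = 0.75) -> float:
    """Training-checkpoint size implied by a fractional size reduction."""
    return inference_gb / (1.0 - reduction)


inference_gb = weights_size_gb(NUM_PARAMS, BYTES_PER_PARAM_BF16)
print(inference_gb)                                  # 8.0
print(implied_training_checkpoint_gb(inference_gb))  # 32.0
```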

Related Checkpoints

Checkpoint 1: Daksh1/qwen3-4b-dpo-ckpt-1
Checkpoint 2: Daksh1/qwen3-4b-dpo-ckpt-2
Checkpoint 3: Daksh1/qwen3-4b-dpo-ckpt-3
Checkpoint 4: Daksh1/qwen3-4b-dpo-ckpt-4

Safetensors

Model Size: 4B params
Tensor Type: F32, BF16
