Qwen3-4B DPO Fine-tuned - Checkpoint 2/4

This model is checkpoint 2 of 4 from DPO (Direct Preference Optimization) fine-tuning of Qwen/Qwen3-4B-Base.

Format: Safetensors weights only (~8 GB, no optimizer states), ready to load in vLLM or Transformers.

Training Details

  • Base Model: Qwen/Qwen3-4B-Base
  • Training Method: DPO
  • Checkpoint: 2/4
  • Sequence Length: 2048
  • Learning Rate: 5e-6
  • Batch Size: 1 (micro) × 16 (gradient accumulation) = 16 effective per device
  • Format: Safetensors (no optimizer states)
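For readers unfamiliar with DPO, the objective for a single preference pair can be sketched as below. This is the standard published DPO loss, not code from this training run: the function name and the beta value of 0.1 are assumptions, since this card does not state the beta that was used.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model.
    beta=0.1 is an assumed value; the model card does not give one.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a numerically stable way
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Effective batch size from the list above: 1 micro-batch x 16 accumulation steps
effective_batch_per_device = 1 * 16
```

When the policy and the reference agree exactly, the loss is log(2); it falls as the policy favors the chosen response more strongly than the reference does.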

Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True
)

# Generate
messages = [
    {"role": "user", "content": "Your question here"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
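Note that `generate` on a decoder-only model returns the prompt tokens followed by the new tokens, so the print above includes the templated prompt. A minimal sketch for keeping only the completion is shown here; `completion_ids` is a hypothetical helper name, not part of Transformers.

```python
def completion_ids(output_ids, prompt_length):
    """Return only the tokens generated after the prompt."""
    return output_ids[prompt_length:]

# With the objects from the snippet above, this would be used roughly as:
#   prompt_length = inputs["input_ids"].shape[1]
#   new_tokens = completion_ids(outputs[0], prompt_length)
#   print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```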

Usage with vLLM (Recommended)

from vllm import LLM, SamplingParams

# Initialize vLLM
llm = LLM(
    model="Daksh1/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=2048
)

# Create sampling parameters
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=512
)

# Generate from raw text prompts (llm.generate does not apply the chat template)
prompts = ["Your question here"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
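The Transformers example formats the input with the tokenizer's chat template, while the vLLM example above sends the raw string. To get comparable prompts in vLLM, one option is `LLM.chat`, which applies the chat template before generating. The sketch below assumes a vLLM version that provides `LLM.chat` and that the repository's tokenizer ships a chat template; `chat_once` is a hypothetical helper name.

```python
def chat_once(llm, question, sampling_params=None):
    """Send one user turn through the tokenizer's chat template via LLM.chat."""
    messages = [{"role": "user", "content": question}]
    outputs = llm.chat(messages, sampling_params=sampling_params)
    # One conversation in, one RequestOutput back; take its first completion
    return outputs[0].outputs[0].text

# With the llm and sampling_params objects created above:
#   print(chat_once(llm, "Your question here", sampling_params))
```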

Usage with vLLM (Local Path)

from vllm import LLM, SamplingParams

# Load from a local directory that holds the model's config, tokenizer, and safetensors files
llm = LLM(
    model="./saved_models/qwen3-4b-dpo-ckpt-2",
    trust_remote_code=True,
    dtype="bfloat16",
    max_model_len=2048
)
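The local directory has to be populated before vLLM can load from it. One way to do that, sketched here, is `snapshot_download` from `huggingface_hub`; the helper names and the `./saved_models` root are assumptions chosen to match the path in the example above.

```python
def local_model_dir(repo_id, root="./saved_models"):
    """Map 'owner/name' to '<root>/name', matching the path used above."""
    return f"{root}/{repo_id.split('/')[-1]}"

def download_checkpoint(repo_id="Daksh1/qwen3-4b-dpo-ckpt-2"):
    """Download the repository files into the local model directory."""
    # Imported lazily so local_model_dir works without huggingface_hub installed
    from huggingface_hub import snapshot_download
    target = local_model_dir(repo_id)
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target

# Usage: path = download_checkpoint(); then pass `path` as model= to vllm.LLM
```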

Model Size

  • Safetensors: ~8 GB (inference-ready)
  • No optimizer states - about 75% smaller than the full training checkpoints
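The figures above can be sanity-checked with simple arithmetic. This sketch assumes a nominal 4 billion parameters stored in bfloat16 (2 bytes each); the 32 GB training-checkpoint size is only what the 75% figure implies, not a number stated on this card.

```python
def weights_size_gb(num_params, bytes_per_param):
    """Approximate size of the weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

def implied_full_checkpoint_gb(weights_gb, fraction_smaller):
    """Size of a checkpoint that the weights-only file is `fraction_smaller` below."""
    return weights_gb / (1.0 - fraction_smaller)

bf16_weights_gb = weights_size_gb(4e9, 2)                                # 8.0
implied_training_gb = implied_full_checkpoint_gb(bf16_weights_gb, 0.75)  # 32.0
```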

Related Checkpoints

  • Checkpoint 1: Daksh1/qwen3-4b-dpo-ckpt-1
  • Checkpoint 2: Daksh1/qwen3-4b-dpo-ckpt-2
  • Checkpoint 3: Daksh1/qwen3-4b-dpo-ckpt-3
  • Checkpoint 4: Daksh1/qwen3-4b-dpo-ckpt-4