πŸš€ Qwen3-50M C4 Pretrained (FP16) - Notebook Version

A Qwen3-50M model pretrained on the C4 dataset using FP16 precision in a notebook environment.

πŸ“Š Training Results

  • Final Training Loss: 7.0744
  • Final Validation Loss: 7.1159
  • Training Samples: 10,000
  • Epochs: 2
  • Precision: FP16
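
For reference, a mean cross-entropy loss converts to perplexity via exp(loss). A minimal check, assuming the reported losses are per-token cross-entropy in nats:

import math

print(f"train perplexity β‰ˆ {math.exp(7.0744):.0f}")  # β‰ˆ 1181
print(f"val perplexity β‰ˆ {math.exp(7.1159):.0f}")    # β‰ˆ 1231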

πŸš€ Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/qwen3-50m-c4-final_test_H200")
model = AutoModelForCausalLM.from_pretrained(
    "Mostafa8Mehrabi/qwen3-50m-c4-final_test_H200", 
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate text
prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # move inputs to the model's device (needed with device_map="auto")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
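
The same model also works through the transformers pipeline API; a minimal equivalent sketch using the same repository and dtype:

from transformers import pipeline
import torch

generator = pipeline(
    "text-generation",
    model="Mostafa8Mehrabi/qwen3-50m-c4-final_test_H200",
    torch_dtype=torch.float16,
    device_map="auto",
)
print(generator("The future of artificial intelligence is", max_new_tokens=100, do_sample=True)[0]["generated_text"])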

πŸ“ Checkpoints

Training checkpoints (also in FP16) are available at: Mostafa8Mehrabi/qwen3-50m-c4-checkpoints_test_H200
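
If the checkpoint repository follows the usual Trainer layout with one subfolder per saved step, an intermediate checkpoint can be loaded via the subfolder argument. The folder name below is hypothetical; check the repository's file listing for the actual names:

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Mostafa8Mehrabi/qwen3-50m-c4-checkpoints_test_H200",
    subfolder="checkpoint-500",  # hypothetical name; inspect the repo for the real checkpoint folders
    torch_dtype=torch.float16,
)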

πŸ”§ Training Environment

This model was trained in a notebook environment with the following configuration:

  • Batch Size: 160
  • Learning Rate: 5e-05
  • Max Length: 512
  • Number of Processes: 8
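
The exact training script is not included in this card. The following is a minimal sketch of a matching TrainingArguments setup, assuming the reported batch size of 160 is the effective global size split across the 8 processes (20 per device), and that Max Length 512 was applied at tokenization time rather than here:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3-50m-c4",       # hypothetical output path
    per_device_train_batch_size=20,  # assumed: 20 Γ— 8 processes = 160 global batch size
    learning_rate=5e-5,
    num_train_epochs=2,
    fp16=True,                       # FP16 mixed precision, matching the card
)

# Max Length 512 would be applied when tokenizing the C4 examples, e.g.:
# tokenizer(text, truncation=True, max_length=512)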