---
language: en
license: mit
tags:
- pytorch
- text-generation
- qwen3
- tinystories
---

# Qwen3-0.6B Pre-trained on TinyStories

This is a Qwen3-0.6B model pre-trained on the TinyStories dataset for 200k iterations.

## Model Details

- **Architecture**: Qwen3-0.6B
- **Training Data**: TinyStories dataset from HuggingFace
- **Training Iterations**: 200,000
- **Parameters**: ~596M unique parameters
- **Tokenizer**: GPT-2 tokenizer (tiktoken)
- **Training Loss**: Available in the training history

## Quick Start

### Download the Model

```python
from huggingface_hub import hf_hub_download
import torch

# Download model weights
model_path = hf_hub_download(
    repo_id="vuminhtue/qwen3-200k-tinystories",
    filename="Qwen3_200k_model_params.pt"
)

# Download config
config_path = hf_hub_download(
    repo_id="vuminhtue/qwen3-200k-tinystories",
    filename="config.json"
)
```

### Load and Use

```python
import torch
import tiktoken
from Qwen3_model import Qwen3Model  # You need this file from the original training code

# Set up configuration
QWEN3_CONFIG = {
    "vocab_size": 151936,
    "context_length": 40960,
    "emb_dim": 1024,
    "n_heads": 16,
    "n_layers": 28,
    "hidden_dim": 3072,
    "head_dim": 128,
    "qk_norm": True,
    "n_kv_groups": 8,
    "rope_base": 1000000.0,
    "dtype": torch.bfloat16,
}

# Load model
model = Qwen3Model(QWEN3_CONFIG)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.load_state_dict(torch.load(model_path, map_location=device))
model = model.to(device)
model.eval()

# Generate text: a minimal greedy-decoding sketch.
# It assumes Qwen3Model returns raw logits of shape (batch, seq_len, vocab_size);
# swap in your preferred sampling strategy as needed.
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "Once upon a time"
token_ids = torch.tensor([tokenizer.encode(prompt)], device=device)

with torch.no_grad():
    for _ in range(100):  # generate up to 100 new tokens
        logits = model(token_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        token_ids = torch.cat([token_ids, next_id], dim=-1)

print(tokenizer.decode(token_ids[0].tolist()))
```

## Training Details

- **Optimizer**: AdamW with weight decay (0.1)
- **Learning Rate**: 1e-4 with warmup and cosine decay
- **Batch Size**: 32 with gradient accumulation (32 steps)
- **Context Length**: 128 tokens
- **Mixed Precision**: bfloat16 training

A hypothetical sketch of how these settings fit together appears in the appendix at the end of this card.

## Model Architecture

- Grouped Query Attention (GQA) with 8 KV groups
- RoPE (Rotary Position Embeddings)
- RMSNorm for normalization
- SiLU activation function
- 28 transformer layers

## Performance

The model was trained on TinyStories, a dataset of simple stories written for young children, and can generate coherent short stories in a similar style.

## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3-tinystories-2025,
  author = {Tue Vu},
  title = {Qwen3-0.6B Pre-trained on TinyStories},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/vuminhtue/qwen3-200k-tinystories}},
}
```

## License

MIT License

## Contact

For questions or issues, please open an issue on the HuggingFace model page.
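
## Appendix: Training Setup Sketch

The training script itself is not bundled with this card. The sketch below shows one way the hyperparameters listed under Training Details could be wired together in a standard PyTorch loop. It is illustrative only: `warmup_steps` and the `get_batch` data loader are hypothetical placeholders, not values or code from the actual run.

```python
import math
import torch

# Hypothetical stand-ins; the real run's warmup length and data loader
# are not published with this card.
warmup_steps = 1000
max_steps = 200_000   # 200k iterations, per the card
accum_steps = 32      # gradient accumulation steps, per the card
base_lr = 1e-4

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.1)

def lr_at(step):
    # Linear warmup followed by cosine decay to zero, as described in the card.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in range(max_steps):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    optimizer.zero_grad()
    for _ in range(accum_steps):
        # get_batch() is a hypothetical loader returning input/target id
        # tensors of shape (batch=32, seq_len=128).
        x, y = get_batch()
        # bfloat16 mixed precision, per the card (assumes a CUDA device).
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            logits = model(x)
            loss = torch.nn.functional.cross_entropy(
                logits.flatten(0, 1), y.flatten()
            ) / accum_steps
        loss.backward()
    optimizer.step()
```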