vuminhtue committed
Commit 79ae26f · verified · 1 Parent(s): 0aed5d0

Upload README.md with huggingface_hub

Files changed (1): README.md (+116 -0)
@@ -1,3 +1,119 @@
  ---
+ language: en
  license: mit
+ tags:
+ - pytorch
+ - text-generation
+ - qwen3
+ - tinystories
  ---
+
+ # Qwen3-0.6B Pre-trained on TinyStories
+
+ This is a Qwen3-0.6B model pre-trained on the TinyStories dataset for 200k iterations.
+
+ ## Model Details
+
+ - **Architecture**: Qwen3-0.6B
+ - **Training Data**: TinyStories dataset from HuggingFace
+ - **Training Iterations**: 200,000
+ - **Parameters**: ~596M unique parameters
+ - **Tokenizer**: GPT-2 tokenizer (tiktoken)
+ - **Training Loss**: Available in training history
+
+ ## Quick Start
+
+ ### Download the Model
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ import torch
+
+ # Download model weights
+ model_path = hf_hub_download(
+     repo_id="vuminhtue/qwen3-200k-tinystories",
+     filename="Qwen3_200k_model_params.pt"
+ )
+
+ # Download config
+ config_path = hf_hub_download(
+     repo_id="vuminhtue/qwen3-200k-tinystories",
+     filename="config.json"
+ )
+ ```
+
+ ### Load and Use
+
+ ```python
+ import torch
+ import tiktoken
+ from Qwen3_model import Qwen3Model  # You need this file from the original code
+
+ # Set up configuration
+ QWEN3_CONFIG = {
+     "vocab_size": 151936,
+     "context_length": 40960,
+     "emb_dim": 1024,
+     "n_heads": 16,
+     "n_layers": 28,
+     "hidden_dim": 3072,
+     "head_dim": 128,
+     "qk_norm": True,
+     "n_kv_groups": 8,
+     "rope_base": 1000000.0,
+     "dtype": torch.bfloat16,
+ }
+
+ # Load model
+ model = Qwen3Model(QWEN3_CONFIG)
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.load_state_dict(torch.load(model_path, map_location=device))
+ model = model.to(device)
+ model.eval()
+
+ # Generate text
+ tokenizer = tiktoken.get_encoding("gpt2")
+ # Your generation code here...
+ ```
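
The README leaves the generation step as a placeholder. A minimal sketch of one way to fill it is greedy decoding: repeatedly take the highest-scoring next token and append it. The function `generate_greedy` and the callable `next_token_logits` are hypothetical names, not part of the repository. With the real model, that callable would presumably wrap a forward pass and return the logits at the last position, but the forward signature of `Qwen3Model` is not shown here, so the example uses a toy stand-in to stay runnable without PyTorch.

```python
from typing import Callable, List, Optional, Sequence


def generate_greedy(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    context_length: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding: append the arg-max token until a stop condition."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Crop to the most recent tokens so the input fits the context window.
        window = ids[-context_length:]
        logits = next_token_logits(window)
        # Index of the largest logit is the greedy choice.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if eos_id is not None and next_id == eos_id:
            break
        ids.append(next_id)
    return ids


def toy_logits(window: Sequence[int], vocab_size: int = 5) -> List[float]:
    """Stand-in for a model (assumption): always favors (last token + 1) mod vocab."""
    target = (window[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


generated = generate_greedy(toy_logits, prompt_ids=[0], max_new_tokens=4, context_length=128)
stopped = generate_greedy(toy_logits, prompt_ids=[0], max_new_tokens=10, context_length=128, eos_id=3)
```

In real use, `prompt_ids` would come from `tokenizer.encode(...)` and the result would go through `tokenizer.decode(...)`. Since the README states a training context of 128 tokens, cropping to 128 rather than the configured 40960 is probably the safer choice, though that is an inference rather than something the README says.
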
+
+ ## Training Details
+
+ - **Optimizer**: AdamW with weight decay (0.1)
+ - **Learning Rate**: 1e-4 with warmup and cosine decay
+ - **Batch Size**: 32 with gradient accumulation (32 steps)
+ - **Context Length**: 128 tokens
+ - **Mixed Precision**: bfloat16 training
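
The learning-rate and batch-size bullets above can be made concrete with a short sketch. The README gives only the peak rate (1e-4) and the shape (warmup, then cosine decay), so the warmup length (1,000 steps), the final rate (0.0), and the linear warmup shape are assumptions chosen for illustration. The token count further assumes that 32 is the per-step micro-batch size and that 200,000 counts schedule steps.

```python
import math


def lr_at(step: int,
          peak_lr: float = 1e-4,        # from the README
          warmup_steps: int = 1_000,    # assumption: not stated in the README
          total_steps: int = 200_000,   # from the README (training iterations)
          min_lr: float = 0.0) -> float:  # assumption: not stated in the README
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr past the end of the schedule
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Effective batch per optimizer update, under the micro-batch assumption above.
micro_batch, accum_steps, context_len = 32, 32, 128
sequences_per_update = micro_batch * accum_steps          # 1024 sequences
tokens_per_update = sequences_per_update * context_len    # 131072 tokens
```

Under these assumptions each optimizer update sees 1,024 sequences, or 131,072 tokens.
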
+
+ ## Model Architecture
+
+ - Grouped Query Attention (GQA) with 8 KV groups
+ - RoPE (Rotary Position Embeddings)
+ - RMSNorm for normalization
+ - SiLU activation function
+ - 28 transformer layers
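
As a sanity check on the "~596M unique parameters" figure, the count can be derived from the configuration values in the Load and Use block. This sketch makes several assumptions about the `Qwen3Model` implementation, which is not shown: linear layers have no bias terms, the feed-forward block is a gated SiLU MLP with three weight matrices, `n_kv_groups` is the number of key/value heads, RoPE adds no learned parameters, and the output head shares weights with the token embedding (which is what "unique" would suggest). The helper `count_qwen3_params` is a hypothetical name.

```python
def count_qwen3_params(cfg: dict, tie_embeddings: bool = True) -> int:
    """Parameter count for a Qwen3-style decoder under the assumptions stated above."""
    d = cfg["emb_dim"]
    q_dim = cfg["n_heads"] * cfg["head_dim"]       # width of the query projection
    kv_dim = cfg["n_kv_groups"] * cfg["head_dim"]  # width of each key/value projection

    attn = d * q_dim + 2 * d * kv_dim + q_dim * d  # W_q, W_k, W_v, W_o (no biases)
    qk_norm = 2 * cfg["head_dim"] if cfg["qk_norm"] else 0  # RMSNorm scales on q and k
    mlp = 3 * d * cfg["hidden_dim"]                # gate, up, and down projections
    norms = 2 * d                                  # two RMSNorm scales per block
    per_layer = attn + qk_norm + mlp + norms

    embedding = cfg["vocab_size"] * d
    total = embedding + cfg["n_layers"] * per_layer + d  # + final RMSNorm
    if not tie_embeddings:
        total += embedding                         # separate output head
    return total


QWEN3_CONFIG = {
    "vocab_size": 151936, "emb_dim": 1024, "n_heads": 16, "n_layers": 28,
    "hidden_dim": 3072, "head_dim": 128, "qk_norm": True, "n_kv_groups": 8,
}

unique_params = count_qwen3_params(QWEN3_CONFIG)                       # 596,049,920
untied_params = count_qwen3_params(QWEN3_CONFIG, tie_embeddings=False)  # 751,632,384
```

With tied embeddings the total is 596,049,920, matching the stated figure. About 155.6M of that is the embedding table alone, because the configured vocabulary has 151,936 entries.
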
+
+ ## Performance
+
+ The model was trained on TinyStories, a dataset of simple stories for children. It can generate coherent short stories in a similar style.
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{qwen3-tinystories-2025,
+   author = {Tue Vu},
+   title = {Qwen3-0.6B Pre-trained on TinyStories},
+   year = {2025},
+   publisher = {HuggingFace},
+   howpublished = {\url{https://huggingface.co/vuminhtue/qwen3-200k-tinystories}},
+ }
+ ```
+
+ ## License
+
+ MIT License
+
+ ## Contact
+
+ For questions or issues, please open an issue on the HuggingFace model page.