---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 29.61
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 42.59
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 32.50
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.32
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 9.88
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat SFT

This is the SFT (chat) checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project to build an LLM from scratch, [nanochat](https://github.com/karpathy/nanochat).

## Usage

Install transformers from this specific branch:

```sh
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
```

Then, you can run this inference snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
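
Note that `outputs[0]` contains the prompt tokens followed by the newly generated ones, so the `print` above echoes the question as well as the answer. To show only the assistant's reply, you can slice off the prompt length before decoding (e.g. `tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)`). A toy illustration of that slicing logic, with plain lists standing in for real token ids:

```python
# Stand-ins for tokenized prompt and generate() output (prompt + reply):
prompt_ids = [101, 7592, 102]
full_output = [101, 7592, 102, 2003, 2005, 2012]

# Dropping the first len(prompt_ids) tokens isolates the reply tokens,
# exactly like outputs[0][inputs.shape[1]:] does for a tensor.
reply_ids = full_output[len(prompt_ids):]
print(reply_ids)  # [2003, 2005, 2012]
```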

## Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

- run:
  - source: mid
  - dtype: bfloat16
  - device_batch_size: 4
  - num_epochs: 1
  - max_iterations: -1
  - target_examples_per_step: 32
  - unembedding_lr: 0.0040
  - embedding_lr: 0.2000
  - matrix_lr: 0.0200
  - weight_decay: 0.0000
  - init_lr_frac: 0.0200
  - eval_every: 100
  - eval_steps: 100
  - eval_metrics_every: 200
- Training rows: 20,843
- Number of iterations: 651
- Training loss: 1.1904
- Validation loss: 1.0664
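
These numbers are internally consistent: with `device_batch_size: 4` and `target_examples_per_step: 32`, gradient accumulation bridges the gap, and one epoch over the 20,843 SFT rows yields the reported 651 iterations. A quick sanity check (the `world_size = 1` single-GPU assumption is mine; the actual run may have been multi-GPU, which would shrink the accumulation factor accordingly):

```python
device_batch_size = 4
target_examples_per_step = 32
training_rows = 20_843

# Gradient accumulation steps so each optimizer step sees 32 examples
# (assuming a single process; with N GPUs this divides by N):
world_size = 1
grad_accum_steps = target_examples_per_step // (device_batch_size * world_size)
print(grad_accum_steps)  # 8

# One epoch over the SFT data at 32 examples per optimizer step:
num_iterations = training_rows // target_examples_per_step
print(num_iterations)  # 651
```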

## Chat Evaluation (SFT)

timestamp: 2025-10-14 20:29:59

- source: sft
- task_name: None
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- ARC-Easy: 0.4259
- ARC-Challenge: 0.2961
- MMLU: 0.3250
- GSM8K: 0.0432
- HumanEval: 0.0549
- ChatCORE metric: 0.0988
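
The ChatCORE number follows from the per-task scores above: as I understand the nanochat report, it is the mean of baseline-centered scores, `(score - baseline) / (1 - baseline)`, where the baseline is random-guess accuracy (0.25 for the 4-way multiple-choice tasks, 0 for the open-ended ones — these baselines are my assumption). A sketch reproducing the figure:

```python
# (score, assumed random-guess baseline) for each task above
scores = {
    "ARC-Easy": (0.4259, 0.25),
    "ARC-Challenge": (0.2961, 0.25),
    "MMLU": (0.3250, 0.25),
    "GSM8K": (0.0432, 0.0),
    "HumanEval": (0.0549, 0.0),
}

# Center each score so random guessing maps to 0 and perfect to 1,
# then average across tasks.
centered = [(s - b) / (1 - b) for s, b in scores.values()]
chatcore = sum(centered) / len(centered)
print(round(chatcore, 4))  # 0.0988
```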

Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio