---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100b-shuffle
language:
- en
model-index:
- name: chat-d10
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
    metrics:
    - type: acc_norm
      value: 29.61
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Easy
      split: test
    metrics:
    - type: acc_norm
      value: 42.59
      name: normalized accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
    metrics:
    - type: acc
      value: 32.50
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
    metrics:
    - type: acc
      value: 4.32
      name: accuracy
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
      split: test
    metrics:
    - type: pass@1
      value: 5.49
      name: pass@1
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ChatCORE
      type: chatcore
      split: test
    metrics:
    - type: score
      value: 9.88
      name: ChatCORE metric
    source:
      url: https://github.com/karpathy/nanochat
      name: nanochat
---

# NanoChat SFT

This is the SFT (chat) checkpoint from [Andrej Karpathy's](https://huggingface.co/karpathy) full-stack project to build an LLM from scratch, [nanochat](https://github.com/karpathy/nanochat).

## Usage

Install transformers from this specific branch:

```sh
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
```

Then, you can run this inference snippet:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nanochat-students/d20-chat-transformers"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
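
Note that `outputs[0]` contains the prompt tokens followed by the newly generated ones, so the `print` above echoes the question as well as the answer. To show only the assistant's reply, you can slice off the prompt length before decoding (e.g. `tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)`). A toy illustration of that slicing logic, with plain lists standing in for real token ids:

```python
# Stand-ins for tokenized prompt and generate() output (prompt + reply):
prompt_ids = [101, 7592, 102]
full_output = [101, 7592, 102, 2003, 2005, 2012]

# Dropping the first len(prompt_ids) tokens isolates the reply tokens,
# exactly like outputs[0][inputs.shape[1]:] does for a tensor.
reply_ids = full_output[len(prompt_ids):]
print(reply_ids)  # [2003, 2005, 2012]
```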

## Chat SFT Training Metrics

timestamp: 2025-10-14 20:17:42

- run:
  - source: mid
  - dtype: bfloat16
  - device_batch_size: 4
  - num_epochs: 1
  - max_iterations: -1
  - target_examples_per_step: 32
  - unembedding_lr: 0.0040
  - embedding_lr: 0.2000
  - matrix_lr: 0.0200
  - weight_decay: 0.0000
  - init_lr_frac: 0.0200
  - eval_every: 100
  - eval_steps: 100
  - eval_metrics_every: 200
- Training rows: 20,843
- Number of iterations: 651
- Training loss: 1.1904
- Validation loss: 1.0664
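
These numbers are internally consistent: with `device_batch_size: 4` and `target_examples_per_step: 32`, gradient accumulation bridges the gap, and one epoch over the 20,843 SFT rows yields the reported 651 iterations. A quick sanity check (the `world_size = 1` single-GPU assumption is mine; the actual run may have been multi-GPU, which would shrink the accumulation factor accordingly):

```python
device_batch_size = 4
target_examples_per_step = 32
training_rows = 20_843

# Gradient accumulation steps so each optimizer step sees 32 examples
# (assuming a single process; with N GPUs this divides by N):
world_size = 1
grad_accum_steps = target_examples_per_step // (device_batch_size * world_size)
print(grad_accum_steps)  # 8

# One epoch over the SFT data at 32 examples per optimizer step:
num_iterations = training_rows // target_examples_per_step
print(num_iterations)  # 651
```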

## Chat Evaluation (SFT)

timestamp: 2025-10-14 20:29:59

- source: sft
- task_name: None
- dtype: bfloat16
- temperature: 0.0000
- max_new_tokens: 512
- num_samples: 1
- top_k: 50
- batch_size: 8
- model_tag: None
- step: None
- max_problems: None
- ARC-Easy: 0.4259
- ARC-Challenge: 0.2961
- MMLU: 0.3250
- GSM8K: 0.0432
- HumanEval: 0.0549
- ChatCORE metric: 0.0988
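
The ChatCORE number follows from the per-task scores above: as I understand the nanochat report, it is the mean of baseline-centered scores, `(score - baseline) / (1 - baseline)`, where the baseline is random-guess accuracy (0.25 for the 4-way multiple-choice tasks, 0 for the open-ended ones — these baselines are my assumption). A sketch reproducing the figure:

```python
# (score, assumed random-guess baseline) for each task above
scores = {
    "ARC-Easy": (0.4259, 0.25),
    "ARC-Challenge": (0.2961, 0.25),
    "MMLU": (0.3250, 0.25),
    "GSM8K": (0.0432, 0.0),
    "HumanEval": (0.0549, 0.0),
}

# Center each score so random guessing maps to 0 and perfect to 1,
# then average across tasks.
centered = [(s - b) / (1 - b) for s, b in scores.values()]
chatcore = sum(centered) / len(centered)
print(round(chatcore, 4))  # 0.0988
```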

Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio