nanochat model - RL stage

This is a nanochat model trained using Andrej Karpathy's nanochat.

Model Details

  • Model type: GPT-style transformer
  • Training stage: RL
  • Parameters: 560,988,160
  • Architecture (see the config sketch after this list):
    • Layers: 20
    • Embedding dim: 1280
    • Attention heads: 10
    • KV heads (GQA): 10
    • Context length: 2048
    • Vocab size: 65536
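
For quick reference, these numbers correspond to a configuration shaped roughly like the sketch below; the field names are illustrative assumptions, not necessarily those used by nanochat's config class.

# Rough shape of this checkpoint's configuration (field names assumed for illustration)
model_config = dict(
    n_layer=20,          # transformer blocks
    n_embd=1280,         # embedding dimension
    n_head=10,           # attention heads
    n_kv_head=10,        # KV heads; equal to n_head here, so GQA reduces to standard multi-head attention
    sequence_len=2048,   # context length
    vocab_size=65536,
)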

Training Info

  • Training step: N/A
  • Validation BPB: N/A

Architecture Highlights

  • ✅ RoPE (Rotary Position Embeddings)
  • ✅ QK Normalization
  • ✅ ReLU² activation (not GELU; see the sketch after this list)
  • ✅ Untied embeddings
  • ✅ No bias terms
  • ✅ Logit softcapping
  • ✅ Group Query Attention (GQA)
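
A minimal PyTorch sketch of two of these choices, the ReLU² feed-forward nonlinearity and tanh-based logit softcapping, is given below. It is illustrative only, not nanochat's actual implementation, and the cap value of 15.0 is an assumed placeholder.

import torch
import torch.nn as nn
import torch.nn.functional as F

def relu2(x):
    # ReLU² activation: ReLU followed by squaring, used in place of GELU
    return F.relu(x).square()

def softcap(logits, cap=15.0):
    # Logit softcapping: smoothly bounds logits to (-cap, cap) via tanh
    return cap * torch.tanh(logits / cap)

class BiasFreeMLP(nn.Module):
    # Feed-forward block with no bias terms and a ReLU² nonlinearity
    def __init__(self, dim=1280, hidden_mult=4):
        super().__init__()
        self.fc = nn.Linear(dim, hidden_mult * dim, bias=False)
        self.proj = nn.Linear(hidden_mult * dim, dim, bias=False)

    def forward(self, x):
        return self.proj(relu2(self.fc(x)))

# Example: cap the final vocabulary logits before sampling
logits = softcap(torch.randn(1, 65536))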

Usage

This model uses a custom architecture from nanochat. To use it:

# Clone the nanochat repo
git clone https://github.com/karpathy/nanochat.git
cd nanochat

# Download this checkpoint
# Then load using nanochat's checkpoint manager
from nanochat.checkpoint_manager import load_model
from nanochat.engine import Engine

model, tokenizer, meta = load_model("rl", "cuda", phase="eval")
engine = Engine(model, tokenizer)

# Generate text
prompt = "The capital of France is"
tokens = tokenizer.encode(prompt, prepend="<|bos|>")
completions, _ = engine.generate_batch(tokens, num_samples=1, max_tokens=50, temperature=0.7)
print(tokenizer.decode(completions[0]))

Special Tokens

The model uses these special tokens for chat; an example rendering follows the list:

  • <|bos|> - Beginning of sequence
  • <|user_start|>, <|user_end|> - User messages
  • <|assistant_start|>, <|assistant_end|> - Assistant messages
  • <|python_start|>, <|python_end|> - Python tool calls
  • <|output_start|>, <|output_end|> - Tool outputs
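
As a hand-assembled illustration (the exact layout is an assumption; nanochat's chat tokenization code is the source of truth), a single user turn awaiting a reply looks roughly like:

# Hypothetical rendering of one chat turn using the special tokens above
conversation = (
    "<|bos|>"
    "<|user_start|>What is the capital of France?<|user_end|>"
    "<|assistant_start|>"
)
# The model continues from <|assistant_start|> and is expected to end its
# reply with <|assistant_end|>.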

Citation

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

License

MIT License - Same as the nanochat repository.
