# nanochat model - RL stage
This is a nanochat model trained using Andrej Karpathy's nanochat.
## Model Details
- Model type: GPT-style transformer
- Training stage: RL
- Parameters: 560,988,160
- Architecture:
  - Layers: 20
  - Embedding dim: 1280
  - Attention heads: 10
  - KV heads (GQA): 10
  - Context length: 2048
  - Vocab size: 65536
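The reported parameter count follows directly from these dimensions. Below is a back-of-the-envelope sketch (an illustration, not code from the nanochat repo) that reproduces the total, assuming untied input/output embeddings, bias-free attention and MLP projections with a 4x hidden expansion, and parameter-free norms:

```python
# Hypothetical sketch: reproduce the parameter count from the config above.
# Assumes untied embeddings, no biases, 4x MLP expansion, parameter-free norms.
n_layer, n_embd, vocab_size = 20, 1280, 65536

embeddings = 2 * vocab_size * n_embd        # token embedding + untied LM head
attn_per_layer = 4 * n_embd * n_embd        # Q, K, V, and output projections
mlp_per_layer = 2 * n_embd * (4 * n_embd)   # up and down projections
total = embeddings + n_layer * (attn_per_layer + mlp_per_layer)

print(f"{total:,}")  # 560,988,160 -- matches the reported parameter count
```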
## Training Info
- Training step: N/A
- Validation BPB: N/A
## Architecture Highlights
- ✅ RoPE (Rotary Position Embeddings)
- ✅ QK Normalization
- ✅ ReLU² activation (not GELU)
- ✅ Untied embeddings
- ✅ No bias terms
- ✅ Logit softcapping
- ✅ Group Query Attention (GQA)
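For reference, here is a minimal PyTorch sketch of two of the less common pieces above, ReLU² activation and logit softcapping, as they are typically implemented. The function names and the cap value are illustrative, not taken from the nanochat source:

```python
import torch
import torch.nn.functional as F

def relu_squared(x: torch.Tensor) -> torch.Tensor:
    # ReLU² activation: square of the rectified input, used in place of GELU
    return F.relu(x).square()

def softcap_logits(logits: torch.Tensor, cap: float = 15.0) -> torch.Tensor:
    # Logit softcapping: smoothly bounds logits to (-cap, cap) via tanh
    return cap * torch.tanh(logits / cap)
```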
## Usage
This model uses a custom architecture from nanochat. To use it:
```bash
# Clone the nanochat repo
git clone https://github.com/karpathy/nanochat.git
cd nanochat
```
```python
# Download this checkpoint
# Then load using nanochat's checkpoint manager
from nanochat.checkpoint_manager import load_model
from nanochat.engine import Engine

model, tokenizer, meta = load_model("rl", "cuda", phase="eval")
engine = Engine(model, tokenizer)

# Generate text
prompt = "The capital of France is"
tokens = tokenizer(prompt, prepend="<|bos|>")
completions, _ = engine.generate_batch(tokens, num_samples=1, max_tokens=50, temperature=0.7)
print(tokenizer.decode(completions[0]))
```
## Special Tokens
The model uses these special tokens for chat:
- `<|bos|>` - Beginning of sequence
- `<|user_start|>`, `<|user_end|>` - User messages
- `<|assistant_start|>`, `<|assistant_end|>` - Assistant messages
- `<|python_start|>`, `<|python_end|>` - Python tool calls
- `<|output_start|>`, `<|output_end|>` - Tool outputs
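As an illustration of how these tokens delimit a conversation, a single-turn chat prompt could be assembled like so. Treat this as a sketch; the authoritative rendering logic lives in nanochat's tokenizer and engine code:

```python
# Hypothetical sketch of rendering one chat turn with the special tokens above;
# the exact formatting conventions are defined in the nanochat repo.
def render_chat_prompt(user_message: str) -> str:
    return (
        "<|bos|>"
        "<|user_start|>" + user_message + "<|user_end|>"
        "<|assistant_start|>"  # the model continues from here with its reply
    )

prompt = render_chat_prompt("What is the capital of France?")
```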
## Citation
```bibtex
@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}
```
## License
MIT License - Same as the nanochat repository.