nanochat model - RL stage

This is a nanochat model trained using Andrej Karpathy's nanochat.

Model Details

  • Model type: GPT-style transformer
  • Training stage: RL
  • Parameters: 560,988,160
  • Architecture (see the config sketch after this list):
    • Layers: 20
    • Embedding dim: 1280
    • Attention heads: 10
    • KV heads (GQA): 10
    • Context length: 2048
    • Vocab size: 65536
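
For quick reference, these numbers correspond to a configuration shaped roughly like the sketch below; the field names are illustrative assumptions, not necessarily those used by nanochat's config class.

# Rough shape of this checkpoint's configuration (field names assumed for illustration)
model_config = dict(
    n_layer=20,          # transformer blocks
    n_embd=1280,         # embedding dimension
    n_head=10,           # attention heads
    n_kv_head=10,        # KV heads; equal to n_head here, so GQA reduces to standard multi-head attention
    sequence_len=2048,   # context length
    vocab_size=65536,
)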

Training Info

  • Training step: N/A
  • Validation BPB: N/A

Architecture Highlights

  • ✅ RoPE (Rotary Position Embeddings)
  • ✅ QK Normalization
  • ✅ ReLU² activation (not GELU; see the sketch after this list)
  • ✅ Untied embeddings
  • ✅ No bias terms
  • ✅ Logit softcapping
  • ✅ Group Query Attention (GQA)
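
A minimal PyTorch sketch of two of these choices, the ReLU² feed-forward nonlinearity and tanh-based logit softcapping, is given below. It is illustrative only, not nanochat's actual implementation, and the cap value of 15.0 is an assumed placeholder.

import torch
import torch.nn as nn
import torch.nn.functional as F

def relu2(x):
    # ReLU² activation: ReLU followed by squaring, used in place of GELU
    return F.relu(x).square()

def softcap(logits, cap=15.0):
    # Logit softcapping: smoothly bounds logits to (-cap, cap) via tanh
    return cap * torch.tanh(logits / cap)

class BiasFreeMLP(nn.Module):
    # Feed-forward block with no bias terms and a ReLU² nonlinearity
    def __init__(self, dim=1280, hidden_mult=4):
        super().__init__()
        self.fc = nn.Linear(dim, hidden_mult * dim, bias=False)
        self.proj = nn.Linear(hidden_mult * dim, dim, bias=False)

    def forward(self, x):
        return self.proj(relu2(self.fc(x)))

# Example: cap the final vocabulary logits before sampling
logits = softcap(torch.randn(1, 65536))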

Usage

This model uses a custom architecture from nanochat. To use it:

# Clone the nanochat repo
git clone https://github.com/karpathy/nanochat.git
cd nanochat

# Download this checkpoint
# Then load using nanochat's checkpoint manager
from nanochat.checkpoint_manager import load_model
from nanochat.engine import Engine

model, tokenizer, meta = load_model("rl", "cuda", phase="eval")
engine = Engine(model, tokenizer)

# Generate text
prompt = "The capital of France is"
tokens = tokenizer.encode(prompt, prepend="<|bos|>")
completions, _ = engine.generate_batch(tokens, num_samples=1, max_tokens=50, temperature=0.7)
print(tokenizer.decode(completions[0]))

Special Tokens

The model uses these special tokens for chat; an example rendering follows the list:

  • <|bos|> - Beginning of sequence
  • <|user_start|>, <|user_end|> - User messages
  • <|assistant_start|>, <|assistant_end|> - Assistant messages
  • <|python_start|>, <|python_end|> - Python tool calls
  • <|output_start|>, <|output_end|> - Tool outputs
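
As a hand-assembled illustration (the exact layout is an assumption; nanochat's chat tokenization code is the source of truth), a single user turn awaiting a reply looks roughly like:

# Hypothetical rendering of one chat turn using the special tokens above
conversation = (
    "<|bos|>"
    "<|user_start|>What is the capital of France?<|user_end|>"
    "<|assistant_start|>"
)
# The model continues from <|assistant_start|> and is expected to end its
# reply with <|assistant_end|>.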

Citation

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

License

MIT License - Same as the nanochat repository.
