Ms. Pac-Man GRPO-trained Strategy Generator

This model was trained using Group Relative Policy Optimization (GRPO) to generate Python strategies for playing Atari Ms. Pac-Man.
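
As a rough illustration of the "group relative" part of GRPO (not this model's actual training code): several completions are sampled for the same prompt, each is scored, and each completion's advantage is its reward normalized against the rest of its group. The function name `group_relative_advantages`, the `eps` value, and the reward numbers below are hypothetical, chosen only for the sketch. Population standard deviation is used here; implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical game scores for four strategies sampled from one prompt.
rewards = [120.0, 480.0, 300.0, 300.0]
advantages = group_relative_advantages(rewards)
```

Strategies that score above the group mean get a positive advantage and those below get a negative one, so no separate value network is needed to provide a baseline.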

Format: Safetensors
Model size: 21B params
Tensor type: BF16

Model: justinj92/gpt-oss-20B-pacmanplayer
Base model: openai/gpt-oss-20b (this model is a fine-tune)