RLAIF/dpo_thinking_reddit_judge_last_minute_150_1e-6_0.02_4B_4B
Safetensors
Branch: main
Repository size: 8.84 GB
Contributors: 1
History: 2 commits
Latest commit: a15001d (verified), by AngelRaychev, 2 months ago: "Upload folder using huggingface_hub"
Files:
global_step_260; last commit: "Upload folder using huggingface_hub"; 2 months ago
.gitattributes; 1.59 kB; last commit: "Upload folder using huggingface_hub"; 2 months ago