LLM Evaluation Benchmarks - an Alanox Collection

LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly seen in traditional LLM papers.

MMLU-Pro Leaderboard

🥇

246

More advanced and challenging multi-task evaluation

GAIA Leaderboard

🦾

604

Submit and score your model on the GAIA benchmark