Alanox's Collections
LLM Evaluation Benchmarks

updated Apr 7, 2025

This collection gathers references to the evaluation benchmarks commonly seen in traditional LLM papers.

  • MMLU-Pro Leaderboard 🥇
    More advanced and challenging multi-task evaluation


  • GAIA Leaderboard 🦾
    Submit and score your model on the GAIA benchmark
