Add model-index with instruction benchmark evaluations

#45
by davidlms - opened

Added structured evaluation results from the README benchmark table; a sketch of the resulting frontmatter follows the list below:

Instruction Model Benchmarks (No Extended Thinking):

  • AIME 2025 (High school math): 9.3
  • GSM-Plus (Math problem-solving): 72.8
  • LiveCodeBench v4 (Competitive programming): 15.2
  • GPQA Diamond (Graduate-level reasoning): 35.7
  • IFEval (Instruction following): 76.7
  • MixEval Hard (Alignment): 26.9
  • BFCL (Tool calling): 92.3
  • Global MMLU (Multilingual Q&A): 53.5

Total: 8 benchmarks covering reasoning, math, coding, instruction-following, alignment, tool use, and multilingual capabilities.
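
For reference, a minimal sketch of what the added frontmatter could look like, following the Hub's model-index metadata schema. The model name, dataset `type` identifiers, and metric names below are illustrative placeholders, not necessarily the exact values in this PR, and only two of the eight entries are shown:

```yaml
model-index:
- name: Your-Model-Name            # placeholder; the PR uses the actual repo's model name
  results:
  - task:
      type: text-generation
    dataset:
      name: AIME 2025              # high school math benchmark
      type: aime-2025              # illustrative dataset identifier
    metrics:
    - type: pass@1                 # assumed metric type
      value: 9.3
      name: pass@1
  - task:
      type: text-generation
    dataset:
      name: IFEval                 # instruction-following benchmark
      type: ifeval                 # illustrative dataset identifier
    metrics:
    - type: accuracy               # assumed metric type
      value: 76.7
      name: accuracy
  # ...the remaining six benchmarks follow the same task/dataset/metrics pattern
```

The Hub reads this block from the model card's YAML frontmatter, between the `---` delimiters at the top of the README.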

This enables the model to appear on leaderboards and makes it easier to compare with other models.

Note: This PR adds benchmark metadata to the model card frontmatter and should not conflict with existing PRs #43, #32, and #16, which only modify the chat template.

