Add model-index with instruction benchmark evaluations
#45 opened by davidlms
Added structured evaluation results from the README benchmark table (a sketch of the resulting frontmatter follows the list):
Instruction Model Benchmarks (No Extended Thinking):
- AIME 2025 (High school math): 9.3
- GSM-Plus (Math problem-solving): 72.8
- LiveCodeBench v4 (Competitive programming): 15.2
- GPQA Diamond (Graduate-level reasoning): 35.7
- IFEval (Instruction following): 76.7
- MixEval Hard (Alignment): 26.9
- BFCL (Tool Calling): 92.3
- Global MMLU (Multilingual Q&A): 53.5
Total: 8 benchmarks covering reasoning, math, coding, instruction-following, alignment, tool use, and multilingual capabilities.
This metadata enables the model to appear on Hub leaderboards and makes it easier to compare with other models.
Note: This PR only adds benchmark metadata to the model card frontmatter, so it should not conflict with existing PRs #43, #32, and #16, which only modify the chat template.