---
license: apache-2.0
---

### Accuracy
| Category | Metric | meta-llama/Llama-4-Maverick-17B-128E-Instruct | nm-testing/Llama-4-Maverick-17B-128E-Instruct-block-FP8 | Recovery (%) |
| --- | --- | --- | --- | --- |
| OpenLLM V1 | ARC-Challenge (Acc-Norm, 25-shot) | 73.38 | 73.38 | 100.00 |
| | GSM8K (Strict-Match, 5-shot) | 93.03 | 92.72 | 99.67 |
| | HellaSwag (Acc-Norm, 10-shot) | 87.39 | 87.33 | 99.93 |
| | MMLU (Acc, 5-shot) | 86.03 | 86.15 | 100.13 |
| | TruthfulQA (MC2, 0-shot) | 62.76 | 62.90 | 100.23 |
| | Winogrande (Acc, 5-shot) | 79.56 | 79.40 | 99.80 |
| | **Average Score** | **80.36** | **80.31** | **99.94** |
| OpenLLM V2 | IFEval (Inst Level Strict Acc, 0-shot) | 89.93 | 90.89 | 101.07 |
| | BBH (Acc-Norm, 3-shot) | 70.53 | 71.03 | 100.71 |
| | Math-Hard (Exact-Match, 4-shot) | 64.73 | 65.26 | 100.82 |
| | GPQA (Acc-Norm, 0-shot) | 31.29 | 30.54 | 97.59 |
| | MUSR (Acc-Norm, 0-shot) | 46.56 | 46.03 | 98.86 |
| | MMLU-Pro (Acc, 5-shot) | 64.11 | 63.95 | 99.75 |
| | **Average Score** | **61.19** | **61.28** | **100.15** |
| Coding | HumanEval pass@1 | abc | 88.40 | xyz |
| | HumanEval+ pass@1 | abc | 79.30 | xyz |
| | MBPP pass@1 | abc | 90.20 | xyz |
| | MBPP+ pass@1 | abc | 75.10 | xyz |
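The Recovery (%) column is the quantized model's score expressed as a percentage of the baseline model's score. A minimal sketch of that computation, using the GSM8K row above as an example (the function name `recovery` is illustrative, not part of any evaluation tool):

```python
def recovery(baseline: float, quantized: float) -> float:
    """Recovery (%) = quantized score / baseline score * 100."""
    return quantized / baseline * 100

# GSM8K row: baseline 93.03, block-FP8 92.72
print(round(recovery(93.03, 92.72), 2))  # 99.67, matching the table
```

Values above 100% (e.g. IFEval at 101.07) simply mean the quantized model scored slightly higher than the baseline on that benchmark.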