---
license: apache-2.0
---
## Accuracy
| Category | Metric | meta-llama/Llama-4-Maverick-17B-128E-Instruct | nm-testing/Llama-4-Maverick-17B-128E-Instruct-block-FP8 | Recovery (%) |
|---|---|---|---|---|
| OpenLLM V1 | ARC-Challenge (Acc-Norm, 25-shot) | 73.38 | 73.38 | 100.00 |
| | GSM8K (Strict-Match, 5-shot) | 93.03 | 92.72 | 99.67 |
| | HellaSwag (Acc-Norm, 10-shot) | 87.39 | 87.33 | 99.93 |
| | MMLU (Acc, 5-shot) | 86.03 | 86.15 | 100.13 |
| | TruthfulQA (MC2, 0-shot) | 62.76 | 62.90 | 100.23 |
| | Winogrande (Acc, 5-shot) | 79.56 | 79.40 | 99.80 |
| | Average Score | 80.36 | 80.31 | 99.94 |
| OpenLLM V2 | IFEval (Inst Level Strict Acc, 0-shot) | 89.93 | 90.89 | 101.07 |
| | BBH (Acc-Norm, 3-shot) | 70.53 | 71.03 | 100.71 |
| | Math-Hard (Exact-Match, 4-shot) | 64.73 | 65.26 | 100.82 |
| | GPQA (Acc-Norm, 0-shot) | 31.29 | 30.54 | 97.59 |
| | MUSR (Acc-Norm, 0-shot) | 46.56 | 46.03 | 98.86 |
| | MMLU-Pro (Acc, 5-shot) | 64.11 | 63.95 | 99.75 |
| | Average Score | 61.19 | 61.28 | 100.15 |
| Coding | HumanEval pass@1 | abc | 88.40 | xyz |
| | HumanEval+ pass@1 | abc | 79.30 | xyz |
| | MBPP pass@1 | abc | 90.20 | xyz |
| | MBPP+ pass@1 | abc | 75.10 | xyz |
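The Recovery column reports the quantized model's score as a percentage of the unquantized baseline's score. A minimal Python sketch of that calculation, using score pairs copied from the OpenLLM V1 rows above; the last decimal may differ slightly from the table because the reported scores are themselves already rounded:

```python
# Recovery (%) = quantized score / baseline score * 100.
# Score pairs are taken from the OpenLLM V1 rows of the table above.
scores = {
    "ARC-Challenge": (73.38, 73.38),
    "GSM8K": (93.03, 92.72),
    "HellaSwag": (87.39, 87.33),
    "MMLU": (86.03, 86.15),
    "TruthfulQA": (62.76, 62.90),
    "Winogrande": (79.56, 79.40),
}

for task, (baseline, quantized) in scores.items():
    recovery = quantized / baseline * 100
    print(f"{task}: {recovery:.2f}%")  # e.g. GSM8K -> 99.67%
```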