|                  Tasks                 |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|----------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge                           |      1|none  |     0|acc     |↑  |0.5256|±  |0.0146|
|                                        |       |none  |     0|acc_norm|↑  |0.5708|±  |0.0145|
|arc_easy                                |      1|none  |     0|acc     |↑  |0.7971|±  |0.0083|
|                                        |       |none  |     0|acc_norm|↑  |0.7761|±  |0.0086|
|boolq                                   |      2|none  |     0|acc     |↑  |0.8899|±  |0.0055|
|hellaswag                               |      1|none  |     0|acc     |↑  |0.6331|±  |0.0048|
|                                        |       |none  |     0|acc_norm|↑  |0.8204|±  |0.0038|
|mmlu                                    |      2|none  |      |acc     |↑  |0.7573|±  |0.0035|
|mmlu_humanities                         |      2|none  |      |acc     |↑  |0.6837|±  |0.0065|
|mmlu_formal_logic                       |      1|none  |     0|acc     |↑  |0.7063|±  |0.0407|
|mmlu_high_school_european_history       |      1|none  |     0|acc     |↑  |0.8061|±  |0.0309|
|mmlu_high_school_us_history             |      1|none  |     0|acc     |↑  |0.8333|±  |0.0262|
|mmlu_high_school_world_history          |      1|none  |     0|acc     |↑  |0.8397|±  |0.0239|
|mmlu_international_law                  |      1|none  |     0|acc     |↑  |0.7934|±  |0.0370|
|mmlu_jurisprudence                      |      1|none  |     0|acc     |↑  |0.8426|±  |0.0352|
|mmlu_logical_fallacies                  |      1|none  |     0|acc     |↑  |0.8466|±  |0.0283|
|mmlu_moral_disputes                     |      1|none  |     0|acc     |↑  |0.8150|±  |0.0209|
|mmlu_moral_scenarios                    |      1|none  |     0|acc     |↑  |0.5676|±  |0.0166|
|mmlu_philosophy                         |      1|none  |     0|acc     |↑  |0.8039|±  |0.0226|
|mmlu_prehistory                         |      1|none  |     0|acc     |↑  |0.8302|±  |0.0209|
|mmlu_professional_law                   |      1|none  |     0|acc     |↑  |0.5600|±  |0.0127|
|mmlu_world_religions                    |      1|none  |     0|acc     |↑  |0.7778|±  |0.0319|
|mmlu_other                              |      2|none  |      |acc     |↑  |0.7815|±  |0.0072|
|mmlu_business_ethics                    |      1|none  |     0|acc     |↑  |0.8300|±  |0.0378|
|mmlu_clinical_knowledge                 |      1|none  |     0|acc     |↑  |0.7585|±  |0.0263|
|mmlu_college_medicine                   |      1|none  |     0|acc     |↑  |0.6879|±  |0.0353|
|mmlu_global_facts                       |      1|none  |     0|acc     |↑  |0.4900|±  |0.0502|
|mmlu_human_aging                        |      1|none  |     0|acc     |↑  |0.7713|±  |0.0282|
|mmlu_management                         |      1|none  |     0|acc     |↑  |0.8544|±  |0.0349|
|mmlu_marketing                          |      1|none  |     0|acc     |↑  |0.9316|±  |0.0165|
|mmlu_medical_genetics                   |      1|none  |     0|acc     |↑  |0.8000|±  |0.0402|
|mmlu_miscellaneous                      |      1|none  |     0|acc     |↑  |0.8748|±  |0.0118|
|mmlu_nutrition                          |      1|none  |     0|acc     |↑  |0.8007|±  |0.0229|
|mmlu_professional_accounting            |      1|none  |     0|acc     |↑  |0.6844|±  |0.0277|
|mmlu_professional_medicine              |      1|none  |     0|acc     |↑  |0.7316|±  |0.0269|
|mmlu_virology                           |      1|none  |     0|acc     |↑  |0.5783|±  |0.0384|
|mmlu_social_sciences                    |      2|none  |      |acc     |↑  |0.8560|±  |0.0063|
|mmlu_econometrics                       |      1|none  |     0|acc     |↑  |0.7105|±  |0.0427|
|mmlu_high_school_geography              |      1|none  |     0|acc     |↑  |0.8535|±  |0.0252|
|mmlu_high_school_government_and_politics|      1|none  |     0|acc     |↑  |0.9275|±  |0.0187|
|mmlu_high_school_macroeconomics         |      1|none  |     0|acc     |↑  |0.8410|±  |0.0185|
|mmlu_high_school_microeconomics         |      1|none  |     0|acc     |↑  |0.9202|±  |0.0176|
|mmlu_high_school_psychology             |      1|none  |     0|acc     |↑  |0.8954|±  |0.0131|
|mmlu_human_sexuality                    |      1|none  |     0|acc     |↑  |0.8779|±  |0.0287|
|mmlu_professional_psychology            |      1|none  |     0|acc     |↑  |0.8268|±  |0.0153|
|mmlu_public_relations                   |      1|none  |     0|acc     |↑  |0.7818|±  |0.0396|
|mmlu_security_studies                   |      1|none  |     0|acc     |↑  |0.8041|±  |0.0254|
|mmlu_sociology                          |      1|none  |     0|acc     |↑  |0.9005|±  |0.0212|
|mmlu_us_foreign_policy                  |      1|none  |     0|acc     |↑  |0.8500|±  |0.0359|
|mmlu_stem                               |      2|none  |      |acc     |↑  |0.7469|±  |0.0074|
|mmlu_abstract_algebra                   |      1|none  |     0|acc     |↑  |0.6600|±  |0.0476|
|mmlu_anatomy                            |      1|none  |     0|acc     |↑  |0.6593|±  |0.0409|
|mmlu_astronomy                          |      1|none  |     0|acc     |↑  |0.9013|±  |0.0243|
|mmlu_college_biology                    |      1|none  |     0|acc     |↑  |0.7778|±  |0.0348|
|mmlu_college_chemistry                  |      1|none  |     0|acc     |↑  |0.4600|±  |0.0501|
|mmlu_college_computer_science           |      1|none  |     0|acc     |↑  |0.7900|±  |0.0409|
|mmlu_college_mathematics                |      1|none  |     0|acc     |↑  |0.5700|±  |0.0498|
|mmlu_college_physics                    |      1|none  |     0|acc     |↑  |0.5392|±  |0.0496|
|mmlu_computer_security                  |      1|none  |     0|acc     |↑  |0.9100|±  |0.0288|
|mmlu_conceptual_physics                 |      1|none  |     0|acc     |↑  |0.8511|±  |0.0233|
|mmlu_electrical_engineering             |      1|none  |     0|acc     |↑  |0.7724|±  |0.0349|
|mmlu_elementary_mathematics             |      1|none  |     0|acc     |↑  |0.8624|±  |0.0177|
|mmlu_high_school_biology                |      1|none  |     0|acc     |↑  |0.8290|±  |0.0214|
|mmlu_high_school_chemistry              |      1|none  |     0|acc     |↑  |0.5862|±  |0.0347|
|mmlu_high_school_computer_science       |      1|none  |     0|acc     |↑  |0.9000|±  |0.0302|
|mmlu_high_school_mathematics            |      1|none  |     0|acc     |↑  |0.5444|±  |0.0304|
|mmlu_high_school_physics                |      1|none  |     0|acc     |↑  |0.7219|±  |0.0366|
|mmlu_high_school_statistics             |      1|none  |     0|acc     |↑  |0.8194|±  |0.0262|
|mmlu_machine_learning                   |      1|none  |     0|acc     |↑  |0.7679|±  |0.0401|
|openbookqa                              |      1|none  |     0|acc     |↑  |0.3000|±  |0.0205|
|                                        |       |none  |     0|acc_norm|↑  |0.4120|±  |0.0220|
|rte                                     |      1|none  |     0|acc     |↑  |0.7401|±  |0.0264|
|winogrande                              |      1|none  |     0|acc     |↑  |0.7656|±  |0.0119|

|       Groups       |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu                |      2|none  |      |acc   |↑  |0.7573|±  |0.0035|
|mmlu_humanities     |      2|none  |      |acc   |↑  |0.6837|±  |0.0065|
|mmlu_other          |      2|none  |      |acc   |↑  |0.7815|±  |0.0072|
|mmlu_social_sciences|      2|none  |      |acc   |↑  |0.8560|±  |0.0063|
|mmlu_stem           |      2|none  |      |acc   |↑  |0.7469|±  |0.0074|