Scored 65.00 on MMLU Pro, single-shot
Discussion #2, opened by xbruce22

Logs:
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
+==============================================+===========+=================+==================+=======+=========+=========+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | computer science | 10 | 0.6 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | math | 10 | 0.8 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | chemistry | 10 | 0.7 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | engineering | 10 | 0.7 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | law | 10 | 0.3 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | biology | 10 | 0.9 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | health | 10 | 0.8 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | physics | 10 | 0.6 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | business | 10 | 0.6 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | philosophy | 10 | 0.6 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | economics | 10 | 0.8 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | other | 10 | 0.6 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | psychology | 10 | 0.6 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | history | 10 | 0.5 | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro | AverageAccuracy | OVERALL | 140 | 0.65 | - |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
For comparison, MMLU Pro benchmark scores for unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF:

Score | Model | GGUF Size
63.57 | Q4_K_M | 18.6GB (unsloth)
65.71 | Q4_K_M | ~19GB (ollama)
Are the two scores above also OVERALL averages?
Yes, both are the OVERALL result across all subjects, the same as the OVERALL row in the logs above.
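The log does not state exactly how OVERALL is aggregated. A minimal sketch, assuming it is either the plain mean of the 14 subject scores (macro average) or total correct answers over total questions (micro average): because every subject has exactly 10 questions, both give 0.65, matching the OVERALL row. The names `subset_scores` and `QUESTIONS_PER_SUBSET` are hypothetical, and the scores are copied from the log.

```python
# Hypothetical names; per-subject accuracies copied from the log (10 questions each).
subset_scores = {
    "computer science": 0.6, "math": 0.8, "chemistry": 0.7, "engineering": 0.7,
    "law": 0.3, "biology": 0.9, "health": 0.8, "physics": 0.6,
    "business": 0.6, "philosophy": 0.6, "economics": 0.8, "other": 0.6,
    "psychology": 0.6, "history": 0.5,
}
QUESTIONS_PER_SUBSET = 10

# Macro average: plain mean of the 14 subject scores.
macro = sum(subset_scores.values()) / len(subset_scores)

# Micro average: total correct answers divided by total questions.
# round() recovers integer counts from float scores like 0.7 * 10.
correct = sum(round(score * QUESTIONS_PER_SUBSET) for score in subset_scores.values())
total = QUESTIONS_PER_SUBSET * len(subset_scores)
micro = correct / total

print(f"macro average: {macro:.4f}")
print(f"micro average: {micro:.4f} ({correct}/{total})")
```

With unequal subject sizes the two averages would generally differ, so it matters which one a benchmark reports when comparing runs.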