
Scored 65.00 on MMLU Pro single shot 🔥

#2
by xbruce22 - opened

logs

+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                                        | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+==============================================+===========+=================+==================+=======+=========+=========+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | computer science |    10 |    0.6  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | math             |    10 |    0.8  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | chemistry        |    10 |    0.7  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | engineering      |    10 |    0.7  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | law              |    10 |    0.3  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | biology          |    10 |    0.9  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | health           |    10 |    0.8  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | physics          |    10 |    0.6  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | business         |    10 |    0.6  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | philosophy       |    10 |    0.6  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | economics        |    10 |    0.8  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | other            |    10 |    0.6  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | psychology       |    10 |    0.6  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | history          |    10 |    0.5  | default |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+
| intel-Qwen3-Coder-30B-A3B-Instruct-gguf-q2ks | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |    0.65 | -       |
+----------------------------------------------+-----------+-----------------+------------------+-------+---------+---------+

for comparison:

unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF
Score | Model | GGUF Size
63.57 | Q4_K_M (unsloth) | 18.6GB
65.71 | Q4_K_M (ollama) | ~19GB

from MMLU Pro benchmark

Intel org • edited Aug 5

Are the above two scores also coming from AVERAGE?

Yes, both are the OVERALL result across all subjects, the same row you can see in the logs above.
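
For anyone who wants to check the arithmetic, here is a minimal sketch of how the OVERALL row can be reproduced from the per-subject rows in the logs. It assumes OVERALL is a sample-weighted average of the subset accuracies; since every subset here has 10 questions, that is the same as a plain mean. The `scores` dict and variable names are just for illustration and are not part of the evaluation tool.

```python
# Per-subject (num_questions, accuracy) pairs copied from the logs above.
scores = {
    "computer science": (10, 0.6),
    "math": (10, 0.8),
    "chemistry": (10, 0.7),
    "engineering": (10, 0.7),
    "law": (10, 0.3),
    "biology": (10, 0.9),
    "health": (10, 0.8),
    "physics": (10, 0.6),
    "business": (10, 0.6),
    "philosophy": (10, 0.6),
    "economics": (10, 0.8),
    "other": (10, 0.6),
    "psychology": (10, 0.6),
    "history": (10, 0.5),
}

# Total number of questions across all subjects.
total = sum(n for n, _ in scores.values())

# Recover the number of correct answers per subject, then sum them.
# round() guards against floating-point noise in n * acc.
correct = sum(round(n * acc) for n, acc in scores.values())

# Assumption: OVERALL is the sample-weighted accuracy over all subjects.
overall = correct / total

print(total, correct, round(overall, 4))
```

That gives 91 correct out of 140, i.e. 0.65, matching the OVERALL row. With only 10 questions per subject, each individual subset score moves in steps of 0.1, so the per-subject numbers are much noisier than the overall one.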
