Update README.md
README.md
@@ -86,20 +86,6 @@ Details are in the paper’s Appendix.
 ## Evaluation
 The model was evaluated using the setup described in the SwallowCode paper, with the lm-evaluation-harness and BigCodeBench. Benchmarks include code generation (HumanEval, HumanEval+) and general tasks (OpenBookQA, TriviaQA, HellaSwag, SQuAD 2.0, XWINO, MMLU, GSM8K, BBH). Results are reported for checkpoints at 10B, 20B, 30B, 40B, and 50B tokens.
 
-Evaluation Results (Experiment 3)
-
-### Evaluation Results (Experiment 3)
-
-| Tokens (B) | OpenBookQA | TriviaQA | HellaSwag | SQuAD2.0 | XWINO | MMLU | GSM8K | BBH | HumanEval | HumanEval+ |
-|------------|------------|----------|-----------|----------|-------|--------|--------|--------|-----------|------------|
-| 10 | 0.3560 | 0.6628 | 0.6010 | 0.3340 | 0.9071| 0.6235 | 0.4564 | 0.6007 | 0.3500 | 0.3488 |
-| 20 | 0.3500 | 0.6613 | 0.6015 | 0.3361 | 0.9054| 0.6237 | 0.4860 | 0.5838 | 0.3744 | 0.3787 |
-| 30 | 0.3620 | 0.6596 | 0.6008 | 0.3359 | 0.9080| 0.6307 | 0.4867 | 0.5921 | 0.3957 | 0.3878 |
-| 40 | 0.3720 | 0.6650 | 0.6030 | 0.3352 | 0.9058| 0.6326 | 0.4822 | 0.5990 | 0.3890 | 0.3915 |
-| 50 | 0.3740 | 0.6677 | 0.6054 | 0.3291 | 0.9019| 0.6327 | 0.4996 | 0.6145 | 0.3945 | 0.3902 |
-
-*Source: Table 4 from the SwallowCode paper, showing performance of the syntax-error and Pylint-filtered (score ≥ 7) Python subset.*
-
 
 ## Citation
 
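The hunk header `@@ -86,20 +86,6 @@` encodes the affected line ranges. In the old file the hunk starts at line 86 and spans 20 lines. In the new file it starts at line 86 and spans 6 lines. The difference, 14 lines, is the block removed by this commit (the results heading, the table, and its source note). Below is a minimal Python sketch of reading those numbers back out of a header line. The `parse_hunk_header` function and `Hunk` type are hypothetical helpers written for illustration; they are not part of Git's or GitHub's tooling.

```python
import re
from typing import NamedTuple

# Matches a unified-diff hunk header such as "@@ -86,20 +86,6 @@ trailing context".
# In the unified format, a count omitted from the header (e.g. "@@ -5 +5 @@") means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    old_start: int  # first line of the hunk in the old file
    old_count: int  # number of old-file lines the hunk covers
    new_start: int  # first line of the hunk in the new file
    new_count: int  # number of new-file lines the hunk covers


def parse_hunk_header(line: str) -> Hunk:
    """Parse the old/new line ranges out of a unified-diff hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return Hunk(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


hunk = parse_hunk_header("@@ -86,20 +86,6 @@ Details are in the paper’s Appendix.")
print(hunk)
# Net lines removed by this hunk: old span minus new span.
print(hunk.old_count - hunk.new_count)
```

Running it prints the parsed ranges followed by `14`, which matches the 14 lines prefixed with `-` in the hunk.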