---
base_model:
- Sao10K/L3-8B-Stheno-v3.2
---
vllm (pretrained=/root/autodl-tmp/L3-8B-Stheno-v3.2,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.772|±  |0.0266|
|     |       |strict-match    |     5|exact_match|↑  |0.772|±  |0.0266|

vllm (pretrained=/root/autodl-tmp/L3-8B-Stheno-v3.2,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.790|±  |0.0182|
|     |       |strict-match    |     5|exact_match|↑  |0.796|±  |0.0180|
vllm (pretrained=/root/autodl-tmp/L3-8B-Stheno-v3.2-80,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.808|±  | 0.025|
|     |       |strict-match    |     5|exact_match|↑  |0.808|±  | 0.025|

vllm (pretrained=/root/autodl-tmp/L3-8B-Stheno-v3.2-80,add_bos_token=true,max_model_len=2048,dtype=bfloat16), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.816|±  |0.0173|
|     |       |strict-match    |     5|exact_match|↑  |0.822|±  |0.0171|
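The Stderr column in these tables appears consistent with the standard error of a mean of per-question 0/1 `exact_match` scores, computed with the sample (n − 1) standard deviation: stderr = sqrt(p(1 − p) / (n − 1)). The sketch below checks this, assuming that n equals the `limit` of each run (gsm8k's test split is larger than 500 questions, so each run scores exactly `limit` questions). The helper name `lm_eval_stderr` is hypothetical and is not part of lm-evaluation-harness.

```python
import math


def lm_eval_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores whose mean is p.

    Uses the sample variance n*p*(1-p)/(n-1), so the standard error is
    sqrt(p*(1-p)/(n-1)). This is an assumption about how the tables were
    produced, not a call into lm-evaluation-harness.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (exact_match value, limit, reported Stderr) copied from the tables above
reported = [
    (0.772, 250, 0.0266),  # L3-8B-Stheno-v3.2, limit 250 (both filters)
    (0.790, 500, 0.0182),  # L3-8B-Stheno-v3.2, limit 500, flexible-extract
    (0.796, 500, 0.0180),  # L3-8B-Stheno-v3.2, limit 500, strict-match
    (0.808, 250, 0.025),   # L3-8B-Stheno-v3.2-80, limit 250 (both filters)
    (0.816, 500, 0.0173),  # L3-8B-Stheno-v3.2-80, limit 500, flexible-extract
    (0.822, 500, 0.0171),  # L3-8B-Stheno-v3.2-80, limit 500, strict-match
]

for value, n, stderr in reported:
    print(f"{value:.3f} n={n}: recomputed {lm_eval_stderr(value, n):.4f}, reported {stderr}")
```

Every recomputed value rounds to the reported Stderr. One practical consequence: the 500-question runs have roughly 1.5x smaller error bars than the 250-question runs, so they are the better basis for comparing the two checkpoints.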