yalhessi committed
Commit 5219d17 · verified · 1 Parent(s): 5b8220e

End of training

Files changed (1): README.md (+68 -69)

README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Loss: 0.1610
+ - Loss: 0.1352
 
 ## Model description
 
@@ -36,14 +36,13 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0008
- - train_batch_size: 1
- - eval_batch_size: 1
+ - train_batch_size: 2
+ - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 8
- - gradient_accumulation_steps: 2
 - total_train_batch_size: 16
- - total_eval_batch_size: 8
+ - total_eval_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
 - num_epochs: 12
@@ -53,71 +52,71 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-------:|:------:|:---------------:|
- | 0.3527 | 0.2000 | 3118 | 0.3371 |
- | 0.3311 | 0.4001 | 6236 | 0.3145 |
- | 0.3178 | 0.6001 | 9354 | 0.3085 |
- | 0.3122 | 0.8002 | 12472 | 0.3022 |
- | 0.3106 | 1.0002 | 15590 | 0.3045 |
- | 0.2997 | 1.2002 | 18708 | 0.2830 |
- | 0.2957 | 1.4003 | 21826 | 0.2841 |
- | 0.2922 | 1.6003 | 24944 | 0.2769 |
- | 0.2895 | 1.8004 | 28062 | 0.2766 |
- | 0.2847 | 2.0004 | 31180 | 0.2660 |
- | 0.2826 | 2.2004 | 34298 | 0.2733 |
- | 0.2767 | 2.4005 | 37416 | 0.2659 |
- | 0.2737 | 2.6005 | 40534 | 0.2638 |
- | 0.2734 | 2.8006 | 43652 | 0.2642 |
- | 0.2701 | 3.0006 | 46770 | 0.2622 |
- | 0.2629 | 3.2006 | 49888 | 0.2548 |
- | 0.2652 | 3.4007 | 53006 | 0.2511 |
- | 0.263 | 3.6007 | 56124 | 0.2564 |
- | 0.2612 | 3.8008 | 59242 | 0.2506 |
- | 0.2616 | 4.0008 | 62360 | 0.2440 |
- | 0.2574 | 4.2008 | 65478 | 0.2404 |
- | 0.2501 | 4.4009 | 68596 | 0.2401 |
- | 0.2506 | 4.6009 | 71714 | 0.2414 |
- | 0.251 | 4.8009 | 74832 | 0.2378 |
- | 0.2456 | 5.0010 | 77950 | 0.2328 |
- | 0.2418 | 5.2010 | 81068 | 0.2376 |
- | 0.2395 | 5.4011 | 84186 | 0.2342 |
- | 0.2364 | 5.6011 | 87304 | 0.2241 |
- | 0.2334 | 5.8011 | 90422 | 0.2298 |
- | 0.2309 | 6.0012 | 93540 | 0.2240 |
- | 0.2291 | 6.2012 | 96658 | 0.2199 |
- | 0.2283 | 6.4012 | 99776 | 0.2145 |
- | 0.2208 | 6.6013 | 102894 | 0.2171 |
- | 0.2236 | 6.8013 | 106012 | 0.2127 |
- | 0.2208 | 7.0013 | 109130 | 0.2112 |
- | 0.2172 | 7.2014 | 112248 | 0.2100 |
- | 0.212 | 7.4014 | 115366 | 0.2044 |
- | 0.2111 | 7.6015 | 118484 | 0.2064 |
- | 0.2115 | 7.8015 | 121602 | 0.2003 |
- | 0.2102 | 8.0015 | 124720 | 0.2005 |
- | 0.2028 | 8.2016 | 127838 | 0.1959 |
- | 0.2011 | 8.4016 | 130956 | 0.1947 |
- | 0.1967 | 8.6017 | 134074 | 0.1941 |
- | 0.1954 | 8.8017 | 137192 | 0.1907 |
- | 0.1951 | 9.0017 | 140310 | 0.1871 |
- | 0.1927 | 9.2018 | 143428 | 0.1863 |
- | 0.1847 | 9.4018 | 146546 | 0.1823 |
- | 0.1879 | 9.6019 | 149664 | 0.1813 |
- | 0.1844 | 9.8019 | 152782 | 0.1786 |
- | 0.1881 | 10.0019 | 155900 | 0.1771 |
- | 0.1738 | 10.2020 | 159018 | 0.1746 |
- | 0.1753 | 10.4020 | 162136 | 0.1737 |
- | 0.1718 | 10.6021 | 165254 | 0.1704 |
- | 0.1714 | 10.8021 | 168372 | 0.1677 |
- | 0.1699 | 11.0021 | 171490 | 0.1688 |
- | 0.1629 | 11.2022 | 174608 | 0.1653 |
- | 0.1598 | 11.4022 | 177726 | 0.1642 |
- | 0.158 | 11.6023 | 180844 | 0.1643 |
- | 0.1591 | 11.8023 | 183962 | 0.1610 |
+ | 0.306 | 0.2000 | 3114 | 0.3017 |
+ | 0.2867 | 0.4000 | 6228 | 0.2795 |
+ | 0.2775 | 0.6000 | 9342 | 0.2688 |
+ | 0.2701 | 0.8001 | 12456 | 0.2583 |
+ | 0.268 | 1.0001 | 15570 | 0.2530 |
+ | 0.2609 | 1.2001 | 18684 | 0.2470 |
+ | 0.2549 | 1.4001 | 21798 | 0.2425 |
+ | 0.2542 | 1.6001 | 24912 | 0.2384 |
+ | 0.2489 | 1.8001 | 28026 | 0.2377 |
+ | 0.2469 | 2.0001 | 31140 | 0.2334 |
+ | 0.2416 | 2.2001 | 34254 | 0.2419 |
+ | 0.2402 | 2.4002 | 37368 | 0.2269 |
+ | 0.2401 | 2.6002 | 40482 | 0.2255 |
+ | 0.2368 | 2.8002 | 43596 | 0.2398 |
+ | 0.2309 | 3.0002 | 46710 | 0.2226 |
+ | 0.2289 | 3.2002 | 49824 | 0.2207 |
+ | 0.226 | 3.4002 | 52938 | 0.2194 |
+ | 0.2249 | 3.6002 | 56052 | 0.2178 |
+ | 0.2214 | 3.8002 | 59166 | 0.2173 |
+ | 0.2207 | 4.0003 | 62280 | 0.2128 |
+ | 0.2158 | 4.2003 | 65394 | 0.2104 |
+ | 0.2147 | 4.4003 | 68508 | 0.2071 |
+ | 0.2139 | 4.6003 | 71622 | 0.2083 |
+ | 0.2094 | 4.8003 | 74736 | 0.2077 |
+ | 0.2072 | 5.0003 | 77850 | 0.1972 |
+ | 0.2039 | 5.2003 | 80964 | 0.1964 |
+ | 0.2036 | 5.4003 | 84078 | 0.1948 |
+ | 0.2031 | 5.6004 | 87192 | 0.1950 |
+ | 0.1964 | 5.8004 | 90306 | 0.1934 |
+ | 0.1982 | 6.0004 | 93420 | 0.1839 |
+ | 0.1929 | 6.2004 | 96534 | 0.1882 |
+ | 0.1917 | 6.4004 | 99648 | 0.1845 |
+ | 0.1917 | 6.6004 | 102762 | 0.1811 |
+ | 0.1866 | 6.8004 | 105876 | 0.1800 |
+ | 0.1885 | 7.0004 | 108990 | 0.1778 |
+ | 0.182 | 7.2005 | 112104 | 0.1756 |
+ | 0.1798 | 7.4005 | 115218 | 0.1734 |
+ | 0.1806 | 7.6005 | 118332 | 0.1758 |
+ | 0.175 | 7.8005 | 121446 | 0.1728 |
+ | 0.1729 | 8.0005 | 124560 | 0.1737 |
+ | 0.1695 | 8.2005 | 127674 | 0.1673 |
+ | 0.1674 | 8.4005 | 130788 | 0.1657 |
+ | 0.1676 | 8.6006 | 133902 | 0.1623 |
+ | 0.1679 | 8.8006 | 137016 | 0.1609 |
+ | 0.1617 | 9.0006 | 140130 | 0.1596 |
+ | 0.1572 | 9.2006 | 143244 | 0.1590 |
+ | 0.1565 | 9.4006 | 146358 | 0.1570 |
+ | 0.1544 | 9.6006 | 149472 | 0.1537 |
+ | 0.1525 | 9.8006 | 152586 | 0.1512 |
+ | 0.1493 | 10.0006 | 155700 | 0.1524 |
+ | 0.1471 | 10.2007 | 158814 | 0.1479 |
+ | 0.1466 | 10.4007 | 161928 | 0.1452 |
+ | 0.1431 | 10.6007 | 165042 | 0.1442 |
+ | 0.1403 | 10.8007 | 168156 | 0.1418 |
+ | 0.1381 | 11.0007 | 171270 | 0.1400 |
+ | 0.1354 | 11.2007 | 174384 | 0.1383 |
+ | 0.1341 | 11.4007 | 177498 | 0.1374 |
+ | 0.1294 | 11.6007 | 180612 | 0.1364 |
+ | 0.1314 | 11.8008 | 183726 | 0.1352 |
 
 
 ### Framework versions
 
- - PEFT 0.15.2
- - Transformers 4.57.1
- - Pytorch 2.7.0+cu126
- - Datasets 4.3.0
- - Tokenizers 0.22.1
+ - PEFT 0.14.0
+ - Transformers 4.47.0
+ - Pytorch 2.5.1+cu124
+ - Datasets 4.2.0
+ - Tokenizers 0.21.0
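As context for the hyperparameter changes above: in these auto-generated cards, `train_batch_size` is per device, so both revisions reach the same effective batch of 16 (new: 2 per device × 8 devices × 1 accumulation step; old: 1 × 8 × 2; the `gradient_accumulation_steps` line being dropped from the card implies it is now 1). Below is a minimal sketch, not the author's actual training script, of how the updated values would map onto `transformers.TrainingArguments`; the output directory is a placeholder.

```python
# Hypothetical reconstruction of the updated run configuration; only the
# values listed in the card are grounded, everything else is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama32-1b-finetune",  # placeholder, not from the card
    learning_rate=8e-4,
    per_device_train_batch_size=2,     # "train_batch_size" is per device
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=1,     # inferred: 2 * 8 * 1 = 16 total
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=12,
    optim="adamw_torch",               # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```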
 
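Since PEFT appears in the framework versions, the checkpoint is presumably a PEFT (LoRA-style) adapter rather than full model weights. A minimal loading sketch under that assumption; `yalhessi/<adapter-repo>` is a placeholder for this repository's actual id.

```python
# Minimal sketch for loading the adapter on top of the base model.
# "yalhessi/<adapter-repo>" is a placeholder; substitute the real repo id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "yalhessi/<adapter-repo>")
model.eval()
```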