5fba14bef314c141ceb64291316c7c17

This model is a fine-tuned version of google-bert/bert-large-cased-whole-word-masking on the google/boolq dataset. Per-epoch results on the evaluation set appear in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
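
The per-device and total batch sizes listed above are related by the device count. The sketch below restates the hyperparameters as a Python dictionary and checks that arithmetic. The key names follow fields of `transformers.TrainingArguments`, but that mapping is an assumption, as is the absence of gradient accumulation (the list mentions none); the original training script is not part of this card.

```python
# Hyperparameters copied from the list above. The key names mirror
# transformers.TrainingArguments fields; this mapping is an assumption,
# not taken from the original training script.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

# Listed separately in the card: multi-GPU training on 4 devices.
num_devices = 4

# Assuming no gradient accumulation, the total batch size is the
# per-device batch size multiplied by the number of devices.
total_train_batch_size = training_kwargs["per_device_train_batch_size"] * num_devices
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both totals come out to 32, matching total_train_batch_size and total_eval_batch_size in the list.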

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	RougeL	RougeLsum
No log	0	0	0.7050	0	3.0130	0.5162	0.5015	0.5159	0.5165	0.5162
No log	1	294	0.6627	0.0078	3.7635	0.6198	0.3849	0.6198	0.6192	0.6198
No log	2	588	0.6640	0.0156	3.7631	0.6213	0.3832	0.6213	0.6207	0.6210
No log	3	882	0.6774	0.0312	5.1133	0.6081	0.4462	0.6078	0.6078	0.6085
0.0277	4	1176	0.6638	0.0625	5.9470	0.6186	0.3925	0.6186	0.6180	0.6183
0.0573	5	1470	0.6706	0.125	7.4296	0.6213	0.3832	0.6213	0.6207	0.6210
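
In several rows the accuracy is about 0.62 while the macro F1 is about 0.38. A minimal sketch of one possible reading: for a two-class task, a classifier that always predicts the majority class has a macro F1 of a / (1 + a), where a is its accuracy. The function name below is hypothetical, and this is an interpretation of the reported numbers, not a statement made by the card.

```python
def macro_f1_constant_predictor(accuracy: float) -> float:
    """Macro F1 over two classes when every prediction is the majority class."""
    # Predicted class: precision equals accuracy and recall is 1,
    # so F1 = 2 * a / (1 + a).
    f1_majority = 2 * accuracy / (1 + accuracy)
    # Never-predicted class: no true positives, so its F1 is taken as 0.
    f1_minority = 0.0
    return (f1_majority + f1_minority) / 2


# Accuracy reported for epochs 2 and 5 in the table above.
reported_accuracy = 0.6213
print(round(macro_f1_constant_predictor(reported_accuracy), 4))
```

The result can be compared with the F1 Macro column for epochs 2 and 5. A close match would be consistent with the model predicting a single class on the evaluation set at those checkpoints, although only per-class predictions could confirm that.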

Safetensors: model size 0.3B params, tensor type F32
