---
library_name: transformers
license: apache-2.0
base_model: google-bert/bert-base-multilingual-cased
tags:
- generated_from_trainer
metrics:
- accuracy
- rouge
model-index:
- name: 7f46eeb9fff60eed8b49f05e77ab5d1d
  results: []
---


# 7f46eeb9fff60eed8b49f05e77ab5d1d

This model is a fine-tuned version of [google-bert/bert-base-multilingual-cased](https://huggingface.co/google-bert/bert-base-multilingual-cased) on the `mnli` configuration of the nyu-mll/glue dataset.
It achieves the following results on the evaluation set after the last logged epoch (epoch 14):
- Loss: 0.6347
- Data Size: 1.0
- Epoch Runtime: 616.6585
- Accuracy: 0.7639
- F1 Macro: 0.7624
- Rouge1: 0.7638
- Rouge2: 0.0
- RougeL: 0.7639
- RougeLsum: 0.7642
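
Accuracy and F1 Macro stay close to each other in the list above. As a rough illustration of how the two are computed, the sketch below evaluates both on a made-up three-class toy example. The label lists and the helper names `macro_f1` and `accuracy` are assumptions for illustration; the card does not state which metric implementation the training run used.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): three class ids standing in for the three MNLI labels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred, labels=[0, 1, 2])
print(f"accuracy={toy_accuracy:.4f} macro_f1={toy_macro_f1:.4f}")
```

Because macro F1 averages per-class scores without weighting by class frequency, it diverges from accuracy mainly when errors are concentrated in particular classes.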

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
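
The total batch sizes follow from the per-device batch size and the device count. A minimal sketch of that arithmetic, assuming data-parallel training with no gradient accumulation (the card lists none). The variable names are illustrative, and the per-epoch step count is read from the training results table.

```python
# Values copied from the hyperparameter list.
per_device_train_batch_size = 8
num_devices = 4

# Assumption: one batch per device per optimizer step, no gradient accumulation.
total_train_batch_size = per_device_train_batch_size * num_devices

# The training results table logs 12271 optimizer steps per epoch.
steps_per_epoch = 12271
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size)  # 32
print(examples_per_epoch)      # 392672
```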

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro | Rouge1 | Rouge2 | RougeL | RougeLsum |
|:-------------:|:-----:|:------:|:---------------:|:---------:|:-------------:|:--------:|:--------:|:------:|:------:|:------:|:---------:|
| No log        | 0     | 0      | 1.1021          | 0         | 4.8366        | 0.3333   | 0.2588   | 0.3333 | 0.0    | 0.3333 | 0.3332    |
| 1.056         | 1     | 12271  | 0.9014          | 0.0078    | 9.7280        | 0.5923   | 0.5927   | 0.5924 | 0.0    | 0.5925 | 0.5928    |
| 0.8885        | 2     | 24542  | 0.8165          | 0.0156    | 15.4293       | 0.6421   | 0.6365   | 0.6422 | 0.0    | 0.6425 | 0.6423    |
| 0.8231        | 3     | 36813  | 0.7632          | 0.0312    | 24.1737       | 0.6675   | 0.6626   | 0.6675 | 0.0    | 0.6675 | 0.6675    |
| 0.7834        | 4     | 49084  | 0.7125          | 0.0625    | 43.9368       | 0.6962   | 0.6953   | 0.6964 | 0.0    | 0.6962 | 0.6965    |
| 0.6754        | 5     | 61355  | 0.6893          | 0.125     | 82.9030       | 0.7145   | 0.7108   | 0.7144 | 0.0    | 0.7144 | 0.7148    |
| 0.6879        | 6     | 73626  | 0.6884          | 0.25      | 161.7223      | 0.7211   | 0.7215   | 0.7214 | 0.0    | 0.7212 | 0.7210    |
| 0.5822        | 7     | 85897  | 0.6155          | 0.5       | 318.7635      | 0.7488   | 0.7480   | 0.7486 | 0.0    | 0.7487 | 0.7489    |
| 0.575         | 8     | 98168  | 0.5989          | 1.0       | 631.4860      | 0.7701   | 0.7691   | 0.7699 | 0.0    | 0.7701 | 0.7701    |
| 0.4987        | 9     | 110439 | 0.6014          | 1.0       | 619.4724      | 0.7711   | 0.7704   | 0.7711 | 0.0    | 0.7712 | 0.7711    |
| 0.4853        | 10    | 122710 | 0.5948          | 1.0       | 612.5831      | 0.7719   | 0.7704   | 0.7720 | 0.0    | 0.7719 | 0.7720    |
| 0.4152        | 11    | 134981 | 0.6218          | 1.0       | 614.1248      | 0.7623   | 0.7630   | 0.7623 | 0.0    | 0.7624 | 0.7623    |
| 0.4063        | 12    | 147252 | 0.6741          | 1.0       | 614.9903      | 0.7557   | 0.7546   | 0.7555 | 0.0    | 0.7558 | 0.7559    |
| 0.4303        | 13    | 159523 | 0.7124          | 1.0       | 617.9320      | 0.7563   | 0.7572   | 0.7562 | 0.0    | 0.7566 | 0.7564    |
| 0.3838        | 14    | 171794 | 0.6347          | 1.0       | 616.6585      | 0.7639   | 0.7624   | 0.7638 | 0.0    | 0.7639 | 0.7642    |
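
Validation loss reaches its minimum partway through the run rather than at the last logged epoch. A small sketch for locating the best epoch, using values copied from the table above; the `history` list and the variable names are illustrative and not part of the training code.

```python
# (epoch, validation_loss, accuracy) copied from the training results table.
history = [
    (0, 1.1021, 0.3333),
    (1, 0.9014, 0.5923),
    (2, 0.8165, 0.6421),
    (3, 0.7632, 0.6675),
    (4, 0.7125, 0.6962),
    (5, 0.6893, 0.7145),
    (6, 0.6884, 0.7211),
    (7, 0.6155, 0.7488),
    (8, 0.5989, 0.7701),
    (9, 0.6014, 0.7711),
    (10, 0.5948, 0.7719),
    (11, 0.6218, 0.7623),
    (12, 0.6741, 0.7557),
    (13, 0.7124, 0.7563),
    (14, 0.6347, 0.7639),
]

best_by_loss = min(history, key=lambda row: row[1])      # lowest validation loss
best_by_accuracy = max(history, key=lambda row: row[2])  # highest accuracy
final = history[-1]

print(best_by_loss)      # (10, 0.5948, 0.7719)
print(best_by_accuracy)  # (10, 0.5948, 0.7719)
print(round(best_by_accuracy[2] - final[2], 4))  # accuracy gap to the final epoch
```

On these numbers, epoch 10 has both the lowest validation loss and the highest accuracy, while the headline results correspond to epoch 14.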


### Framework versions

- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1