Commit b9f77f2 · 1 Parent(s): fa16ca3 · Update README.md

README.md CHANGED
@@ -8,28 +8,86 @@ tags:
 - collectivat/tv3_parla
 - projecte-aina/parlament_parla
 - generated_from_trainer
+- robust-speech-event
+datasets:
+- mozilla-foundation/common_voice_8_0
+- collectivat/tv3_parla
+- projecte-aina/parlament_parla
 model-index:
 - name: wav2vec2-xls-r-300m-ca
-  results:
+  results:
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: mozilla-foundation/common_voice_8_0 ca
+      type: mozilla-foundation/common_voice_8_0
+      args: ca
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 0.13170091241317552
+    - name: Test CER
+      type: cer
+      value: 0.03356726205534543
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: projecte-aina/parlament_parla ca
+      type: projecte-aina/parlament_parla
+      args: clean
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 0.08048005647723261
+    - name: Test CER
+      type: cer
+      value: 0.02240912911020065
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: collectivat/tv3_parla ca
+      type: collectivat/tv3_parla
+      args: ca
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 0.23320629787889285
+    - name: Test CER
+      type: cer
+      value: 0.10439216202089989
+  - task:
+      name: Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: speech-recognition-community-v2/dev_data ca
+      type: speech-recognition-community-v2/dev_data
+      args: ca
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 0.3199671115046487
+    - name: Test CER
+      type: cer
+      value: 0.15820020687277325
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # wav2vec2-xls-r-300m-ca
 
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA
-It achieves the following results on the evaluation set:
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA, [tv3_parla](https://huggingface.co/datasets/collectivat/tv3_parla) and [parlament_parla](https://huggingface.co/datasets/projecte-aina/parlament_parla) datasets.
+It achieves the following results on the evaluation set (for the three datasets):
 - Loss: 0.2472
 - Wer: 0.1499
 
 ## Model description
 
-More information needed
+Please check the original [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) model card; this is just a fine-tuned version of that model.
 
 ## Intended uses & limitations
 
-More information needed
+Like any model trained on crowdsourced data, this model can reflect the biases and particularities of the data and base model it was trained on. Moreover, since it is a speech recognition model, it may underperform for some lower-resourced dialects of the Catalan language.
 
 ## Training and evaluation data
 
@@ -37,6 +95,8 @@ More information needed
 
 ## Training procedure
 
+The data is preprocessed to remove characters not in the Catalan alphabet. Moreover, numbers are verbalized using code provided by [@ccoreilly](https://github.com/ccoreilly), which can be found in the text/ folder or [here](https://github.com/CollectivaT-dev/catotron-cpu/blob/master/text/numbers_ca.py).
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -54,6 +114,8 @@ The following hyperparameters were used during training:
 
 ### Training results
 
+Check the Tensorboard tab to see the training profile and evaluation results throughout training. The model was evaluated on the test splits of each of the datasets used during training.
+
 | Training Loss | Epoch | Step | Validation Loss | Wer |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|
 | 6.2099 | 0.09 | 500 | 3.4125 | 1.0 |
@@ -126,3 +188,7 @@ The following hyperparameters were used during training:
 - Pytorch 1.10.1+cu102
 - Datasets 1.18.3
 - Tokenizers 0.11.0
+
+# Thanks
+
+I want to thank both [@ccoreilly](https://github.com/ccoreilly) and [@gullabi](https://github.com/gullabi), who have contributed their own resources and knowledge to making this model possible.