---
license: mit
language:
- id
- en
library_name: pytorch
tags:
- audio-source-separation
- speech-separation
- convtasnet
- asteroid
- itera
datasets:
- librimix
- custom-indonesian-noisy-speech
metrics:
- si-sdr
base_model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k
pipeline_tag: audio-to-audio
---

## Fine-tuned model: [FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT](https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT)

This model is a fine-tuned version of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).

### Description:

This model was fine-tuned by researchers from **Teknik Informatika, Institut Teknologi Sumatera (ITERA)**. Fine-tuning used the scripts available in the [project's GitHub repository](https://github.com/fransiskus-121140010/itera-informatics-convtasnet-ft). The model was trained on a custom dataset of Indonesian-language speech mixed with a variety of noise types.

### Fine-tuning config:

```yaml
# Configuration used during fine-tuning
data:
  root: "data/processed/"
  sample_rate: 8000
  segment_seconds: 4
  num_workers: 4

training:
  project_name: "itera-speech-separation-ft"
  model_name: "ConvTasNet-ITERA-FT"  # Name used during training
  epochs: 50
  batch_size: 8
  learning_rate: 0.0005
  gradient_clip_val: 0.5
  precision: "16-mixed"
  early_stopping_patience: 5

model:
  freeze_encoder_decoder: false

remix:
  dynamic: true
  snr_low: 0.0
  snr_high: 10.0
```
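
The `remix` keys above suggest that speech and noise are re-mixed on the fly at a signal-to-noise ratio drawn between `snr_low` and `snr_high` (presumably in dB). The following is a minimal sketch of that idea, not the project's actual code: the function names `power`, `mix_at_snr`, and `dynamic_remix` are hypothetical, the uniform sampling of the SNR is an assumption, and plain Python lists stand in for audio tensors to keep the example dependency-free.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches `snr_db`, then add."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR(dB) = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def dynamic_remix(speech, noise, snr_low=0.0, snr_high=10.0, rng=random):
    """Draw an SNR uniformly from [snr_low, snr_high] (assumed dB) and remix."""
    snr_db = rng.uniform(snr_low, snr_high)
    mixture, _ = mix_at_snr(speech, noise, snr_db)
    return mixture, snr_db


if __name__ == "__main__":
    sample_rate, segment_seconds = 8000, 4
    num_samples = sample_rate * segment_seconds  # 32000 samples per segment
    rng = random.Random(0)
    # Synthetic stand-ins for a speech segment and a noise segment.
    speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(num_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mixture, snr_db = dynamic_remix(speech, noise, rng=rng)
    print(len(mixture), round(snr_db, 2))
```

With `sample_rate: 8000` and `segment_seconds: 4`, each training segment holds 32000 samples. Drawing a fresh SNR for every segment means the model rarely sees the same mixture twice, which is the usual motivation for remixing dynamically rather than once offline.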

## Results

Evaluation on our internal test set gave the following results:

```yaml
si_sdr:
  baseline_score: -30.2842
  fine_tuned_score: -24.9016
  improvement: +5.3826
```
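
For readers unfamiliar with the metric, SI-SDR (scale-invariant signal-to-distortion ratio) is conventionally reported in dB, and higher is better. The sketch below shows the standard definition and how the improvement figure follows from the two scores. It is illustrative only and is not the evaluation script used for the numbers above: the function names `si_sdr` and `improvement` are hypothetical, and removing the mean before scoring is a common convention assumed here.

```python
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB of `estimate` with respect to `target`."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean of each signal (a common convention, assumed here).
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to find the optimal scaling.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t)
    alpha = dot / (target_energy + eps)
    projection = [alpha * b for b in t]
    # Everything in the estimate not explained by the scaled target is distortion.
    distortion_energy = sum((a - p) ** 2 for a, p in zip(e, projection))
    projection_energy = sum(p * p for p in projection)
    return 10 * math.log10((projection_energy + eps) / (distortion_energy + eps))


def improvement(baseline_score, fine_tuned_score):
    """Gain of the fine-tuned model over the baseline, in the same unit as the scores."""
    return fine_tuned_score - baseline_score


if __name__ == "__main__":
    print(round(improvement(-30.2842, -24.9016), 4))  # 5.3826
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate does not change its score, so a model cannot improve its SI-SDR simply by changing output volume.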

### License Notice

This work, "[NAMA_USERNAME_ANDA]/itera-informatics-convtasnet-ft", is a derivative of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k). The original work is a derivative of:

> * [LibriSpeech ASR corpus](https://www.openslr.org/12) by Vassil Panayotov, used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/);
> * The WSJ0 Hipster Ambient Mixtures dataset by [Whisper.ai](https://whisper.ai/), used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
>
> The original work is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.

This derivative work is licensed under the **[MIT License](https://opensource.org/licenses/MIT)** by the project authors at Institut Teknologi Sumatera.