---
license: mit
language:
- id
- en
library_name: pytorch
tags:
- audio-source-separation
- speech-separation
- convtasnet
- asteroid
- itera
datasets:
- librimix
- custom-indonesian-noisy-speech
metrics:
- si-sdr
base_model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k
pipeline_tag: audio-to-audio
---
## Fine-tuned model: [FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT](https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT)
This model is a fine-tuned version of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).
### Description:
The model was fine-tuned by researchers from **Informatics Engineering (Teknik Informatika), Institut Teknologi Sumatera (ITERA)**. Fine-tuning used the scripts available in the [project's GitHub repository](https://github.com/fransiskus-121140010/itera-informatics-convtasnet-ft). The model was trained on a custom dataset of Indonesian-language speech mixed with a variety of noise types.
### Fine-tuning config:
```yaml
# Configuration used during fine-tuning
data:
  root: "data/processed/"
  sample_rate: 8000
  segment_seconds: 4
  num_workers: 4
training:
  project_name: "itera-speech-separation-ft"
  model_name: "ConvTasNet-ITERA-FT" # Name used during training
  epochs: 50
  batch_size: 8
  learning_rate: 0.0005
  gradient_clip_val: 0.5
  precision: "16-mixed"
  early_stopping_patience: 5
model:
  freeze_encoder_decoder: false
remix:
  dynamic: true
  snr_low: 0.0
  snr_high: 10.0
```
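The `remix` block suggests that training mixtures are regenerated on the fly at an SNR between `snr_low` and `snr_high`. The sketch below illustrates one common way to do this with NumPy. It is an assumption, not the project's actual training code: the uniform SNR sampling, the speech-versus-noise interpretation of the SNR, and the helper names `mix_at_snr` and `dynamic_remix` are all hypothetical.

```python
import numpy as np

SAMPLE_RATE = 8000             # data.sample_rate
SEGMENT_SECONDS = 4            # data.segment_seconds
SNR_LOW, SNR_HIGH = 0.0, 10.0  # remix.snr_low, remix.snr_high (assumed to be in dB)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise


def dynamic_remix(speech, noise, rng):
    """Draw a fresh SNR (assumed uniform in [SNR_LOW, SNR_HIGH]) and build a new mixture."""
    snr_db = rng.uniform(SNR_LOW, SNR_HIGH)
    return mix_at_snr(speech, noise, snr_db), snr_db


# Usage with random stand-in signals (real training would load audio segments).
rng = np.random.default_rng(0)
num_samples = SAMPLE_RATE * SEGMENT_SECONDS  # 32000 samples per 4-second segment
speech = rng.standard_normal(num_samples)
noise = rng.standard_normal(num_samples)
mixture, snr_db = dynamic_remix(speech, noise, rng)
```

Drawing a new SNR each time a segment is sampled means the model rarely sees the same mixture twice, which acts as a form of data augmentation.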
## Results
Evaluation on our internal test set gave the following SI-SDR results (in dB):
```yaml
si_sdr:
  baseline_score: -30.2842
  fine_tuned_score: -24.9016
  improvement: +5.3826
```
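For reference, the sketch below shows the standard definition of scale-invariant SDR (SI-SDR) in NumPy and reproduces the improvement figure from the two scores above. The function `si_sdr` is an illustrative implementation, not the project's evaluation script, which may differ in details such as permutation handling or averaging.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the optimally scaled target.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    return 10 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps))


# The reported improvement is the difference between the two mean scores.
baseline_score = -30.2842
fine_tuned_score = -24.9016
improvement = fine_tuned_score - baseline_score  # about 5.3826 dB
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.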
### License Notice
This work, "[YOUR_USERNAME]/itera-informatics-convtasnet-ft", is a derivative of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k). The original work is a derivative of:
> * [LibriSpeech ASR corpus](https://www.openslr.org/12) by Vassil Panayotov, used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/);
> * The WSJ0 Hipster Ambient Mixtures dataset by [Whisper.ai](https://whisper.ai/), used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
>
> The original work is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.

This derivative work is licensed under the **[MIT License](https://opensource.org/licenses/MIT)** by the project authors at Institut Teknologi Sumatera.