---
license: mit
language:
- id
- en
library_name: pytorch
tags:
- audio-source-separation
- speech-separation
- convtasnet
- asteroid
- itera
datasets:
- librimix
- custom-indonesian-noisy-speech
metrics:
- si-sdr
base_model: JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k
pipeline_tag: audio-to-audio
---

## Fine-tuned model: [FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT](https://huggingface.co/FransXav/ConvTasNet-IF-Itera-SepNoisy8k-FT)

This model is a fine-tuned version of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).

### Description:

This model was fine-tuned by researchers from **Informatics Engineering (Teknik Informatika), Institut Teknologi Sumatera (ITERA)**. Fine-tuning used the scripts available in the [project's GitHub repository](https://github.com/fransiskus-121140010/itera-informatics-convtasnet-ft). The model was trained on a custom dataset of Indonesian-language speech mixed with various kinds of noise.

### Fine-tuning config:

```yaml
# Configuration used during fine-tuning
data:
  root: "data/processed/"
  sample_rate: 8000
  segment_seconds: 4
  num_workers: 4

training:
  project_name: "itera-speech-separation-ft"
  model_name: "ConvTasNet-ITERA-FT"  # Name used during training
  epochs: 50
  batch_size: 8
  learning_rate: 0.0005
  gradient_clip_val: 0.5
  precision: "16-mixed"
  early_stopping_patience: 5

model:
  freeze_encoder_decoder: false

remix:
  dynamic: true
  snr_low: 0.0
  snr_high: 10.0
```

## Results

Evaluation on our internal test set gave the following results (SI-SDR, in dB):

```yaml
si_sdr:
  baseline_score: -30.2842
  fine_tuned_score: -24.9016
  improvement: +5.3826
```

### License Notice

This work, "[NAMA_USERNAME_ANDA]/itera-informatics-convtasnet-ft", is a derivative of [`JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k`](https://huggingface.co/JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k).
The original work is a derivative of:

> * [LibriSpeech ASR corpus](https://www.openslr.org/12) by Vassil Panayotov, used under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/);
> * The WSJ0 Hipster Ambient Mixtures dataset by [Whisper.ai](https://whisper.ai/), used under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
>
> The original work is licensed under [Attribution-ShareAlike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/) by Joris Cosentino.

This derivative work is licensed under the **[MIT License](https://opensource.org/licenses/MIT)** by the project authors at Institut Teknologi Sumatera.
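
### SI-SDR and SNR mixing sketch

The card reports SI-SDR scores and a `remix` configuration with `snr_low: 0.0` and `snr_high: 10.0`, but does not show how either is computed. The following is a minimal, dependency-free sketch of both ideas, not the project's actual code. It assumes that dynamic remixing means drawing a random SNR from the configured range and rescaling the noise before adding it to the speech; the function names `power`, `mix_at_snr`, and `si_sdr` are hypothetical, and the sine wave and Gaussian noise merely stand in for real audio. A real pipeline would more likely use vectorized PyTorch tensors.

```python
import math
import random


def power(x):
    """Mean power of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    # Remove the mean of each signal before comparing them.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    num = sum(t * t for t in target)
    den = sum(v * v for v in residual)
    return 10 * math.log10((num + eps) / (den + eps))


rng = random.Random(0)
sample_rate = 8000  # matches `sample_rate` in the fine-tuning config

# Synthetic stand-ins for one second of speech and noise (assumption).
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Draw an SNR from the configured range [snr_low, snr_high] = [0, 10] dB.
snr_db = rng.uniform(0.0, 10.0)
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db)

realized_snr = 10 * math.log10(power(speech) / power(scaled_noise))
print(f"target SNR {snr_db:.2f} dB, realized {realized_snr:.2f} dB")
print(f"SI-SDR of unprocessed mixture: {si_sdr(mixture, speech):.2f} dB")
```

Because SI-SDR projects the estimate onto the reference, rescaling the estimate does not change the score, which is why it is preferred over plain SDR for separation models whose output gain is arbitrary. The improvement reported in Results is the difference between the two scores: -24.9016 - (-30.2842) = 5.3826 dB.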