---
library_name: transformers
tags: []
---

Model adapted from: https://github.com/facebookresearch/speech-resynthesis

This repository contains a VQ-VAE model trained to generate high-quality joint vector embeddings of the F0 and Energy features of speech,  
published in the paper https://www.isca-archive.org/interspeech_2025/portes25_interspeech.html.  
The preprocessing strategy used in this repository is __Interpolation__.  

For __Normalization + Voicedness mask__, see https://huggingface.co/MU-NLPC/F0_Energy_joint_VQVAE_embeddings  
For __Interpolation + Normalization__, see https://huggingface.co/MU-NLPC/F0_Energy_joint_VQVAE_embeddings-norm_interp  
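
To make the __Interpolation__ strategy concrete, the sketch below extracts F0, fills unvoiced gaps by linear interpolation, and computes frame-level RMS energy. This is a minimal approximation only: the extractor (`librosa.pyin`) and every parameter shown are illustrative assumptions, not necessarily the exact preprocessing used to train this model.

```
# Minimal sketch of "Interpolation" preprocessing: F0 with unvoiced gaps
# filled by linear interpolation, plus frame-level RMS energy.
# The extractor and all parameters are illustrative assumptions, not the
# exact configuration used to train this model.
import librosa
import numpy as np

wav, sr = librosa.load("speech.wav", sr=16000)  # model was trained on 16 kHz audio

# F0 estimation; unvoiced frames come back as NaN.
f0, voiced_flag, _ = librosa.pyin(
    wav, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Linearly interpolate F0 across unvoiced (NaN) frames.
frames = np.arange(len(f0))
voiced = ~np.isnan(f0)
f0_interp = np.interp(frames, frames[voiced], f0[voiced])

# Frame-level energy as RMS.
energy = librosa.feature.rms(y=wav)[0]

print(f0_interp.shape, energy.shape)
```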

The script for running the model is included in the __generate_embeddings.py__ file.  

To use the model, clone this repository and create a virtual environment from the pyproject.toml file,  
for example by running:
```
poetry install
```
Then, in the __generate_embeddings.py__ script, select the dataset, uncomment the `#trust_remote_code=True` lines, and run the script:
```
poetry run python generate_embeddings.py
```
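
For reference, `trust_remote_code=True` is an argument of the dataset-loading call inside the script. The sketch below shows what such a line typically looks like once uncommented; the dataset id is a placeholder, not the script's actual default:

```
# Sketch of a dataset-loading step with remote code enabled;
# the dataset id below is a placeholder.
from datasets import load_dataset

dataset = load_dataset(
    "some/speech-dataset",   # placeholder dataset id
    split="train",
    trust_remote_code=True,  # the line to uncomment in the script
)
```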

Note: While the model was trained on audio sampled at 16 kHz, performance appears consistent for 24 kHz audio as well. Use at your own discretion.
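
If your audio is at another sampling rate, resampling to 16 kHz before generating embeddings is the safer option. A minimal sketch using `librosa` (the file path is a placeholder):

```
# Resample audio of any rate to the model's 16 kHz training rate.
import librosa

wav, sr = librosa.load("speech_24khz.wav", sr=None)  # keep the native rate
if sr != 16000:
    wav = librosa.resample(wav, orig_sr=sr, target_sr=16000)
    sr = 16000
```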