Adding Pipeline Tag
Adding "Audio-to-Audio" pipeline tag
README.md
CHANGED
@@ -1,33 +1,34 @@
---
pipeline_tag: audio-to-audio
license: other
license_name: nvidia-oneway-noncommercial-license
---

# PyTorch Implementation of Audio-to-Audio Schrodinger Bridges

**Zhifeng Kong, Kevin J Shih, Weili Nie, Arash Vahdat, Sang-gil Lee, Joao Felipe Santos, Ante Jukic, Rafael Valle, Bryan Catanzaro**

[[paper]](https://arxiv.org/abs/2501.11311) [[GitHub]](https://github.com/NVIDIA/diffusion-audio-restoration) [[Demo]](https://research.nvidia.com/labs/adlr/A2SB/)

This repo contains the PyTorch implementation of [A2SB: Audio-to-Audio Schrodinger Bridges](https://arxiv.org/abs/2501.11311). A2SB is an audio restoration model tailored for high-res music at 44.1kHz. It is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). Critically, A2SB is end-to-end: it predicts waveform outputs without needing a vocoder, and it can restore hour-long audio inputs. A2SB achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets.

- We propose A2SB, a state-of-the-art, end-to-end, vocoder-free, and multi-task diffusion Schrodinger Bridge model for 44.1kHz high-res music restoration, using an effective factorized audio representation.

- A2SB is the first long audio restoration model that can restore hour-long audio without boundary artifacts.

## License

The model is provided under the NVIDIA OneWay NonCommercial License.


## Citation

```
@article{kong2025a2sb,
  title={A2SB: Audio-to-Audio Schrodinger Bridges},
  author={Kong, Zhifeng and Shih, Kevin J and Nie, Weili and Vahdat, Arash and Lee, Sang-gil and Santos, Joao Felipe and Jukic, Ante and Valle, Rafael and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2501.11311},
  year={2025}
}
```
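The only functional change in this diff is the new `pipeline_tag: audio-to-audio` entry in the front matter, the `---`-delimited block at the top of README.md. As a rough illustration of how that block can be read back, here is a minimal sketch that uses only the Python standard library. The helper `read_front_matter` and the constant `README_HEAD` are hypothetical names introduced for this example, and the parser assumes flat `key: value` lines only; real model cards use full YAML, so a proper YAML parser would be the safer choice in practice.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Return flat key: value pairs from a leading '---' delimited block.

    Hypothetical helper for illustration only: it does not handle nested
    YAML, lists, quoting, or comments.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the text as having no front matter.
    return {}


# The first lines of the README as they appear after this change.
README_HEAD = """---
pipeline_tag: audio-to-audio
license: other
license_name: nvidia-oneway-noncommercial-license
---

# PyTorch Implementation of Audio-to-Audio Schrodinger Bridges
"""

meta = read_front_matter(README_HEAD)
print(meta["pipeline_tag"])  # prints "audio-to-audio"
```

Run against the pre-change README, which had no `pipeline_tag` line, the same lookup would raise a `KeyError`; `meta.get("pipeline_tag")` avoids that.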