Apply for community grant: Academic project (gpu)
Hi Hugging Face Team,
I am writing to apply for the GPU Community Grant on behalf of our new work PicoAudio2 (https://arxiv.org/abs/2509.00683), which focuses on developing controllable audio generation models.
PicoAudio2 is a step toward temporally controllable text-to-audio generation. It introduces a new data processing pipeline and model architecture that help address longstanding challenges in fine-grained audio control. We use a grounding model to annotate event timestamps in real audio-text datasets, which lets us curate real data with strong temporal annotations to supplement existing simulation data. The model is trained on both types of data. Following the design of PicoAudio, we encode the timestamp information into a timestamp matrix, which gives the model detailed time-aligned cues in addition to the textual description. We hope these developments make temporally controllable audio generation more accessible and practical for researchers and industry, and we look forward to feedback that helps us keep improving the approach.
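For readers unfamiliar with the idea, the timestamp matrix can be pictured roughly as below. This is only a minimal sketch, assuming a binary event-by-frame layout in which each row is one sound event and each column is one time frame. The function name `build_timestamp_matrix`, the input format, and the frame rate are hypothetical choices made for illustration and are not taken from the PicoAudio2 code.

```python
import math
from typing import Dict, List, Tuple


def build_timestamp_matrix(
    events: Dict[str, List[Tuple[float, float]]],
    duration_s: float,
    frames_per_second: int = 25,  # assumed frame rate, for illustration only
) -> Tuple[List[str], List[List[int]]]:
    """Turn (onset, offset) annotations in seconds into a binary matrix.

    Returns the event labels (one per row) and a matrix where
    matrix[i][t] == 1 if event i is active during frame t.
    """
    num_frames = round(duration_s * frames_per_second)
    labels = list(events.keys())
    matrix = [[0] * num_frames for _ in labels]

    for row, label in enumerate(labels):
        for onset, offset in events[label]:
            # Clamp to the clip boundaries so bad annotations cannot overflow.
            start = max(0, math.floor(onset * frames_per_second))
            end = min(num_frames, math.ceil(offset * frames_per_second))
            for t in range(start, end):
                matrix[row][t] = 1

    return labels, matrix


# Hypothetical example: a 2-second clip at 4 frames per second.
labels, matrix = build_timestamp_matrix(
    {"dog barking": [(0.5, 1.0)], "door slams": [(1.5, 2.0)]},
    duration_s=2.0,
    frames_per_second=4,
)
for label, row in zip(labels, matrix):
    print(label, row)
```

In this toy layout, the caption says what happens and the matrix says when, which is the kind of time-aligned cue described above.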
I have shared this work on Hugging Face so that it is accessible to the broader research community. However, the current demo runs on CPU, where inference is slow. This hurts the user experience and limits how many people can realistically try the model. GPU hardware would greatly reduce generation time and make the demo responsive enough for interactive use.
I would welcome the opportunity to work with your team, and I greatly appreciate your consideration of this application.
Thank you for your attention.