---
title: SingingSDS
emoji: 🎢
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
---
# SingingSDS: Role-Playing Singing Spoken Dialogue System
A role-playing spoken dialogue system that takes speech input and responds with singing in the voice of a selected virtual character.
## Installation
### Requirements
- Python 3.11+
- CUDA (optional, for GPU acceleration)
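Before installing, it may help to verify the prerequisites (standard tooling, nothing project-specific):

```bash
python --version   # should report 3.11 or newer
nvidia-smi         # optional: confirms a CUDA-capable GPU is visible
```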
### Install Dependencies
#### Option 1: Using Conda (Recommended)
```bash
conda create -n singingsds python=3.11
conda activate singingsds
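# pytorch-cuda=11.8 selects the CUDA 11.8 build; CUDA is optional, so omit that package for a CPU-only install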
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
#### Option 2: Using pip only
```bash
pip install -r requirements.txt
```
#### Option 3: Using pip with virtual environment
```bash
python -m venv singingsds_env
# On Windows:
singingsds_env\Scripts\activate
# On macOS/Linux:
source singingsds_env/bin/activate
pip install -r requirements.txt
```
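After installation, a quick sanity check of the environment (a minimal sketch; assumes PyTorch was installed via one of the options above):

```bash
python -c "import torch; print(torch.__version__, 'CUDA:', torch.cuda.is_available())"
```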
## Usage
### Command Line Interface (CLI)
#### Example Usage
```bash
python cli.py \
--query_audio tests/audio/hello.wav \
--config_path config/cli/yaoyin_default.yaml \
--output_audio outputs/yaoyin_hello.wav \
--eval_results_csv outputs/yaoyin_test.csv
```
#### Inference-Only Mode
Run minimal inference without evaluation.
```bash
python cli.py \
--query_audio tests/audio/hello.wav \
--config_path config/cli/yaoyin_default_infer_only.yaml \
--output_audio outputs/yaoyin_hello.wav
```
#### Parameter Description
- `--query_audio`: Input audio file path (required)
- `--config_path`: Configuration file path (default: `config/cli/yaoyin_default.yaml`)
- `--output_audio`: Output audio file path (required)
- `--eval_results_csv`: CSV file path for evaluation results (used in the full example above; not needed in inference-only mode)
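The CLI processes one file per invocation, so batch runs can be scripted with a shell loop (a sketch; only `tests/audio/hello.wav` is confirmed above, and other files under `tests/audio/` are assumptions):

```bash
for f in tests/audio/*.wav; do
  python cli.py \
    --query_audio "$f" \
    --config_path config/cli/yaoyin_default_infer_only.yaml \
    --output_audio "outputs/yaoyin_$(basename "$f")"
done
```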
### Web Interface (Gradio)
Start the web interface:
```bash
python app.py
```
Then open the address printed in the terminal in your browser to use the graphical interface.
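To bind the server to a specific host or port, Gradio's standard environment variables can be set at launch (these are part of Gradio itself, not this project):

```bash
GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7861 python app.py
```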
## Configuration
### Character Configuration
The system supports multiple preset characters:
- **Yaoyin (遥音)**: Default timbre is `timbre2`
- **Limei (丽梅)**: Default timbre is `timbre1`
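Characters are switched through the config file passed to the CLI (a sketch; `limei_default.yaml` is a hypothetical name assumed to mirror the confirmed `yaoyin_default.yaml`):

```bash
# hypothetical config path, assuming Limei follows the Yaoyin naming scheme
python cli.py \
  --query_audio tests/audio/hello.wav \
  --config_path config/cli/limei_default.yaml \
  --output_audio outputs/limei_hello.wav
```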
### Model Configuration
#### ASR Models
- `openai/whisper-large-v3-turbo`
- `openai/whisper-large-v3`
- `openai/whisper-medium`
- `openai/whisper-small`
- `funasr/paraformer-zh`
#### LLM Models
- `gemini-2.5-flash`
- `google/gemma-2-2b`
- `meta-llama/Llama-3.2-3B-Instruct`
- `meta-llama/Llama-3.1-8B-Instruct`
- `Qwen/Qwen3-8B`
- `Qwen/Qwen3-30B-A3B`
- `MiniMaxAI/MiniMax-Text-01`
#### SVS Models
- `espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg` (Bilingual)
- `espnet/aceopencpop_svs_visinger2_40singer_pretrain` (Chinese)
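The identifiers above look like Hugging Face Hub repo IDs, so checkpoints can be fetched ahead of time to avoid a download on first run (a sketch; assumes the `huggingface_hub` CLI is installed and each repo is publicly downloadable):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg
huggingface-cli download openai/whisper-large-v3-turbo
```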
## Project Structure
```
SingingSDS/
β”œβ”€β”€ app.py, cli.py     # Entry points (demo app & CLI)
β”œβ”€β”€ pipeline.py        # Main orchestration pipeline
β”œβ”€β”€ interface.py       # Gradio interface
β”œβ”€β”€ characters/        # Virtual character definitions
β”œβ”€β”€ modules/           # Core modules
β”‚   β”œβ”€β”€ asr/           # ASR models (Whisper, Paraformer)
β”‚   β”œβ”€β”€ llm/           # LLMs (Gemini, LLaMA, etc.)
β”‚   β”œβ”€β”€ svs/           # Singing voice synthesis (ESPnet)
β”‚   └── utils/         # G2P, text normalization, resources
β”œβ”€β”€ config/            # YAML configuration files
β”œβ”€β”€ data/              # Dataset metadata and length info
β”œβ”€β”€ data_handlers/     # Parsers for KiSing, Touhou, etc.
β”œβ”€β”€ evaluation/        # Evaluation metrics
β”œβ”€β”€ resources/         # Singer embeddings, phoneme dicts, MIDI
β”œβ”€β”€ assets/            # Character visuals
β”œβ”€β”€ tests/             # Unit tests and sample audios
└── README.md, requirements.txt
```
## Contributing
Issues and Pull Requests are welcome!
## License