---
language: zh
license: apache-2.0
tags:
- automatic-speech-recognition
- ASR
- chinese
- speech
---

## 📂 Project Tree

```
WSChuan-ASR
├── paraformer_large_chuan/
│   ├── config.yaml
│   ├── model.pt
│   └── infer.py
│
├── Qwen2.5-omni3B/
│   ├── added_tokens.json
│   ├── args.json
│   ├── char_template.jinja
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model-00001-of-00003.safetensors
│   ├── model-00002-of-00003.safetensors
│   ├── model-00003-of-00003.safetensors
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── special_tokens_map.json
│   ├── spk_dict.pt
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   ├── video_preprocessor_config.json
│   └── vocab.json
│
├── .gitattributes
└── README.md
```

## ASR Leaderboard

Error rates on each test set; lower is better, and the best score per test set is in **bold**.

| Model | Model Size | WSC-Eval-ASR - Easy | WSC-Eval-ASR - Hard | WSC-Eval-ASR - Total | Magicdata - Conversation | Magicdata - Daily-Use | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **with LLM** | | | | | | | |
| Kimi-Audio | 7B | 16.65 | 28.66 | 17.66 | 24.67 | **5.77** | 18.68 |
| FireRedASR-LLM | 8.3B | 12.80 | 25.27 | 14.40 | 17.68 | 6.69 | 15.37 |
| Qwen2.5-omni | 3B | 16.94 | 26.01 | 18.20 | 20.40 | 6.32 | 17.69 |
| Qwen2.5-omni-WSC-Finetune⭐ | 3B | 14.36 | 24.14 | 15.61 | 18.45 | 6.15 | 15.74 |
| Qwen2.5-omni+internal data⭐ | 3B | 13.17 | 23.36 | 14.81 | 18.50 | 5.88 | 15.14 |
| Qwen2.5-omni-WSC-Finetune + internal data⭐ | 3B | 12.93 | 23.19 | 14.25 | 17.95 | 5.89 | 14.84 |
| **without LLM** | | | | | | | |
| SenseVoice-small | 234M | 17.43 | 28.38 | 18.39 | 23.50 | 8.77 | 19.29 |
| Whisper | 244M | 52.06 | 63.99 | 53.59 | 55.88 | 52.03 | 55.51 |
| FireRedASR-AED | 1.1B | 13.29 | 23.64 | 14.62 | 17.84 | 6.69 | 15.14 |
| Paraformer | 220M | 14.34 | 24.61 | 15.66 | 19.81 | 8.16 | 16.52 |
| Paraformer-WSC-Finetune⭐ | 220M | 12.15 | 22.60 | 13.51 | 16.60 | 8.02 | 14.58 |
| Paraformer + internal data⭐ | 220M | 11.93 | 21.82 | 13.14 | 15.61 | 6.77 | 13.85 |
| Paraformer-WSC-Finetune + internal data⭐ | 220M | **11.59** | **21.59** | **12.87** | **14.59** | 6.28 | **13.38** |

## ASR Inference

### Paraformer_large_Chuan

```bash
export CUDA_VISIBLE_DEVICES=7

root_dir=./test_data
test_sets=("WSC-Eval-ASR" "WSC-Eval-ASR-Hard" "WSC-Eval-ASR-Easy")
model_dir=./model_dir
out_rootdir=./results

# Decode every test set; each one writes its hypotheses to its own directory.
for test_data in "${test_sets[@]}"; do
    out_dir=$out_rootdir/$test_data
    mkdir -p "$out_dir"
    python infer_paraformer.py \
        --model "$model_dir" \
        --wav_scp_file "$root_dir/$test_data/wav.scp" \
        --output_dir "$out_rootdir/debug" \
        --device "cuda" \
        --output_file "$out_dir/hyp.txt"
done
```

---

### Qwen2.5-Omni-3B_Chuan

```bash
python infer_qwen2.5omni.py \
    --wavs_path /path/to/your/wav.scp \
    --out_path /path/to/your/results.txt \
    --gpu 0 \
    --model /path/to/your/model
```

## Note

Original: https://huggingface.co/ASLP-lab/WSChuan-ASR

Duplicated by: https://huggingface.co/spaces/huggingface-projects/repo_duplicator
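Both inference commands read their input audio from a `wav.scp` file. The exact format the scripts expect is not documented here; the sketch below assumes the common Kaldi/FunASR convention of one `<utt_id> <path>` line per utterance, with the utterance ID taken from the file name. The function name `build_wav_scp` is hypothetical and not part of this repository.

```python
from pathlib import Path


def build_wav_scp(wav_dir: str, scp_path: str) -> int:
    """Write a Kaldi-style wav.scp: one "<utt_id> <absolute_path>" line per .wav file.

    Assumption: the utterance ID is the file name without its extension, so
    stems must be unique across subdirectories and contain no whitespace.
    Returns the number of entries written.
    """
    entries: dict[str, Path] = {}
    for wav in Path(wav_dir).rglob("*.wav"):  # recursive, case-sensitive match
        utt_id = wav.stem
        if any(ch.isspace() for ch in utt_id):
            raise ValueError(f"whitespace in utterance id: {utt_id!r}")
        if utt_id in entries:
            raise ValueError(f"duplicate utterance id: {utt_id!r}")
        entries[utt_id] = wav.resolve()
    # Kaldi-style tools conventionally expect the lines sorted by key.
    lines = [f"{utt_id} {entries[utt_id]}" for utt_id in sorted(entries)]
    Path(scp_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, `build_wav_scp("./test_data/WSC-Eval-ASR/wavs", "./test_data/WSC-Eval-ASR/wav.scp")` would produce the file the Paraformer loop reads (the `wavs` subdirectory is a hypothetical layout). The function fails loudly on duplicate or whitespace-containing IDs rather than silently writing a file that the inference scripts might misparse.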
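The leaderboard does not name its metric; for Mandarin and dialect ASR the usual choice is character error rate (CER), i.e. (substitutions + deletions + insertions) divided by the number of reference characters. The sketch below scores a `hyp.txt` from the inference commands under that assumption, and further assumes that both the reference and hypothesis files hold one `<utt_id> <text>` line per utterance. The helpers `edit_distance`, `read_transcripts`, and `corpus_cer` are hypothetical, and no punctuation or full-width/half-width normalization is applied, so numbers may differ from the official scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def read_transcripts(path: str) -> dict[str, str]:
    """Parse assumed "<utt_id> <text>" lines; whitespace inside the text is dropped."""
    out: dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if not parts:
                continue  # skip blank lines
            out[parts[0]] = "".join(parts[1].split()) if len(parts) > 1 else ""
    return out


def corpus_cer(refs: dict[str, str], hyps: dict[str, str]) -> float:
    """Corpus-level CER in percent: total edits / total reference characters * 100.

    Utterances missing from hyps are scored as empty hypotheses (all deletions).
    """
    errors = sum(edit_distance(ref, hyps.get(utt, "")) for utt, ref in refs.items())
    n_chars = sum(len(ref) for ref in refs.values())
    return 100.0 * errors / n_chars if n_chars else 0.0
```

Usage, with hypothetical paths: `corpus_cer(read_transcripts("ref.txt"), read_transcripts("results/WSC-Eval-ASR/hyp.txt"))`. Errors are summed over the whole corpus before dividing, rather than averaging per-utterance rates, so long utterances carry proportionally more weight.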