---
title: OpenAI Whisper Vs Alibaba SenseVoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
license: mit
short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice.
---

# OpenAI Whisper vs. Alibaba SenseVoice Comparison

This Space lets you compare **faster-whisper** models against Alibaba FunAudioLLM’s **SenseVoice** models for automatic speech recognition (ASR), featuring:

* Multiple faster-whisper and SenseVoice model choices.
* Language selection for each ASR engine (full list of language codes).
* Explicit device selection (GPU/CPU), with the `spaces.GPU` decorator deferring GPU allocation.
* Speaker diarization with `pyannote.audio`, displaying speaker-labeled transcripts.
* Simplified Chinese to Traditional Chinese conversion via `opencc` (see the conversion sketch after this list).
* Color-coded and scrollable diarized transcript panel.
* Semi-streaming, semi-real-time output: incremental transcript updates and speaker-labeled segments accumulate live as each segment or speaker turn finishes processing.
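
The Chinese conversion is a one-liner with `opencc`. Here is a minimal sketch assuming the `opencc-python-reimplemented` package from `requirements.txt`; the `s2twp` profile and the sample string are illustrative, and `app.py` may use a different conversion profile.

```python
# Minimal sketch of the Simplified-to-Traditional conversion step, assuming
# the opencc-python-reimplemented package listed in requirements.txt.
from opencc import OpenCC

# "s2twp" = Simplified -> Traditional (Taiwan standard) with common phrase
# substitutions; plain "s2t" converts characters only.
cc = OpenCC("s2twp")

print(cc.convert("内存和软件"))  # -> 記憶體和軟體 (Taiwan-standard terms)
```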

## 🚀 How to Use

1. Upload an audio file or record from your microphone.
2. **Faster-Whisper ASR**:
   1. Select a model variant from the dropdown.
   2. Choose the transcription language (default: auto-detect).
   3. Pick a device: GPU or CPU.
   4. Toggle diarization on or off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:
   1. Select a SenseVoice model.
   2. Choose the transcription language.
   3. Pick a device: GPU or CPU.
   4. Toggle punctuation on or off.
   5. Toggle diarization on or off.
   6. Click **Transcribe with SenseVoice**.
4. View both the plain transcript and the color-coded, speaker-labeled diarized transcript side by side. (A minimal code sketch of both engines follows this list.)
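
For reference outside the UI, here is a condensed sketch of the two transcription paths, assuming the `faster-whisper` and `funasr` packages from `requirements.txt`. The model names, the `audio.wav` path, and the parameter values are illustrative rather than the exact choices made in `app.py`.

```python
# Condensed sketch of the two ASR paths, assuming the faster-whisper and
# funasr packages from requirements.txt. Model names, "audio.wav", and
# parameter values are illustrative.
from faster_whisper import WhisperModel
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# --- Faster-Whisper path ---
whisper = WhisperModel("large-v3", device="auto")  # "cuda" + float16 is typical on GPU
segments, info = whisper.transcribe("audio.wav", language=None)  # None = auto-detect
print(f"Detected language: {info.language}")
for seg in segments:  # a lazy generator: results arrive segment by segment
    print(f"[{seg.start:6.2f}s -> {seg.end:6.2f}s] {seg.text}")

# --- SenseVoice path ---
sense = AutoModel(model="iic/SenseVoiceSmall", device="cpu")  # or "cuda:0"
result = sense.generate(input="audio.wav", language="auto", use_itn=True)
# SenseVoice emits language/emotion/event tags; strip them for plain text.
print(rich_transcription_postprocess(result[0]["text"]))
```

Note that `transcribe()` returns a lazy generator, which is what makes the semi-streaming transcript updates described above possible.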

## 📁 Files

* **app.py**
  Main Gradio app implementing the dual ASR pipelines with device control, diarization (see the diarization sketch after this list), and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, and the NVIDIA cuBLAS/cuDNN wheels.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 environment for GPU acceleration.
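
Here is a minimal sketch of the diarization step, assuming `pyannote.audio` with a gated pipeline checkpoint and the `HF_TOKEN`/`HUGGINGFACE_TOKEN` secret described in the notes below; the pipeline name and file path are illustrative.

```python
# Minimal sketch of the pyannote.audio diarization step. The pipeline name,
# token handling, and "audio.wav" are illustrative; app.py may differ.
import os
from pyannote.audio import Pipeline

# The diarization checkpoints are gated, so a Hugging Face token is required
# (set HF_TOKEN or HUGGINGFACE_TOKEN in the Space secrets).
token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN")

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=token
)

diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```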

## ⚠️ Notes

* **Hugging Face token**: set `HF_TOKEN` or `HUGGINGFACE_TOKEN` in the Space secrets so the `pyannote.audio` diarization models can be downloaded.
* **GPU allocation**: GPU resources are requested only when GPU is selected as the device.
* **Python version**: Python 3.10 or later is recommended.
* **System ffmpeg**: ffmpeg must be installed on the host or in the container for audio processing.

## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio** (>=3.39.0)
* **torch** (>=2.0.0) & **torchaudio**
* **transformers** (>=4.35.0)
* **faster-whisper** (>=1.1.1) & **ctranslate2** (==4.5.0)
* **funasr** (>=1.0.14)
* **pyannote.audio** (>=2.1.1) & **huggingface-hub** (>=0.18.0)
* **pydub** (>=0.25.1) & **ffmpeg-python** (>=0.2.0)
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**

## License

MIT

---

## Chinese (Taiwan) Version

# OpenAI Whisper vs. Alibaba FunASR SenseVoice Feature Guide

This Space compares **faster-whisper** side by side with Alibaba FunAudioLLM’s **SenseVoice** models, offering:

* A choice of multiple faster-whisper and SenseVoice models
* Configurable recognition language (full list of language codes)
* Explicit compute-device switching (GPU/CPU), with the `spaces.GPU` decorator deferring GPU allocation (see the sketch after this list)
* Integrated speaker diarization via `pyannote.audio`, with speakers labeled in the transcript
* Automatic conversion of Simplified Chinese to Taiwan Traditional Chinese using `opencc`
* A color-coded, conversation-style transcript that can be scrolled and copied
* Semi-real-time segmented output: the transcript accumulates live as each speech segment or speaker turn finishes processing
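
Below is a minimal sketch of how deferred GPU allocation looks with the Hugging Face `spaces` package on a ZeroGPU Space; the function name and body are illustrative, not the actual `app.py` code.

```python
# Minimal sketch of deferring GPU allocation on a ZeroGPU Space, assuming
# the Hugging Face `spaces` package. The function is illustrative.
import spaces
import torch

@spaces.GPU  # the GPU is only attached while this function runs
def transcribe_on_gpu(audio_path: str) -> str:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # ... load the selected model on `device` and run transcription ...
    return f"transcribed {audio_path} on {device}"
```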

## 🚀 Usage Steps

1. Upload an audio file or record audio through your microphone.
2. **Faster-Whisper ASR**:
   1. Select a model version.
   2. Set the recognition language (auto-detect by default).
   3. Switch the compute device: GPU or CPU.
   4. Turn speaker diarization on or off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:
   1. Select a SenseVoice model.
   2. Set the recognition language.
   3. Switch the compute device: GPU or CPU.
   4. Turn punctuation on or off.
   5. Turn speaker diarization on or off.
   6. Click **Transcribe with SenseVoice**.
4. View the plain transcript and the color-coded, speaker-diarized transcript side by side.

## 📁 File Structure

* **app.py**
  Gradio application source implementing the dual ASR pipelines, including device selection, speaker diarization, and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, and cuBLAS/cuDNN.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 Docker environment.

## ⚠️ Notes

* **Hugging Face token**: set `HF_TOKEN` or `HUGGINGFACE_TOKEN` in the Space secrets so the speaker-diarization models can be downloaded.
* **GPU allocation**: GPU resources are requested only when GPU is selected.
* **Python version**: Python 3.10 or later is recommended.
* **System ffmpeg**: make sure ffmpeg is installed on the host or in the container to support audio processing (see the environment-check sketch after this list).
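
Here is a small sketch of checking these prerequisites at startup; the checks and messages are illustrative, not taken from `app.py`.

```python
# Small sketch of checking the prerequisites above at startup.
import os
import shutil

# Diarization models are gated behind a Hugging Face token.
if not (os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN")):
    print("Warning: no HF_TOKEN/HUGGINGFACE_TOKEN set; diarization will fail.")

# pydub shells out to ffmpeg for most audio formats.
if shutil.which("ffmpeg") is None:
    print("Warning: ffmpeg not found on PATH; audio decoding may fail.")
```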

## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio**: >=3.39.0
* **torch** & **torchaudio**: >=2.0.0
* **transformers**: >=4.35.0
* **faster-whisper**: >=1.1.1 & **ctranslate2**: ==4.5.0
* **funasr**: >=1.0.14
* **pyannote.audio**: >=2.1.1 & **huggingface-hub**: >=0.18.0
* **pydub**: >=0.25.1 & **ffmpeg-python**: >=0.2.0
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**

## License

MIT