Luigi committed
Commit 2d0779b · 1 Parent(s): 2483f92

update readme

Files changed (1)
  1. README.md +93 -24
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
- title: Whisper Vs SenseVoice Small
 emoji: ⚡
 colorFrom: gray
 colorTo: purple
@@ -11,9 +11,9 @@ license: mit
 short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice.
 ---

- # Whisper vs. FunASR SenseVoice Comparison

- This Space lets you compare **faster-whisper** models against FunAudioLLM’s **SenseVoice** models for automatic speech recognition (ASR), featuring:

 * Multiple faster-whisper and SenseVoice model choices.
 * Language selection for each ASR engine (full list of language codes).
@@ -21,35 +21,35 @@ This Space lets you compare **faster-whisper** models against FunAudioLLM’s **
 * Speaker diarization with `pyannote.audio`, displaying speaker-labeled transcripts.
 * Simplified Chinese to Traditional Chinese conversion via `opencc`.
 * Color-coded and scrollable diarized transcript panel.

 ## 🚀 How to Use

 1. Upload an audio file or record from your microphone.
 2. **Faster-Whisper ASR**:

- * Select a model variant from the dropdown.
- * Choose the transcription language (default: auto-detect).
- * Pick device: GPU or CPU.
- * Toggle diarization on/off.
- * Click **Transcribe with Faster-Whisper**.
 3. **SenseVoice ASR**:

- * Select a SenseVoice model.
- * Choose the transcription language.
- * Pick device: GPU or CPU.
- * Toggle punctuation on/off.
- * Toggle diarization on/off.
- * Click **Transcribe with SenseVoice**.
 4. View both the plain transcript and the color-coded, speaker-labeled diarized transcript side by side.

 ## 📁 Files

 * **app.py**
   Main Gradio app implementing dual ASR pipelines with device control, diarization, and Chinese conversion.
-
 * **requirements.txt**
   Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, NVIDIA cuBLAS/cuDNN wheels.
-
 * **Dockerfile** (optional)
   Defines a CUDA 12 + cuDNN 9 environment for GPU acceleration.
 
@@ -63,15 +63,13 @@ This Space lets you compare **faster-whisper** models against FunAudioLLM’s **
 ## 🛠️ Dependencies

 * **Python**: 3.10+
- * **faster-whisper** (>=1.1.1)
- * **ctranslate2** (==4.5.0)
- * **funasr** (>=0.6.4)
- * **torch** (>=2.0.0)
- * **transformers** (>=4.35.0)
 * **gradio** (>=3.39.0)
- * **pyannote.audio** (>=2.1.1)
- * **pydub** (>=0.25.1)
- * **ffmpeg-python** (>=0.2.0)
 * **opencc-python-reimplemented**
 * **termcolor**
 * **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**
@@ -79,3 +77,74 @@ This Space lets you compare **faster-whisper** models against FunAudioLLM’s **
 ## License

 MIT

---
title: OpenAI Whisper Vs Alibaba SenseVoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple

short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice.
---
# OpenAI Whisper vs. Alibaba SenseVoice Comparison

This Space lets you compare **faster-whisper** models against Alibaba FunAudioLLM’s **SenseVoice** models for automatic speech recognition (ASR), featuring:

* Multiple faster-whisper and SenseVoice model choices.
* Language selection for each ASR engine (full list of language codes).

* Speaker diarization with `pyannote.audio`, displaying speaker-labeled transcripts.
* Simplified Chinese to Traditional Chinese conversion via `opencc`.
* Color-coded and scrollable diarized transcript panel.
* Semi-streaming output: incremental transcript updates accumulate live as each segment or speaker turn completes.
* Semi-real-time diarized transcription: speaker-labeled segments appear incrementally as they finish processing.

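For orientation, the sketch below drives both engines directly in Python. The model IDs, the `audio.wav` path, and the decoding options are illustrative assumptions, not a mirror of what app.py actually does:

```python
from faster_whisper import WhisperModel
from funasr import AutoModel

AUDIO = "audio.wav"  # placeholder input file

# faster-whisper: CTranslate2-based Whisper inference
fw = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = fw.transcribe(AUDIO, language=None)  # language=None -> auto-detect
print(f"detected language: {info.language} (p={info.language_probability:.2f})")
print("faster-whisper:", "".join(seg.text for seg in segments))

# SenseVoice through funasr's AutoModel interface
sv = AutoModel(model="iic/SenseVoiceSmall", device="cuda")
result = sv.generate(input=AUDIO, language="auto", use_itn=True)
print("SenseVoice:", result[0]["text"])
```

Both calls return plain text, which an app like this one can then post-process (speaker labels, OpenCC conversion) before display.
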
## 🚀 How to Use

1. Upload an audio file or record from your microphone.
2. **Faster-Whisper ASR**:

   1. Select a model variant from the dropdown.
   2. Choose the transcription language (default: auto-detect).
   3. Pick device: GPU or CPU.
   4. Toggle diarization on/off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:

   1. Select a SenseVoice model.
   2. Choose the transcription language.
   3. Pick device: GPU or CPU.
   4. Toggle punctuation on/off.
   5. Toggle diarization on/off.
   6. Click **Transcribe with SenseVoice**.
4. View both the plain transcript and the color-coded, speaker-labeled diarized transcript side by side.

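The speaker-labeled view in step 4 pairs ASR output with diarization turns. Below is a minimal `pyannote.audio` sketch; the pipeline name, token handling, and per-turn strategy are assumptions rather than a guaranteed copy of app.py:

```python
import os
from pyannote.audio import Pipeline

# The diarization model is gated; a Hugging Face token is required
# (see the Notes section further down).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ.get("HF_TOKEN"),
)

diarization = pipeline("audio.wav")  # placeholder path
for turn, _, speaker in diarization.itertracks(yield_label=True):
    # Each turn could be cropped from the audio and transcribed by either
    # engine, yielding the speaker-labeled segments shown in the panel.
    print(f"[{turn.start:.2f}s - {turn.end:.2f}s] {speaker}")
```
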
## 📁 Files

* **app.py**
  Main Gradio app implementing dual ASR pipelines with device control, diarization, and Chinese conversion (see the layout sketch after this list).
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, NVIDIA cuBLAS/cuDNN wheels.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 environment for GPU acceleration.

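As a rough sketch of how a side-by-side layout like app.py's could be wired in Gradio; the component choices, labels, and callback names are placeholders, not the actual source:

```python
import gradio as gr

def transcribe_whisper(audio_path, model, language, device, diarize):
    ...  # placeholder: run faster-whisper, optionally diarize

def transcribe_sensevoice(audio_path, model, language, device, punctuate, diarize):
    ...  # placeholder: run SenseVoice via funasr, optionally diarize

with gr.Blocks() as demo:
    audio = gr.Audio(type="filepath", label="Audio (upload or record)")
    with gr.Row():
        with gr.Column():  # faster-whisper pipeline
            fw_model = gr.Dropdown(["large-v3", "medium", "small"], label="Model")
            fw_lang = gr.Dropdown(["auto", "en", "zh"], label="Language")
            fw_dev = gr.Radio(["GPU", "CPU"], label="Device")
            fw_diar = gr.Checkbox(label="Diarization")
            fw_btn = gr.Button("Transcribe with Faster-Whisper")
            fw_out = gr.Textbox(label="Transcript")
        with gr.Column():  # SenseVoice pipeline
            sv_model = gr.Dropdown(["iic/SenseVoiceSmall"], label="Model")
            sv_lang = gr.Dropdown(["auto", "zh", "yue", "en", "ja", "ko"], label="Language")
            sv_dev = gr.Radio(["GPU", "CPU"], label="Device")
            sv_punc = gr.Checkbox(label="Punctuation")
            sv_diar = gr.Checkbox(label="Diarization")
            sv_btn = gr.Button("Transcribe with SenseVoice")
            sv_out = gr.Textbox(label="Transcript")
    fw_btn.click(transcribe_whisper, [audio, fw_model, fw_lang, fw_dev, fw_diar], fw_out)
    sv_btn.click(transcribe_sensevoice,
                 [audio, sv_model, sv_lang, sv_dev, sv_punc, sv_diar], sv_out)

demo.launch()
```
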
## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio** (>=3.39.0)
* **torch** (>=2.0.0) & **torchaudio**
* **transformers** (>=4.35.0)
* **faster-whisper** (>=1.1.1) & **ctranslate2** (==4.5.0)
* **funasr** (>=1.0.14)
* **pyannote.audio** (>=2.1.1) & **huggingface-hub** (>=0.18.0)
* **pydub** (>=0.25.1) & **ffmpeg-python** (>=0.2.0)
* **opencc-python-reimplemented** (usage sketch after this list)
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**

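`opencc-python-reimplemented` powers the Simplified-to-Traditional conversion in a single call. The `s2twp` profile, which targets Taiwan-standard phrasing, is an assumption about which conversion the app picks:

```python
from opencc import OpenCC

cc = OpenCC("s2twp")  # Simplified -> Traditional (Taiwan standard) with phrase mapping
print(cc.convert("简体中文的语音识别结果"))  # e.g. "簡體中文的語音辨識結果"
```
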
## License

MIT

---
## Chinese (Taiwan) Version

# OpenAI Whisper vs. Alibaba FunASR SenseVoice Feature Guide

This Space compares **faster-whisper** side by side with Alibaba FunAudioLLM’s **SenseVoice** models, featuring:

* A choice of multiple faster-whisper and SenseVoice models
* Configurable recognition language (full list of language codes)
* Explicit device switching (GPU/CPU), with GPU allocation deferred via the `spaces.GPU` decorator (see the sketch after this list)
* Speaker diarization via `pyannote.audio`, with speakers labeled in the transcript
* Automatic conversion of Simplified Chinese to Taiwan Traditional Chinese with `opencc`
* Color-coded conversational transcript that can be scrolled and copied
* Semi-real-time segmented output: the transcript accumulates live as each audio segment or speaker turn finishes processing

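The deferred-allocation bullet refers to the ZeroGPU pattern on Hugging Face Spaces: only the GPU code path is decorated with `spaces.GPU`, so a device is claimed just for the duration of that call. A minimal sketch; the function itself is hypothetical:

```python
import spaces  # available inside Hugging Face Spaces

@spaces.GPU  # a GPU is allocated only while this call runs
def transcribe_on_gpu(audio_path: str):
    ...  # placeholder: load the selected model on CUDA and run inference

# A CPU run can call a plain, undecorated function, so no GPU is requested.
```
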
## 🚀 How to Use

1. Upload an audio file or record from your microphone.
2. **Faster-Whisper ASR**:

   1. Select a model variant.
   2. Choose the recognition language (auto-detect by default).
   3. Pick the device: GPU or CPU.
   4. Toggle speaker diarization on/off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:

   1. Select a SenseVoice model.
   2. Set the recognition language.
   3. Pick the device: GPU or CPU.
   4. Toggle punctuation on/off (see the sketch after these steps).
   5. Toggle speaker diarization on/off.
   6. Click **Transcribe with SenseVoice**.
4. View the plain transcript and the color-coded, speaker-labeled transcript side by side.

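The punctuation toggle in step 3 plausibly maps onto funasr's `use_itn` flag (inverse text normalization, which also yields punctuated output); the wiring below is an assumption, not confirmed app.py behavior:

```python
from funasr import AutoModel

sv = AutoModel(model="iic/SenseVoiceSmall", device="cpu")

def sensevoice_transcribe(path: str, language: str = "auto", punctuate: bool = True) -> str:
    # Assumption: use_itn=True requests normalized, punctuated text.
    result = sv.generate(input=path, language=language, use_itn=punctuate)
    return result[0]["text"]
```
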
## 📁 Files

* **app.py**
  Gradio application source implementing the dual ASR pipelines, including device selection, speaker diarization, and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, cuBLAS/cuDNN.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 Docker environment.

## ⚠️ Notes

* **Hugging Face token**: set `HF_TOKEN` or `HUGGINGFACE_TOKEN` in the Space Secrets so the speaker-diarization model can be downloaded (a token-resolution sketch follows this list).
* **GPU allocation**: GPU resources are requested only when GPU is selected.
* **Python version**: Python 3.10 or later is recommended.
* **System ffmpeg**: make sure ffmpeg is installed on the host or in the container for audio processing.

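A small sketch of resolving the token from either secret name before loading the gated diarization pipeline; the fallback order is an assumption:

```python
import os
from pyannote.audio import Pipeline

# Accept either secret name, as described in the note above.
token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN")
if not token:
    raise RuntimeError("Set HF_TOKEN or HUGGINGFACE_TOKEN in the Space Secrets.")

diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=token
)
```
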
## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio**: >=3.39.0
* **torch** & **torchaudio**: >=2.0.0
* **transformers**: >=4.35.0
* **faster-whisper**: >=1.1.1 & **ctranslate2**: ==4.5.0
* **funasr**: >=1.0.14
* **pyannote.audio**: >=2.1.1 & **huggingface-hub**: >=0.18.0
* **pydub**: >=0.25.1 & **ffmpeg-python**: >=0.2.0
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**

## License

MIT