---
title: OpenAI Whisper Vs Alibaba SenseVoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
license: mit
short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice.
---

# OpenAI Whisper vs. Alibaba SenseVoice Comparison

This Space lets you compare **faster-whisper** models against Alibaba FunAudioLLM’s **SenseVoice** models for automatic speech recognition (ASR), featuring:

* Multiple faster-whisper and SenseVoice model choices.
* Language selection for each ASR engine (full list of language codes).
* Explicit device selection (GPU/CPU), with the `spaces.GPU` decorator deferring GPU allocation.
* Speaker diarization with `pyannote.audio`, displaying speaker-labeled transcripts.
* Simplified Chinese to Traditional Chinese conversion via `opencc` (see the conversion sketch after this list).
* Color-coded and scrollable diarized transcript panel.
* Semi-streaming, semi-real-time output: incremental transcript updates and speaker-labeled segments accumulate live as each segment or speaker turn finishes processing.
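
The Chinese conversion is a one-liner with `opencc`. Here is a minimal sketch assuming the `opencc-python-reimplemented` package from `requirements.txt`; the `s2twp` profile and the sample string are illustrative, and `app.py` may use a different conversion profile.

```python
# Minimal sketch of the Simplified-to-Traditional conversion step, assuming
# the opencc-python-reimplemented package listed in requirements.txt.
from opencc import OpenCC

# "s2twp" = Simplified -> Traditional (Taiwan standard) with common phrase
# substitutions; plain "s2t" converts characters only.
cc = OpenCC("s2twp")

print(cc.convert("内存和软件"))  # -> 記憶體和軟體 (Taiwan-standard terms)
```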

## 🚀 How to Use

1. Upload an audio file or record from your microphone.
2. **Faster-Whisper ASR**:
   1. Select a model variant from the dropdown.
   2. Choose the transcription language (default: auto-detect).
   3. Pick a device: GPU or CPU.
   4. Toggle diarization on or off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:
   1. Select a SenseVoice model.
   2. Choose the transcription language.
   3. Pick a device: GPU or CPU.
   4. Toggle punctuation on or off.
   5. Toggle diarization on or off.
   6. Click **Transcribe with SenseVoice**.
4. View both the plain transcript and the color-coded, speaker-labeled diarized transcript side by side. (A minimal code sketch of both engines follows this list.)
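
For reference outside the UI, here is a condensed sketch of the two transcription paths, assuming the `faster-whisper` and `funasr` packages from `requirements.txt`. The model names, the `audio.wav` path, and the parameter values are illustrative rather than the exact choices made in `app.py`.

```python
# Condensed sketch of the two ASR paths, assuming the faster-whisper and
# funasr packages from requirements.txt. Model names, "audio.wav", and
# parameter values are illustrative.
from faster_whisper import WhisperModel
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# --- Faster-Whisper path ---
whisper = WhisperModel("large-v3", device="auto")  # "cuda" + float16 is typical on GPU
segments, info = whisper.transcribe("audio.wav", language=None)  # None = auto-detect
print(f"Detected language: {info.language}")
for seg in segments:  # a lazy generator: results arrive segment by segment
    print(f"[{seg.start:6.2f}s -> {seg.end:6.2f}s] {seg.text}")

# --- SenseVoice path ---
sense = AutoModel(model="iic/SenseVoiceSmall", device="cpu")  # or "cuda:0"
result = sense.generate(input="audio.wav", language="auto", use_itn=True)
# SenseVoice emits language/emotion/event tags; strip them for plain text.
print(rich_transcription_postprocess(result[0]["text"]))
```

Note that `transcribe()` returns a lazy generator, which is what makes the semi-streaming transcript updates described above possible.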

## 📁 Files

* **app.py**
  Main Gradio app implementing the dual ASR pipelines with device control, diarization (see the diarization sketch after this list), and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, and the NVIDIA cuBLAS/cuDNN wheels.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 environment for GPU acceleration.
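
Here is a minimal sketch of the diarization step, assuming `pyannote.audio` with a gated pipeline checkpoint and the `HF_TOKEN`/`HUGGINGFACE_TOKEN` secret described in the notes below; the pipeline name and file path are illustrative.

```python
# Minimal sketch of the pyannote.audio diarization step. The pipeline name,
# token handling, and "audio.wav" are illustrative; app.py may differ.
import os
from pyannote.audio import Pipeline

# The diarization checkpoints are gated, so a Hugging Face token is required
# (set HF_TOKEN or HUGGINGFACE_TOKEN in the Space secrets).
token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN")

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=token
)

diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```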

## ⚠️ Notes

* **Hugging Face token**: set `HF_TOKEN` or `HUGGINGFACE_TOKEN` in the Space secrets so the `pyannote.audio` diarization models can be downloaded.
* **GPU allocation**: GPU resources are requested only when GPU is selected as the device.
* **Python version**: Python 3.10 or later is recommended.
* **System ffmpeg**: ffmpeg must be installed on the host or in the container for audio processing.

## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio** (>=3.39.0)
* **torch** (>=2.0.0) & **torchaudio**
* **transformers** (>=4.35.0)
* **faster-whisper** (>=1.1.1) & **ctranslate2** (==4.5.0)
* **funasr** (>=1.0.14)
* **pyannote.audio** (>=2.1.1) & **huggingface-hub** (>=0.18.0)
* **pydub** (>=0.25.1) & **ffmpeg-python** (>=0.2.0)
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**

## License

MIT

---

## Chinese (Taiwan) Version

# OpenAI Whisper vs. Alibaba FunASR SenseVoice Feature Guide

This Space compares **faster-whisper** side by side with Alibaba FunAudioLLM’s **SenseVoice** models, offering:

* A choice of multiple faster-whisper and SenseVoice models
* Configurable recognition language (full list of language codes)
* Explicit compute-device switching (GPU/CPU), with the `spaces.GPU` decorator deferring GPU allocation (see the sketch after this list)
* Integrated speaker diarization via `pyannote.audio`, with speakers labeled in the transcript
* Automatic conversion of Simplified Chinese to Taiwan Traditional Chinese using `opencc`
* A color-coded, conversation-style transcript that can be scrolled and copied
* Semi-real-time segmented output: the transcript accumulates live as each speech segment or speaker turn finishes processing
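
Below is a minimal sketch of how deferred GPU allocation looks with the Hugging Face `spaces` package on a ZeroGPU Space; the function name and body are illustrative, not the actual `app.py` code.

```python
# Minimal sketch of deferring GPU allocation on a ZeroGPU Space, assuming
# the Hugging Face `spaces` package. The function is illustrative.
import spaces
import torch

@spaces.GPU  # the GPU is only attached while this function runs
def transcribe_on_gpu(audio_path: str) -> str:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # ... load the selected model on `device` and run transcription ...
    return f"transcribed {audio_path} on {device}"
```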

## 🚀 Usage Steps

1. Upload an audio file or record audio through your microphone.
2. **Faster-Whisper ASR**:
   1. Select a model version.
   2. Set the recognition language (auto-detect by default).
   3. Switch the compute device: GPU or CPU.
   4. Turn speaker diarization on or off.
   5. Click **Transcribe with Faster-Whisper**.
3. **SenseVoice ASR**:
   1. Select a SenseVoice model.
   2. Set the recognition language.
   3. Switch the compute device: GPU or CPU.
   4. Turn punctuation on or off.
   5. Turn speaker diarization on or off.
   6. Click **Transcribe with SenseVoice**.
4. View the plain transcript and the color-coded, speaker-diarized transcript side by side.

## 📁 File Structure

* **app.py**
  Gradio application source implementing the dual ASR pipelines, including device selection, speaker diarization, and Chinese conversion.
* **requirements.txt**
  Python dependencies: Gradio, PyTorch, Transformers, faster-whisper, funasr, pyannote.audio, pydub, opencc-python-reimplemented, ctranslate2, termcolor, and cuBLAS/cuDNN.
* **Dockerfile** (optional)
  Defines a CUDA 12 + cuDNN 9 Docker environment.

## ⚠️ Notes

* **Hugging Face token**: set `HF_TOKEN` or `HUGGINGFACE_TOKEN` in the Space secrets so the speaker-diarization models can be downloaded.
* **GPU allocation**: GPU resources are requested only when GPU is selected.
* **Python version**: Python 3.10 or later is recommended.
* **System ffmpeg**: make sure ffmpeg is installed on the host or in the container to support audio processing (see the environment-check sketch after this list).
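
Here is a small sketch of checking these prerequisites at startup; the checks and messages are illustrative, not taken from `app.py`.

```python
# Small sketch of checking the prerequisites above at startup.
import os
import shutil

# Diarization models are gated behind a Hugging Face token.
if not (os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACE_TOKEN")):
    print("Warning: no HF_TOKEN/HUGGINGFACE_TOKEN set; diarization will fail.")

# pydub shells out to ffmpeg for most audio formats.
if shutil.which("ffmpeg") is None:
    print("Warning: ffmpeg not found on PATH; audio decoding may fail.")
```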

## 🛠️ Dependencies

* **Python**: 3.10+
* **gradio**: >=3.39.0
* **torch** & **torchaudio**: >=2.0.0
* **transformers**: >=4.35.0
* **faster-whisper**: >=1.1.1 & **ctranslate2**: ==4.5.0
* **funasr**: >=1.0.14
* **pyannote.audio**: >=2.1.1 & **huggingface-hub**: >=0.18.0
* **pydub**: >=0.25.1 & **ffmpeg-python**: >=0.2.0
* **opencc-python-reimplemented**
* **termcolor**
* **nvidia-cublas-cu12**, **nvidia-cudnn-cu12**

## License

MIT