nvidia/canary-1b-v2

@@ -1763,7 +1763,6 @@ model-index:
 ---
 ## <span style="color:#ffb300;">🐤 Canary 1B v2: Multitask Speech Transcription and Translation Model </span>
-## <span style="color:#b37800;">Description</span>
 **``Canary-1b-v2``** is a powerful 1-billion parameter model built for high-quality speech transcription and translation across 25 European languages.
@@ -1779,6 +1778,13 @@ Bulgarian (**bg**), Croatian (**hr**), Czech (**cs**), Danish (**da**), Dutch (*
 🗣️ **Experience `Canary-1b-v2` in action** at [Hugging Face Demo](https://huggingface.co/spaces/nvidia/canary-1b-v2)
 ## <span style="color:#b37800;">Key Features</span>
 **`Canary-1b-v2`** is a scaled and enhanced version of the Canary model family, offering:
@@ -1799,9 +1805,6 @@ For a deeper glimpse to Canary family models, explore this comprehensive [NeMo t
 We will soon release a comprehensive **Canary-1b-v2 technical report** detailing the model architecture, training methodology, datasets, and evaluation results.
-`Canary-1b-v2` model is ready for commercial/non-commercial use.
----
 ### Automatic Speech Recognition (ASR)
@@ -1837,10 +1840,6 @@ We will soon release a comprehensive **Canary-1b-v2 technical report** detailing
 ---
-## <span style="color:#b37800;">License/Terms of Use</span>
-GOVERNING TERMS: Use of this model is governed by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en) license.
 ## <span style="color:#b37800;">Deployment Geography</span>
 Global
@@ -1851,7 +1850,7 @@ This model serves developers, researchers, academics, and industries building ap
 ## <span style="color:#b37800;">Release Date</span>
-08/14/2025
 ## <span style="color:#b37800;">Model Architecture</span>
@@ -1914,6 +1913,8 @@ print(output[0].text)
 #### Transcribing with timestamps
 To transcribe with timestamps:
 ```python
 output = asr_model.transcribe(['2086-149220-0033.wav'], source_lang='en', target_lang='en', timestamps=True)
@@ -1944,7 +1945,7 @@ For translation task, please, refer to segment-level timestamps for getting intu
 **Runtime Engine(s):**
-* NeMo 2.2
 **Supported Hardware Microarchitecture Compatibility:**
@@ -2022,7 +2023,13 @@ To read more about the pseudo-labeling technique and [pipeline](https://github.c
 All transcripts include punctuation and capitalization.
-**Labels**: Hybrid (Human-labeled, Pseudo-Labeled)
 ---
@@ -2034,7 +2041,13 @@ All transcripts include punctuation and capitalization.
 * Earnings-22 \[14], This American Life \[15] (long-form)
 * MUSAN \[16]
-**Labels**: Human-labeled
 ## <span style="color:#b37800;">Benchmark Results</span>
@@ -2219,4 +2232,4 @@ Model and dataset restrictions            |  The Principle of least privilege (P
 \[15] [Speech Recognition and Multi-Speaker Diarization of Long Conversations](https://arxiv.org/abs/2005.08072)
-\[16] [MUSAN: A Music, Speech, and Noise Corpus](https://arxiv.org/abs/1510.08484)

 ---
 ## <span style="color:#ffb300;">🐤 Canary 1B v2: Multitask Speech Transcription and Translation Model </span>
 **``Canary-1b-v2``** is a powerful 1-billion parameter model built for high-quality speech transcription and translation across 25 European languages.
 🗣️ **Experience `Canary-1b-v2` in action** at [Hugging Face Demo](https://huggingface.co/spaces/nvidia/canary-1b-v2)
+`Canary-1b-v2` model is ready for commercial/non-commercial use.
+## <span style="color:#b37800;">License/Terms of Use</span>
+GOVERNING TERMS: Use of this model is governed by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en) license.
 ## <span style="color:#b37800;">Key Features</span>
 **`Canary-1b-v2`** is a scaled and enhanced version of the Canary model family, offering:
 We will soon release a comprehensive **Canary-1b-v2 technical report** detailing the model architecture, training methodology, datasets, and evaluation results.
 ### Automatic Speech Recognition (ASR)
 ---
 ## <span style="color:#b37800;">Deployment Geography</span>
 Global
 ## <span style="color:#b37800;">Release Date</span>
+Huggingface [08/14/2025](https://huggingface.co/nvidia/canary-1b-v2)
 ## <span style="color:#b37800;">Model Architecture</span>
 #### Transcribing with timestamps
+> **Note:** Use [main branch of NeMo](https://github.com/NVIDIA/NeMo/) to get timestamps until it is released in NeMo 2.5.
 To transcribe with timestamps:
 ```python
 output = asr_model.transcribe(['2086-149220-0033.wav'], source_lang='en', target_lang='en', timestamps=True)
 **Runtime Engine(s):**
+* NeMo main branch (until it is released in NeMo 2.5)
 **Supported Hardware Microarchitecture Compatibility:**
 All transcripts include punctuation and capitalization.
+**Data Collection Method by dataset**
+* Hybrid: Automated, Human
+**Labeling Method by dataset**
+* Hybrid: Synthetic, Human
 ---
 * Earnings-22 \[14], This American Life \[15] (long-form)
 * MUSAN \[16]
+**Data Collection Method by dataset**
+* Human
+**Labeling Method by dataset**
+* Human
 ## <span style="color:#b37800;">Benchmark Results</span>
 \[15] [Speech Recognition and Multi-Speaker Diarization of Long Conversations](https://arxiv.org/abs/2005.08072)
+\[16] [MUSAN: A Music, Speech, and Noise Corpus](https://arxiv.org/abs/1510.08484)