Shenava — Rizeh v1.0 (32M) · cache-aware streaming · native-Rust (tract)

Cache-aware streaming CTC export of Shenava-Rizeh-v1.0 that runs in the pure-Rust tract engine — no C++, no ONNX Runtime. Part of VisualEars / Shenava: offline, on-device, streaming Persian ASR for the Deaf/Hard-of-Hearing.

Quality: near-exact (12.11% WER on golden-6669). RTF ≈ 0.027 (30.0 ms per chunk on an x86 CPU; each chunk covers 1.12 s of audio).
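As a sanity check, the RTF figure follows from the two numbers quoted with it. The sketch below assumes a 10 ms mel-frame hop (a common NeMo featurizer setting, not stated in this card) and uses the 112-frame shift from the streaming contract; the variable names are illustrative.

```python
# Figures quoted in this card; the 10 ms hop is an assumption.
HOP_S = 0.010          # assumed mel-frame hop, in seconds
SHIFT_FRAMES = 112     # new mel frames consumed per step
MS_PER_CHUNK = 30.0    # reported compute time per chunk on x86 CPU

chunk_audio_s = SHIFT_FRAMES * HOP_S             # seconds of new audio per step
rtf = (MS_PER_CHUNK / 1000.0) / chunk_audio_s    # compute time / audio time

print(f"chunk = {chunk_audio_s:.2f} s, RTF = {rtf:.3f}")
```

An RTF well below 1 means each chunk is processed much faster than it takes to speak it, which is what leaves headroom for live captioning.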

⚠️ Requires patched tract (until upstreamed)

Stock tract rejects NeMo cache-aware streaming graphs at two places in its inference layer. The fix is a 23-line, two-file patch (shenava_tract_streaming.patch, included); a PR is open at sonos/tract#2441. Build tract with the patch applied, then load model.onnx as usual. The graph itself is valid: it decodes identically under ONNX Runtime.

Streaming contract

Per-step inputs / outputs (fixed shapes, greedy CTC):

audio_signal [1,80,121] — un-normalized log-mel chunk (NeMo featurizer, normalize=NA)
length [1] i64 — true valid frames in the chunk
cache_last_channel [1,16,70,256], cache_last_time [1,16,256,8], cache_last_channel_len [1] i64 — start zeros / 0
→ logprobs [1,T',1025] + next caches
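To make the initial state concrete, here is a minimal sketch that builds the zero caches from the shapes listed above and reports how much state is carried between steps. The flat-list layout, the helper name zero_cache, and f32 storage are assumptions for illustration; a real caller would allocate tensors in whatever form its runtime expects.

```python
from math import prod

# Shapes copied from the contract above; f32 storage is an assumption.
CACHE_SHAPES = {
    "cache_last_channel": (1, 16, 70, 256),
    "cache_last_time": (1, 16, 256, 8),
}
BYTES_PER_F32 = 4

def zero_cache(shape):
    """Return a flat zero-filled buffer with prod(shape) elements."""
    return [0.0] * prod(shape)

# Initial state for the first step: all caches zero, cache length 0.
state = {name: zero_cache(shape) for name, shape in CACHE_SHAPES.items()}
state["cache_last_channel_len"] = [0]  # shape [1], i64

total = sum(prod(shape) for shape in CACHE_SHAPES.values())
print(total, "cache values,", total * BYTES_PER_F32 / 1024, "KiB if stored as f32")
```

After each step the three entries in state would be replaced by the matching *_next outputs.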

Chunking: feed 121-mel-frame chunks with a shift of 112 frames (9-frame pre-encode overlap). The first chunk holds only 105 frames, so pad it to 121; pad the final chunk as well, and always pass the true frame count in length. Thread the *_next caches back in at each step (cast cache_last_channel_len_next to i64).

Greedy CTC: when collapsing repeats, carry the previous token across chunk boundaries. Blank id = 1024; map ids via tokens.txt; ▁ → space.
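The schedule and decoding rule above can be sketched as follows. This is one reading of the contract, not the project's reference code: chunk_spans, GreedyCTC, and detokenize are hypothetical names, padding each slice to 121 frames is left to the caller, and id_to_piece stands in for whatever tokens.txt parses to.

```python
CHUNK, SHIFT, FIRST = 121, 112, 105
OVERLAP = CHUNK - SHIFT   # 9-frame pre-encode overlap
BLANK = 1024              # CTC blank id

def chunk_spans(n_frames):
    """Yield (start, end, valid_len) per step; the caller pads each slice to CHUNK."""
    if n_frames <= 0:
        return
    end = min(FIRST, n_frames)
    yield 0, end, end                      # first chunk: no overlap available yet
    while end < n_frames:
        start = end - OVERLAP              # re-feed the last 9 frames
        end = min(start + CHUNK, n_frames)
        yield start, end, end - start      # valid_len < CHUNK only on the tail

class GreedyCTC:
    """Collapse repeats and drop blanks, remembering the last frame across chunks."""
    def __init__(self):
        self.prev = BLANK

    def step(self, frame_ids):
        out = []
        for t in frame_ids:                # frame_ids: per-frame argmax of logprobs
            if t != BLANK and t != self.prev:
                out.append(t)
            self.prev = t                  # carried into the next chunk
        return out

def detokenize(ids, id_to_piece):
    """Join pieces and turn the word-boundary marker into spaces."""
    return "".join(id_to_piece[i] for i in ids).replace("▁", " ").strip()

spans = list(chunk_spans(300))
dec = GreedyCTC()
first = dec.step([5, 5, BLANK, 5, 7])
second = dec.step([7, 7, BLANK, 3])        # leading 7 repeats the previous chunk's last frame
text = detokenize(first + second, {5: "▁a", 7: "▁b", 3: "c"})
```

Keeping the previous frame id on the decoder object is what prevents a token that straddles a chunk boundary from being emitted twice.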

Numbers are spoken-form → ITN

The model spells numbers out (هشت, not ۸). Apply persian_itn.py at display time to convert spoken forms to Persian digits (inverse text normalization); it covers cardinals, هزار/میلیون/میلیارد, «و», and compounds.
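To illustrate the idea only, here is a tiny spoken-to-digit sketch. It is not the logic or API of persian_itn.py: the word table is deliberately incomplete, and words_to_int and to_persian_digits are hypothetical names.

```python
# Hypothetical minimal sketch; persian_itn.py is the real implementation.
UNITS = {
    "صفر": 0, "یک": 1, "دو": 2, "سه": 3, "چهار": 4,
    "پنج": 5, "شش": 6, "هفت": 7, "هشت": 8, "نه": 9,
    "ده": 10, "بیست": 20, "سی": 30, "صد": 100,
}
SCALES = {"هزار": 1_000, "میلیون": 1_000_000, "میلیارد": 1_000_000_000}
AND = "و"
PERSIAN_DIGITS = str.maketrans("0123456789", "۰۱۲۳۴۵۶۷۸۹")

def words_to_int(words):
    """Sum units into a group; a scale word multiplies the group and flushes it."""
    total, group = 0, 0
    for w in words:
        if w == AND:
            continue                       # the conjunction carries no value
        if w in SCALES:
            total += max(group, 1) * SCALES[w]   # bare scale word counts as 1 x scale
            group = 0
        else:
            group += UNITS[w]              # KeyError on words outside the toy table
    return total + group

def to_persian_digits(n):
    """Render an int with Persian digits."""
    return str(n).translate(PERSIAN_DIGITS)

value = words_to_int("دو هزار و هشت".split())
shown = to_persian_digits(value)
```

A production pass also has to find number spans inside running text and decide when «و» is a conjunction between ordinary words rather than part of a number, which this sketch does not attempt.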

Shenava-1 family (all native-Rust streaming)

Koochik 114M — flagship
Rizeh 32M — mid
Rizeh-Pizeh 6.9M — tiniest


Model tree for Reza2kn/Shenava-Rizeh-v1.0-tract-streaming

Base model: nvidia/stt_fa_fastconformer_hybrid_large
Finetuned: Reza2kn/Shenava-Koochik-v1.0
Finetuned: Reza2kn/Shenava-Rizeh-v1.0
Quantized (4): this model