Omnilingual ASR — CTC 3B (MLX 8-bit)

MLX-compatible 8-bit quantization of Meta's Omnilingual ASR CTC-3B model for on-device inference on Apple Silicon (M2 Pro / M3 / M4 recommended). Trades ~1 GB of extra disk versus CTC-1B 4-bit for measurably better accuracy on low-resource languages per Meta's published FLEURS results.

Omnilingual ASR is a wav2vec 2.0-style encoder-only model with a linear CTC head, trained by Meta for speech recognition across 1,600+ languages. The CTC variant is language-agnostic at inference time.
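
To make the CTC head concrete, here is a minimal greedy-decoding sketch in NumPy. It assumes the head emits per-frame logits of shape (frames, vocab) and that the blank token has id 0. Both the blank id and the helper name `ctc_greedy_decode` are assumptions for illustration and are not taken from this repository; check config.json and the tokenizer for the real blank id.

```python
import numpy as np

def ctc_greedy_decode(logits: np.ndarray, blank_id: int = 0) -> list[int]:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = logits.argmax(axis=-1)  # most likely token id at each frame
    tokens = []
    prev = None
    for t in best.tolist():
        # Keep a token only when it differs from the previous frame
        # and is not the (assumed) blank symbol.
        if t != prev and t != blank_id:
            tokens.append(t)
        prev = t
    return tokens

# Toy example: 6 frames over a 4-token vocabulary (id 0 is the assumed blank).
frame_ids = [1, 1, 0, 1, 2, 2]
toy_logits = np.eye(4)[frame_ids]
print(ctc_greedy_decode(toy_logits))  # [1, 1, 2]
```

The blank between the two runs of token 1 is what lets CTC emit the same token twice in a row. In a real pipeline the resulting ids would then be mapped back to text with the SentencePiece tokenizer.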

Model

| Property | Value |
|---|---|
| Parameters | ~3 B |
| Format | MLX safetensors (quantized linear layers + fp16 features) |
| Quantization | 8-bit per-group min-max, group size 64 |
| Sample rate | 16 kHz (raw waveform input) |
| Frame rate | 50 fps |
| Max duration | 40 s |
| Languages | 1,600+ |
| Vocabulary | 10,288 SentencePiece tokens |
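
The sample rate, frame rate, and maximum duration listed above fix the nominal input and output sizes. A quick arithmetic sketch (the variable names are illustrative only):

```python
SAMPLE_RATE_HZ = 16_000   # raw waveform input
FRAME_RATE_FPS = 50       # encoder output frames per second
MAX_DURATION_S = 40       # longest clip accepted in one pass

samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_FPS   # 320 samples, i.e. 20 ms per frame
max_samples = SAMPLE_RATE_HZ * MAX_DURATION_S          # 640,000 input samples
max_frames = FRAME_RATE_FPS * MAX_DURATION_S           # 2,000 output frames

print(samples_per_frame, max_samples, max_frames)
```

These are nominal figures; the exact number of frames produced by the convolutional frontend may differ slightly at clip boundaries.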

Full architecture details (num_layers / model_dim / ffn_dim) are in config.json.
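
A small sketch for pulling those fields out of config.json. It assumes the keys are top-level and spelled exactly as named above, which is not verified here; `read_arch` is a hypothetical helper, and missing keys are returned as None rather than raising.

```python
import json
from pathlib import Path

def read_arch(config_path: str) -> dict:
    """Return the architecture fields named above, or None where a key is absent."""
    config = json.loads(Path(config_path).read_text())
    # Key names are assumed to match the field names mentioned in this card.
    return {k: config.get(k) for k in ("num_layers", "model_dim", "ffn_dim")}

# Only runs if a config.json is present in the working directory.
if Path("config.json").exists():
    print(read_arch("config.json"))
```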

Files

| File | Description |
|---|---|
| model.safetensors | 8-bit quantized transformer weights + fp16 conv frontend |
| tokenizer.model | SentencePiece tokenizer |
| config.json | Architecture + quantization metadata |
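
For readers unfamiliar with per-group min-max quantization, the following NumPy sketch shows the general idea behind the 8-bit, group-size-64 scheme: each run of 64 weights gets its own scale and offset derived from that group's minimum and maximum. It is an illustration under those assumptions, not the conversion script used for this checkpoint, and it is not guaranteed to be bit-exact with MLX. The function names are hypothetical.

```python
import numpy as np

def quantize_groups(w: np.ndarray, bits: int = 8, group_size: int = 64):
    """Per-group min-max (affine) quantization of a 2-D weight matrix."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    g = w.reshape(rows, cols // group_size, group_size).astype(np.float32)
    lo = g.min(axis=-1, keepdims=True)   # per-group offset
    hi = g.max(axis=-1, keepdims=True)
    levels = 2**bits - 1                 # 255 steps for 8 bits
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero for constant groups
    # uint8 storage suffices for bits <= 8.
    q = np.clip(np.round((g - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize_groups(q, scale, lo):
    """Reconstruct approximate weights: w is roughly q * scale + lo."""
    g = q.astype(np.float32) * scale + lo
    return g.reshape(g.shape[0], -1)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)
q, scale, lo = quantize_groups(w)
w_hat = dequantize_groups(q, scale, lo)
print("max abs error:", np.abs(w - w_hat).max())
```

The reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups or more bits improve fidelity at the cost of extra storage. MLX stores the integer codes packed into 32-bit words alongside the per-group scales and offsets; the sketch keeps them as plain uint8 for readability.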

Usage

import mlx.core as mx
from safetensors import safe_open

# Read every tensor in the checkpoint into a dict keyed by parameter name.
weights = {}
with safe_open("model.safetensors", framework="mlx") as f:
    for k in f.keys():
        weights[k] = f.get_tensor(k)
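
Because the model takes raw 16 kHz waveform input and clips are capped at 40 s, longer recordings need to be split before inference. A minimal sketch, assuming mono float audio that has already been resampled to 16 kHz; `split_waveform` is a hypothetical helper, and the plain non-overlapping split is an illustration rather than the strategy used by any particular runtime.

```python
import numpy as np

SAMPLE_RATE_HZ = 16_000
MAX_DURATION_S = 40

def split_waveform(wave: np.ndarray, max_seconds: int = MAX_DURATION_S) -> list[np.ndarray]:
    """Split a 1-D waveform into consecutive chunks of at most max_seconds."""
    chunk = SAMPLE_RATE_HZ * max_seconds
    return [wave[i:i + chunk] for i in range(0, len(wave), chunk)]

# 95 s of silence -> chunks of 40 s, 40 s, and 15 s.
wave = np.zeros(95 * SAMPLE_RATE_HZ, dtype=np.float32)
print([len(c) / SAMPLE_RATE_HZ for c in split_waveform(wave)])  # [40.0, 40.0, 15.0]
```

Cutting at fixed boundaries can split a word in two, so in practice it is usually better to cut at pauses or to overlap chunks slightly and merge the transcripts.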

Swift inference is provided by speech-swift.

Source

Links

License

Apache 2.0 (inherited from upstream).

