Core ML Whisper Large V3 Turbo (FP16)

A Core ML export of openai/whisper-large-v3-turbo tuned for Apple Silicon. The bundle reflects the Option A1 configuration and reaches a real-time factor (RTF) of ~0.024 (lower is faster) when DecoderWithCache.mlpackage is used.

Files

  • whisper-large-v3-turbo-coreml-fp16/Encoder.mlpackage
  • whisper-large-v3-turbo-coreml-fp16/DecoderWithCache.mlpackage (primary, tensor KV cache)
  • whisper-large-v3-turbo-coreml-fp16/DecoderFull.mlpackage (fallback)
  • whisper-large-v3-turbo-coreml-fp16/DecoderStateful.mlpackage (experimental MLState build)
  • Tokenizer + mel assets (tokenizer.json, vocab.json, merges.txt, whisper_mel_filters.json, etc.)
  • whisper-large-v3-turbo-coreml-fp16.tar.gz and whisper-large-v3-turbo-coreml-fp16.sha256

Download & Verify

hf download DRTR-J/whisper-large-v3-turbo-coreml-fp16 whisper-large-v3-turbo-coreml-fp16.tar.gz
hf download DRTR-J/whisper-large-v3-turbo-coreml-fp16 whisper-large-v3-turbo-coreml-fp16.sha256
shasum -a 256 -c whisper-large-v3-turbo-coreml-fp16.sha256
tar -xzf whisper-large-v3-turbo-coreml-fp16.tar.gz

Usage

  1. Load Encoder.mlpackage and DecoderWithCache.mlpackage with Core ML: MLModel(contentsOf:) plus an MLModelConfiguration whose computeUnits is set to .cpuAndNeuralEngine (see the sketch after this list).
  2. Feed the tokenizer assets into Hugging Face AutoProcessor/tokenizers to maintain parity with the PyTorch reference.
  3. Optionally fall back to:
    • DecoderFull.mlpackage for non-cached decoding.
    • DecoderStateful.mlpackage for experimentation (ANE planner still returns error -14 on macOS 15 betas).
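
The loading step above, as a minimal Swift sketch. Paths are assumptions based on the file list (adjust to wherever the archive was extracted). MLModel(contentsOf:) expects a compiled .mlmodelc, so each .mlpackage is compiled first with MLModel.compileModel(at:); feature names vary between exports, so the sketch prints the models' actual input names rather than assuming them.

import CoreML
import Foundation

// Shared configuration: prefer the Neural Engine, with CPU fallback.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Compile a .mlpackage to .mlmodelc, then load it with the shared configuration.
func load(_ path: String) async throws -> MLModel {
    let compiled = try await MLModel.compileModel(at: URL(fileURLWithPath: path))
    return try MLModel(contentsOf: compiled, configuration: config)
}

let encoder = try await load("whisper-large-v3-turbo-coreml-fp16/Encoder.mlpackage")
let decoder = try await load("whisper-large-v3-turbo-coreml-fp16/DecoderWithCache.mlpackage")

// Input/output feature names differ between exports; inspect them instead of guessing.
print(encoder.modelDescription.inputDescriptionsByName.keys)
print(decoder.modelDescription.inputDescriptionsByName.keys)

In production you would compile once and cache the resulting .mlmodelc URL, since first-launch compilation of models this size can take a while.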