Core ML Whisper Large V3 Turbo (FP16)

A Core ML export of openai/whisper-large-v3-turbo tuned for Apple Silicon. The bundle reflects the Option A1 configuration and reaches a real-time factor (RTF) of ~0.024 (lower is faster) when DecoderWithCache.mlpackage is used.

Files

  • whisper-large-v3-turbo-coreml-fp16/Encoder.mlpackage
  • whisper-large-v3-turbo-coreml-fp16/DecoderWithCache.mlpackage (primary, tensor KV cache)
  • whisper-large-v3-turbo-coreml-fp16/DecoderFull.mlpackage (fallback)
  • whisper-large-v3-turbo-coreml-fp16/DecoderStateful.mlpackage (experimental MLState build)
  • Tokenizer + mel assets (tokenizer.json, vocab.json, merges.txt, whisper_mel_filters.json, etc.)
  • whisper-large-v3-turbo-coreml-fp16.tar.gz and whisper-large-v3-turbo-coreml-fp16.sha256

Download & Verify

hf download DRTR-J/whisper-large-v3-turbo-coreml-fp16 whisper-large-v3-turbo-coreml-fp16.tar.gz
hf download DRTR-J/whisper-large-v3-turbo-coreml-fp16 whisper-large-v3-turbo-coreml-fp16.sha256
shasum -a 256 -c whisper-large-v3-turbo-coreml-fp16.sha256
tar -xzf whisper-large-v3-turbo-coreml-fp16.tar.gz

Usage

  1. Load Encoder.mlpackage and DecoderWithCache.mlpackage with Core ML: MLModel(contentsOf:) plus an MLModelConfiguration whose computeUnits is set to .cpuAndNeuralEngine (see the sketch after this list).
  2. Feed the tokenizer assets into Hugging Face AutoProcessor/tokenizers to maintain parity with the PyTorch reference.
  3. Optionally fall back to:
    • DecoderFull.mlpackage for non-cached decoding.
    • DecoderStateful.mlpackage for experimentation (ANE planner still returns error -14 on macOS 15 betas).
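
The loading step above, as a minimal Swift sketch. Paths are assumptions based on the file list (adjust to wherever the archive was extracted). MLModel(contentsOf:) expects a compiled .mlmodelc, so each .mlpackage is compiled first with MLModel.compileModel(at:); feature names vary between exports, so the sketch prints the models' actual input names rather than assuming them.

import CoreML
import Foundation

// Shared configuration: prefer the Neural Engine, with CPU fallback.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Compile a .mlpackage to .mlmodelc, then load it with the shared configuration.
func load(_ path: String) async throws -> MLModel {
    let compiled = try await MLModel.compileModel(at: URL(fileURLWithPath: path))
    return try MLModel(contentsOf: compiled, configuration: config)
}

let encoder = try await load("whisper-large-v3-turbo-coreml-fp16/Encoder.mlpackage")
let decoder = try await load("whisper-large-v3-turbo-coreml-fp16/DecoderWithCache.mlpackage")

// Input/output feature names differ between exports; inspect them instead of guessing.
print(encoder.modelDescription.inputDescriptionsByName.keys)
print(decoder.modelDescription.inputDescriptionsByName.keys)

In production you would compile once and cache the resulting .mlmodelc URL, since first-launch compilation of models this size can take a while.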