Paper: Robust Speech Recognition via Large-Scale Weak Supervision (arXiv:2212.04356)
This is the OpenAI Whisper Small model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.
| Property | Value |
|---|---|
| Base Model | openai/whisper-small |
| Parameters | ~244M |
| Format | MLX SafeTensors (FP16) |
| Model Size | 458.92 MB |
| Sample Rate | 16,000 Hz |
| Encoder (Audio) Layers | 12 |
| Decoder (Text) Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Vocabulary Size | 51,865 |
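Several of the numbers in the table can be cross-checked with a little arithmetic. The sketch below assumes 2 bytes per FP16 weight and Whisper's fixed 30-second input window. Because "~244M" is an approximate count and the file also carries format overhead, the size estimate is only a rough check and is not expected to match the listed file size exactly.

```python
# Back-of-the-envelope checks on the Whisper Small numbers in the table above.
# Assumptions: 2 bytes per FP16 parameter, 30-second audio windows (as in Whisper).

PARAMS = 244_000_000      # approximate parameter count ("~244M")
BYTES_PER_FP16 = 2        # FP16 = 16 bits = 2 bytes
HIDDEN_SIZE = 768
ATTENTION_HEADS = 12
SAMPLE_RATE = 16_000      # Hz
WINDOW_SECONDS = 30       # Whisper processes audio in 30 s chunks


def fp16_size_mb(n_params: int) -> float:
    """Approximate weight storage in megabytes (10**6 bytes) at FP16."""
    return n_params * BYTES_PER_FP16 / 1e6


def fp16_size_mib(n_params: int) -> float:
    """Same estimate in mebibytes (2**20 bytes)."""
    return n_params * BYTES_PER_FP16 / 2**20


head_dim = HIDDEN_SIZE // ATTENTION_HEADS          # width of each attention head
samples_per_window = SAMPLE_RATE * WINDOW_SECONDS  # samples in one 30 s chunk

print(f"per-head dimension: {head_dim}")
print(f"FP16 weights: ~{fp16_size_mb(PARAMS):.0f} MB (~{fp16_size_mib(PARAMS):.0f} MiB)")
print(f"samples per 30 s window: {samples_per_window}")
```

With these inputs it reports a per-head width of 64, roughly 488 MB (about 465 MiB) of FP16 weights, and 480,000 samples per 30-second window.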
This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.
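Because this build targets Apple Silicon, a Python script may want to check the platform before importing `mlx_whisper`. The following is a minimal standard-library sketch; the helper name `is_apple_silicon_mac` is hypothetical, and it only detects macOS on arm64 (it says nothing about iOS or iPadOS).

```python
import platform


def is_apple_silicon_mac() -> bool:
    """Return True when running natively on an arm64 (Apple Silicon) Mac.

    Note: an x86_64 Python interpreter running under Rosetta 2 reports
    "x86_64", so this returns False in that case even on Apple Silicon.
    """
    return platform.system() == "Darwin" and platform.machine() == "arm64"


if __name__ == "__main__":
    if is_apple_silicon_mac():
        print("Apple Silicon detected: MLX inference should be available.")
    else:
        print("Not a native arm64 Mac: this model card targets Apple Silicon.")
```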
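Files in this repository:

- `config.json`: model configuration
- `model.safetensors`: model weights in SafeTensors format (FP16)
- `multilingual.tiktoken`: tokenizer

Example usage with the `mlx-whisper` Python package (`pip install mlx-whisper`; decoding audio files such as MP3 requires `ffmpeg`):

```python
import mlx_whisper

# Transcribe an audio file. The weights are downloaded from the
# Hugging Face Hub on first use and cached locally.
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Small-MLX-FP16",
)

# The returned dict holds the full transcript under "text".
print(result["text"])
```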
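Besides `"text"`, the dictionary returned by `mlx_whisper.transcribe` also carries a `"segments"` list whose entries include `"start"` and `"end"` (in seconds) and `"text"`, following the original Whisper output format. The helpers below (`format_timestamp` and `format_segments`, both hypothetical names) are a small sketch that turns those segments into timestamped lines. They work on any dict with that shape, so they can be tried without MLX installed.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result: dict) -> list[str]:
    """Turn a Whisper-style result dict into '[start --> end] text' lines."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines


if __name__ == "__main__":
    # Hand-written stand-in for the dict returned by mlx_whisper.transcribe(...)
    sample = {
        "text": " Hello world. This is a test.",
        "segments": [
            {"start": 0.0, "end": 1.5, "text": " Hello world."},
            {"start": 1.5, "end": 3.25, "text": " This is a test."},
        ],
    }
    for line in format_segments(sample):
        print(line)
```

With a real transcription, pass the return value directly, e.g. `format_segments(result)` after the `mlx_whisper.transcribe` call.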