OpenAI's Whisper comes in several sizes, and each one trades speed against accuracy: smaller models respond faster, larger models make fewer mistakes. The right model depends on what you're transcribing.
The Models
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 75MB | Fastest | Good | Quick notes, quiet environments |
| Tiny.en | 75MB | Fastest | Good (English) | English-only, maximum speed |
| Base | 140MB | Very fast | Better | Everyday dictation |
| Small | 460MB | Fast | Very good | Most use cases |
| Medium | 1.5GB | Moderate | Great | Noisy environments, accents |
| Large v3 | 2.9GB | Slower | Best | Recordings, meetings, accuracy-critical |
| Large v3 Turbo | 1.6GB | Fast | Near-best | Best speed/accuracy balance |
Recommendations by Use Case
Live Dictation (Push-to-Talk)
Speed matters here: you want text to appear quickly after you stop speaking. Start with Base or Small. On an M2 or later chip, Large v3 Turbo is fast enough for live dictation and gives near-best accuracy.
File Transcription
Speed is less critical since you're processing a file in the background. Use Large v3 for the best accuracy. VoxBee automatically splits long files into chunks and processes the chunks one after another.
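To make sequential chunking concrete, here is a minimal Python sketch. It is not VoxBee's implementation: the function names are hypothetical, and the 30-second chunk length is an assumption borrowed from the window size Whisper itself works on. The `transcribe_chunk` callback stands in for whatever engine does the actual transcription.

```python
def chunk_spans(total_seconds, chunk_seconds=30.0):
    """Split a recording into consecutive (start, end) spans, in seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        # The last chunk is shortened so it never runs past the end of the file.
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans


def transcribe_file(total_seconds, transcribe_chunk, chunk_seconds=30.0):
    """Transcribe chunks strictly in order and join the resulting text.

    `transcribe_chunk(start, end)` is a hypothetical callback that returns
    the text for one span of audio.
    """
    parts = []
    for start, end in chunk_spans(total_seconds, chunk_seconds):
        text = transcribe_chunk(start, end)
        if text:
            parts.append(text)
    return " ".join(parts)
```

Processing one chunk at a time keeps memory use flat no matter how long the file is, which is why slow but accurate models remain practical for background work.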
Meeting Transcription
Use Large v3 — meetings often have multiple speakers, background noise, and cross-talk where accuracy matters most. The extra processing time is worth it since you're transcribing after the meeting ends.
Non-English Languages
Larger models are significantly better for non-English languages. Use Large v3 or Large v3 Turbo for multilingual transcription. Avoid the ".en" models — they're English-only.
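The recommendations above can be condensed into a small decision function. This is only an illustrative Python sketch: the function name, the use-case labels, and the parameters are made up for this example and are not part of VoxBee. Picking Large v3 Turbo for non-English live dictation is one reading of the advice above (it is the faster of the two recommended multilingual models), not a rule the app enforces.

```python
def recommend_model(use_case, english_only=True, m2_or_later=False):
    """Return a suggested starting model for a use case.

    use_case: "dictation", "file", or "meeting" (hypothetical labels).
    """
    if use_case not in {"dictation", "file", "meeting"}:
        raise ValueError(f"unknown use case: {use_case!r}")

    # Background work can afford the slowest, most accurate model.
    if use_case in {"file", "meeting"}:
        return "Large v3"

    # Live dictation: latency matters, so prefer the fastest model
    # that is still accurate enough.
    if m2_or_later or not english_only:
        return "Large v3 Turbo"

    # English dictation on older hardware: start light.
    # Small is the natural step up if Base makes too many mistakes.
    return "Base"
```

For example, `recommend_model("dictation", m2_or_later=True)` returns `"Large v3 Turbo"`, while `recommend_model("meeting")` returns `"Large v3"`.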
Apple Silicon Performance
Whisper runs on Apple's Neural Engine via WhisperKit. Performance scales with your chip:
- M1: Base and Small run at real-time speed or faster. Large v3 is 2-3x slower than real-time.
- M2/M3: Small and Medium run at real-time speed. Large v3 approaches real-time.
- M3 Pro/Max/Ultra: Large v3 runs at real-time speed or faster.
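To see what these figures mean in practice, it helps to turn them into wall-clock time. The Python sketch below assumes that "2-3x slower than real-time" means processing takes two to three times the length of the audio. The function name and the `slowdown` parameter are illustrative, and real throughput will vary with the recording and the machine.

```python
def processing_minutes(audio_minutes, slowdown):
    """Estimate processing time from audio length.

    slowdown = processing time / audio duration.
    1.0 is real-time, 2.0 is twice as slow, 0.5 is twice as fast.
    """
    if audio_minutes < 0 or slowdown <= 0:
        raise ValueError("audio_minutes must be >= 0 and slowdown must be > 0")
    return audio_minutes * slowdown


# A one-hour meeting with Large v3 on an M1 (2-3x slower than real-time):
low = processing_minutes(60, 2.0)   # 120.0 minutes
high = processing_minutes(60, 3.0)  # 180.0 minutes
```

Under that assumption, a one-hour recording takes roughly two to three hours on an M1 with Large v3, which is fine for background transcription but far too slow for live dictation.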
How to Switch Models in VoxBee
VoxBee lets you download and switch between all 7 models in Settings → Speech. The app downloads models on demand — you only need disk space for the models you use. You can use a faster model for live dictation and a larger model for file transcription.
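If disk space is tight, you can estimate the footprint of a model combination before downloading. The Python sketch below uses the approximate sizes from the table above and assumes 1 GB = 1000 MB; the dictionary and function names are illustrative only.

```python
# Approximate sizes in MB, taken from the model table (1 GB treated as 1000 MB).
MODEL_SIZES_MB = {
    "Tiny": 75,
    "Tiny.en": 75,
    "Base": 140,
    "Small": 460,
    "Medium": 1500,
    "Large v3": 2900,
    "Large v3 Turbo": 1600,
}


def disk_footprint_mb(models):
    """Total approximate disk space for a set of models, counting each once."""
    return sum(MODEL_SIZES_MB[name] for name in set(models))


# Small for live dictation plus Large v3 for file transcription:
pair_mb = disk_footprint_mb(["Small", "Large v3"])  # 3360 MB, about 3.4 GB
```

Keeping all seven models installed comes to roughly 6.75 GB by the same arithmetic, so downloading only the two or three you actually use saves a meaningful amount of space.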
Try VoxBee free for 14 days and experiment with different models.