Whisper Model Sizes Compared: Which One Should You Use?

OpenAI's Whisper comes in several sizes. Smaller models run faster, while larger models are more accurate. The right choice depends on what you're transcribing and the hardware you're running on.

The Models

Model	Size	Speed	Accuracy	Best For
Tiny	75MB	Fastest	Good	Quick notes, quiet environments
Tiny.en	75MB	Fastest	Good (English)	English-only, maximum speed
Base	140MB	Very fast	Better	Everyday dictation
Small	460MB	Fast	Very good	Most use cases
Medium	1.5GB	Moderate	Great	Noisy environments, accents
Large v3	2.9GB	Slower	Best	Recordings, meetings, accuracy-critical
Large v3 Turbo	1.6GB	Fast	Near-best	Best speed/accuracy balance

Recommendations by Use Case

Live Dictation (Push-to-Talk)

Speed matters here: you want text to appear quickly after you stop speaking. Start with Base or Small. On an M2 or later chip, Large v3 Turbo is fast enough for live dictation and delivers near-best accuracy.

File Transcription

Speed is less critical since you're processing a file in the background. Use Large v3 for the best accuracy. VoxBee automatically chunks long files and processes them sequentially.
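
The chunk-and-process-in-order idea can be sketched in a few lines of Python. This is a minimal illustration, not VoxBee's actual implementation: the 30-second chunk length (chosen here because Whisper itself works on 30-second audio windows), the function names, and the transcribe_chunk callback are all assumptions made for the example.

```python
from typing import Callable


def chunk_spans(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) spans of at most chunk_s seconds."""
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # the last chunk may be shorter
        spans.append((start, end))
        start = end
    return spans


def transcribe_file(duration_s: float, transcribe_chunk: Callable[[float, float], str]) -> str:
    """Run a (hypothetical) per-chunk transcriber on each span in order and join the text."""
    return " ".join(transcribe_chunk(start, end) for start, end in chunk_spans(duration_s))
```

For a 75-second file this yields three chunks: 0-30 s, 30-60 s, and 60-75 s, processed in that order.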

Meeting Transcription

Use Large v3 — meetings often have multiple speakers, background noise, and cross-talk where accuracy matters most. The extra processing time is worth it since you're transcribing after the meeting ends.

Non-English Languages

Larger models are significantly better for non-English languages. Use Large v3 or Large v3 Turbo for multilingual transcription. Avoid the ".en" models — they're English-only.
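
The recommendations above can be condensed into a small lookup. The sketch below only restates this article's advice in Python. The function name, parameter names, and use-case labels are made up for illustration and do not correspond to any VoxBee setting or API.

```python
def recommend_models(use_case: str, english: bool = True, m2_or_later: bool = False) -> list[str]:
    """Return the model(s) suggested in this article for a given situation."""
    if use_case not in ("dictation", "file", "meeting"):
        raise ValueError(f"unknown use case: {use_case!r}")
    if not english:
        # Non-English audio: the larger multilingual models, never a ".en" model.
        return ["Large v3", "Large v3 Turbo"]
    if use_case == "dictation":
        # Live dictation favors speed; newer chips can afford Large v3 Turbo.
        return ["Large v3 Turbo"] if m2_or_later else ["Base", "Small"]
    # File and meeting transcription run after the fact, so accuracy wins.
    return ["Large v3"]
```

For example, recommend_models("dictation") returns Base and Small, while recommend_models("meeting") returns Large v3.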

Apple Silicon Performance

Whisper runs on Apple's Neural Engine via WhisperKit. Performance scales with your chip:

M1: Base and Small run in real time or faster. Large v3 takes about 2-3x the length of the audio to process.
M2/M3: Small and Medium run in real time. Large v3 approaches real time.
M3 Pro/Max/Ultra: Large v3 runs in real time or faster.
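
To see what these speeds mean in practice, multiply the recording length by the slowdown factor. The helper below is a hypothetical illustration. The function name, and the simplification of expressing speed as a single factor, are assumptions. Real throughput will vary with the audio and with system load.

```python
def processing_minutes(audio_minutes: float, slowdown: float) -> float:
    """Estimate processing time.

    slowdown is processing time divided by audio length:
    1.0 means real-time, 2.0 means twice as long as the audio.
    """
    if audio_minutes < 0 or slowdown <= 0:
        raise ValueError("audio_minutes must be >= 0 and slowdown must be > 0")
    return audio_minutes * slowdown


# Large v3 on an M1, using the 2-3x figure quoted in this article:
low = processing_minutes(60, 2.0)   # 120.0 minutes
high = processing_minutes(60, 3.0)  # 180.0 minutes
```

So a one-hour meeting would take roughly two to three hours to transcribe with Large v3 on an M1, while a model that runs in real time finishes in about an hour.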

How to Switch Models in VoxBee

VoxBee lets you download and switch between all 7 models in Settings → Speech. The app downloads models on demand — you only need disk space for the models you use. You can use a faster model for live dictation and a larger model for file transcription.
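
Because models are downloaded individually, the disk space you need is simply the sum of the ones you pick. Here is a rough sketch using the sizes from the table above, treating 1 GB as 1000 MB. The dictionary and function names are illustrative only.

```python
# Approximate sizes from the table in this article, in MB (1 GB taken as 1000 MB).
MODEL_SIZE_MB = {
    "Tiny": 75,
    "Tiny.en": 75,
    "Base": 140,
    "Small": 460,
    "Medium": 1500,
    "Large v3": 2900,
    "Large v3 Turbo": 1600,
}


def disk_needed_mb(models: list[str]) -> int:
    """Total size of the chosen models; duplicates are counted once."""
    return sum(MODEL_SIZE_MB[name] for name in set(models))
```

Pairing Base for dictation with Large v3 for files comes to 3040 MB, or about 3 GB. Downloading all seven models comes to 6750 MB.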

Try VoxBee free for 14 days and experiment with different models.