Whisper Large vs Medium vs Small: Which Model Size Should You Use?

Apple Silicon: All models run well, Large is practical
Intel Mac: Stick to Small or Medium
RAM: Large needs 4GB+ free

Whisper comes in five sizes. Bigger isn't always better: the right choice depends on your hardware, audio quality, and accuracy needs.

Model Sizes at a Glance

Model	File size	Speed (relative to Large)	Accuracy
Tiny	75MB	~32x	Basic
Base	142MB	~16x	Good
Small	466MB	~6x	Better
Medium	1.5GB	~2x	Great
Large	2.9GB	~1x	Best
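
As a rough illustration of what the speed column means in practice, here is a minimal Python sketch. It reads the speed figures as multiples of Large's speed (Large being ~1x), which is an assumption about the table. The `estimate_seconds` helper and the 600-second measurement are hypothetical; real timings depend on your hardware and audio.

```python
# Approximate relative speeds from the table (Large = 1x).
RELATIVE_SPEED = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}


def estimate_seconds(model: str, large_seconds: float) -> float:
    """Estimate how long `model` needs for a clip, given the time Large took on it."""
    return large_seconds / RELATIVE_SPEED[model]


if __name__ == "__main__":
    measured_large = 600.0  # hypothetical: Large took 10 minutes on this machine
    for name in RELATIVE_SPEED:
        print(f"{name:>6}: ~{estimate_seconds(name, measured_large):.0f} s")
```

Under these assumptions, a clip that takes Large 600 seconds would take Small about 100 seconds. Treat the result as a ballpark figure, not a benchmark.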

Tiny and Base: Quick drafts, near-real-time transcription, or when accuracy isn't critical. Good for clear audio with a single speaker in a quiet environment.

Small: Best balance for most users. Handles moderate noise and accents well. Fast enough for comfortable use, accurate enough for most needs.

Medium: For challenging audio with accents, background noise, or multiple speakers. Worth the extra time when accuracy matters and audio quality is imperfect.

Large: Maximum accuracy for difficult audio. Professional transcription, legal/medical content, or when every word must be correct.
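
One way to sketch the recommendations above as code is a small decision helper. It reads the four descriptions as Tiny/Base, Small, Medium, and Large, from smallest to largest. The `Job` fields and the `pick_model` function are hypothetical names for illustration. The hardware caps (Medium at most on Intel Macs, or when less than 4 GB of RAM is free) follow the quick summary at the top of this article, and choosing Base rather than Tiny for drafts is an arbitrary assumption.

```python
from dataclasses import dataclass

# Ordered smallest to largest, matching the table in this article.
SIZES = ["tiny", "base", "small", "medium", "large"]


@dataclass
class Job:
    draft_only: bool = False          # quick draft, accuracy not critical
    difficult_audio: bool = False     # accents, background noise, multiple speakers
    every_word_matters: bool = False  # professional, legal, or medical content
    apple_silicon: bool = True
    free_ram_gb: float = 8.0


def pick_model(job: Job) -> str:
    """Return a suggested model size; a simplification, not a rule."""
    if job.every_word_matters:
        want = "large"
    elif job.difficult_audio:
        want = "medium"
    elif job.draft_only:
        want = "base"
    else:
        want = "small"

    # Hardware cap: Large only on Apple Silicon with at least 4 GB of free RAM.
    cap = "large"
    if not job.apple_silicon or job.free_ram_gb < 4:
        cap = "medium"

    # Take whichever of the two is smaller.
    return SIZES[min(SIZES.index(want), SIZES.index(cap))]
```

For example, `pick_model(Job(every_word_matters=True, apple_silicon=False))` falls back to `"medium"`, because the Intel cap overrides the accuracy preference.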

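English-only variants (tiny.en, base.en, small.en, medium.en) are slightly more accurate for English and slightly faster. The gain is most noticeable at the Tiny and Base sizes, and Large has no English-only variant. Use these if you only transcribe English.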
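
If you run Whisper yourself, the variant is usually selected by model name. The sketch below assumes the open-source `openai-whisper` Python package and its `load_model` and `transcribe` calls. That is an assumption for illustration, not a description of how Sotto works internally. The `model_name` and `transcribe_file` helpers are hypothetical.

```python
# Sizes with an English-only checkpoint among OpenAI's released models.
# Large is multilingual only, so it has no ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def model_name(size: str, english_only: bool) -> str:
    """Return e.g. "base.en" when an English-only variant exists, else the plain size."""
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size


def transcribe_file(path: str, size: str = "small", english_only: bool = True) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so model_name() works without the package installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name(size, english_only))
    result = model.transcribe(path)
    return result["text"]
```

Requesting an English-only Large simply returns `"large"` here, since no such variant exists.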

Sotto lets you switch models based on your needs. All sizes included. $49 once.