Whisper Large vs Medium vs Small: Which Model Size Should You Use?

Compare Whisper AI model sizes for transcription. Understand speed, accuracy, and resource tradeoffs to choose the right model for your needs.

December 3, 2025 · 6 min read

Whisper comes in five sizes. Bigger isn't always better: the right choice depends on your hardware, audio quality, and accuracy needs.

Model Sizes at a Glance

Model    Size     Speed (relative to Large)   Accuracy
Tiny     75MB     ~32x                        Basic
Base     142MB    ~16x                        Good
Small    466MB    ~6x                         Better
Medium   1.5GB    ~2x                         Great
Large    2.9GB    ~1x                         Best
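
To make the speed column concrete, here is a rough sketch that turns it into time estimates. It assumes the figures are multiples of Large's speed and that they scale linearly, which is only approximately true in practice. The function name and the 10-minute figure are made up for illustration.

```python
# Approximate speed multipliers from the table above (Large = 1x).
RELATIVE_SPEED = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}


def estimate_seconds(model: str, large_seconds: float) -> float:
    """Estimate how long `model` takes, given how long Large takes on the same audio."""
    return large_seconds / RELATIVE_SPEED[model]


if __name__ == "__main__":
    large_seconds = 600.0  # hypothetical: Large needs 10 minutes for some recording
    for name in RELATIVE_SPEED:
        print(f"{name:<7}{estimate_seconds(name, large_seconds):7.1f} s")
```

Under these assumptions, a file that keeps Large busy for 10 minutes would take Small under 2 minutes. Treat the result as a ballpark, not a benchmark.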

When to Use Each

Tiny & Base

Quick drafts, real-time-ish transcription, or when accuracy isn't critical. Good for clear audio with single speakers in quiet environments.

Small

Best balance for most users. Handles moderate noise and accents well. Fast enough for comfortable use, accurate enough for most needs.

Medium

Challenging audio: accents, background noise, multiple speakers. Worth the extra time when accuracy matters and audio quality is imperfect.

Large

Maximum accuracy for difficult audio. Professional transcription, legal/medical content, or when every word must be correct.

Hardware Considerations

  • Apple Silicon: All models run well, Large is practical
  • Intel Mac: Stick to Small or Medium
  • RAM: Large needs 4GB+ free
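
The rules of thumb above can be written down as a small helper. This is only a sketch of the guidance in this post: the function name is hypothetical, and it applies the 4GB threshold to Large alone because that is the only RAM figure given here.

```python
from typing import List

ALL_MODELS = ["tiny", "base", "small", "medium", "large"]


def recommended_models(apple_silicon: bool, free_ram_gb: float) -> List[str]:
    """Return the model sizes this post suggests for a given Mac."""
    if apple_silicon:
        # Apple Silicon: all models run well.
        candidates = list(ALL_MODELS)
    else:
        # Intel Mac: stick to Small or Medium.
        candidates = ["small", "medium"]
    if free_ram_gb < 4:
        # Large needs 4GB+ of free RAM.
        candidates = [m for m in candidates if m != "large"]
    return candidates
```

For example, `recommended_models(True, 2)` leaves out Large, while `recommended_models(False, 16)` returns only Small and Medium.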

The .en Models

English-only variants (tiny.en, base.en, small.en, medium.en) are slightly more accurate for English and slightly faster. Use these if you only transcribe English. Large has no .en variant.
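
If you script model selection yourself, the naming rule is simple enough to sketch. The helper below is hypothetical. It relies on the fact that OpenAI's Whisper releases include .en variants for Tiny, Base, Small, and Medium but not for Large. The resulting string is the kind of name the open-source `whisper` Python package accepts in `whisper.load_model(...)`. Whether Sotto uses the same names internally is an assumption.

```python
# Sizes with an English-only ".en" variant; Large is multilingual only.
EN_SIZES = {"tiny", "base", "small", "medium"}


def model_name(size: str, english_only: bool) -> str:
    """Return the Whisper model name for a size, preferring .en when available."""
    if english_only and size in EN_SIZES:
        return f"{size}.en"
    return size
```

So `model_name("small", True)` gives `"small.en"`, while `model_name("large", True)` falls back to plain `"large"`.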

Choose Your Model

Sotto lets you switch models based on your needs. All sizes included. $29 once.


About Kitze

Creator of Sotto and indie developer building tools for productivity. Passionate about local AI and privacy-first software.
