For years, "local speech-to-text" simply meant Whisper. Then NVIDIA's Parakeet models started topping the open ASR leaderboards — and once they were converted to run on Apple's Neural Engine, Mac users suddenly had a real choice. Here's how the two families actually compare.
The 30-second version
- Whisper: the versatile veteran. ~100 languages, sizes from ~66 MB to ~1 GB, robust to accents and background noise.
- Parakeet: the speed-and-accuracy specialist. State-of-the-art English accuracy, very fast inference, larger downloads (~2.6–2.7 GB), and a multilingual v3 release.
Accuracy
On clean English audio, Parakeet models consistently rank at or near the top of public ASR benchmarks like Hugging Face's Open ASR Leaderboard, with lower word-error rates than Whisper Large variants. In practice that shows up as fewer dropped words and better recall of fast, mumbly speech — exactly what dictation needs.
Whisper fights back in the messy real world: heavy accents, multiple languages in one recording, far-field audio, and niche languages where Parakeet has no coverage. Whisper was trained on a vast, diverse corpus and it shows.
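If "word-error rate" sounds abstract, it helps to see how little is behind it: count the word-level substitutions, insertions, and deletions needed to turn the model's transcript into the reference, then divide by the number of reference words. Here is a minimal sketch in Python. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration; public leaderboards apply more careful text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate(
    "please send the report by friday",
    "please send report by friday",
)
print(f"{wer:.3f}")
```

A single dropped word in a six-word sentence already costs about 17 points of WER, which is why short dictation snippets make small accuracy differences between models feel large.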
Speed and size
| Model | Download | Speed on Apple Silicon | Languages |
|---|---|---|---|
| Whisper Tiny | ~66 MB | Fastest | ~100 |
| Whisper Large V3 Turbo | ~954 MB | Fast | ~100 |
| Parakeet v2 | ~2.6 GB | Very fast | English |
| Parakeet v3 | ~2.7 GB | Very fast | 25 |
Counterintuitively, the bigger Parakeet models often transcribe faster than Whisper Large. Parakeet is a transducer, which pairs its encoder with a small prediction network, while Whisper runs a full autoregressive Transformer decoder for every output token, so the transducer is simply cheaper at inference. The cost is disk space and a longer first download.
Language support
This is Whisper's moat. It handles roughly 100 languages, including plenty of low-resource ones. Parakeet v3 expanded to 25 (mostly European) languages with excellent quality, but if you dictate in, say, Macedonian or Thai, Whisper is the only one of the two families that covers you.
So which should you pick?
- English dictation, accuracy above all: Parakeet v2.
- Multilingual within Europe: Parakeet v3, falling back to Whisper for unsupported languages.
- Small footprint: Whisper Base or Small, or Whisper Tiny (~66 MB) if disk space matters most.
- One model for everything: Whisper Large V3 Turbo.
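The checklist above is really a small decision procedure, and it can be sketched in a few lines of Python. Everything named here is hypothetical: `pick_model`, its parameters, and the language codes are illustration only, and the caller supplies the set of languages Parakeet v3 supports rather than this sketch guessing at the list. Treating Whisper Large V3 Turbo as the fallback for unsupported languages is also an assumption, borrowed from the "one model for everything" recommendation.

```python
def pick_model(languages, parakeet_v3_languages, small_footprint=False):
    """Return a model recommendation following the checklist above.

    languages: language codes you dictate in, e.g. ["en"] or ["de", "fr"].
    parakeet_v3_languages: codes Parakeet v3 supports (supplied by the caller).
    small_footprint: True when download size matters more than accuracy.
    """
    wanted = {code.lower() for code in languages}
    if not wanted:
        raise ValueError("pass at least one language code")
    supported = {code.lower() for code in parakeet_v3_languages}

    if small_footprint:
        return "Whisper Base or Small"
    if wanted == {"en"}:
        return "Parakeet v2"  # English dictation, accuracy above all
    if wanted <= supported:
        return "Parakeet v3"  # every requested language is covered
    # Assumed fallback: at least one language is outside Parakeet v3.
    return "Whisper Large V3 Turbo"


# Illustrative subset only, not the real list of 25 languages.
v3_subset = {"en", "de", "fr", "es"}
print(pick_model(["en"], v3_subset))
print(pick_model(["de", "fr"], v3_subset))
print(pick_model(["th"], v3_subset))
```

The order of the checks carries the priorities: footprint overrides everything, English-only beats multilingual, and Whisper is what remains once a language falls outside Parakeet's coverage.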
Or just… don't choose
Sotto ships the full Whisper family and both Parakeet models, all running locally on the Neural Engine. Every recording is saved, so you can transcribe with a fast model now and re-transcribe with a more accurate one later — or A/B them on the same audio and pick your favorite. One $49 purchase, no subscription, and the models are yours to switch between freely.