You don't need to upload a confidential meeting recording to some startup's server to get a transcript. Apple Silicon Macs are fast enough to run state-of-the-art speech models locally. Here are three ways to do it, from nerdiest to easiest.
Why offline matters
- Privacy: interviews, medical notes, legal calls, and voice memos never leave your machine.
- Cost: cloud APIs charge per minute, forever. Local models have no per-minute fee once downloaded.
- Reliability: works on a plane, works when the API is down, works in 2030.
Option 1: whisper.cpp (free, command line)
If you're comfortable in a terminal, whisper.cpp is the classic route: install it with Homebrew, download a model, and run it against your file. It's free and scriptable, but you handle audio conversion, model management, and output formatting yourself — and there's no UI for fixing or searching transcripts afterwards.
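To make the "you handle it yourself" part concrete, here is a minimal Python sketch of the convert-then-transcribe pipeline. It rests on several assumptions that are not from this article: that `ffmpeg` and a whisper.cpp binary named `whisper-cli` are on your PATH (older builds ship the binary under other names), that you have already downloaded a ggml model file, and that your build wants 16 kHz mono WAV input, which is the conversion the whisper.cpp README has documented. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src: Path, wav: Path) -> list[str]:
    """Build the ffmpeg call that converts the input to 16 kHz mono 16-bit WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the intermediate WAV if it already exists
        "-i", str(src),
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(wav),
    ]


def whisper_command(wav: Path, model: Path, out_base: Path,
                    binary: str = "whisper-cli") -> list[str]:
    """Build the whisper.cpp call; -otxt together with -of writes <out_base>.txt."""
    # Assumption: the binary is called whisper-cli; adjust for your install.
    return [binary, "-m", str(model), "-f", str(wav), "-otxt", "-of", str(out_base)]


def transcribe(src: Path, model: Path) -> Path:
    """Convert, transcribe, and return the path of the plain-text transcript."""
    wav = src.with_name(src.stem + ".16k.wav")
    out_base = src.with_name(src.stem)
    subprocess.run(ffmpeg_command(src, wav), check=True)
    subprocess.run(whisper_command(wav, model, out_base), check=True)
    wav.unlink()  # remove the intermediate WAV
    return Path(f"{out_base}.txt")
```

Calling `transcribe(Path("meeting.m4a"), Path("models/ggml-large-v3-turbo.bin"))` with your own paths (both are placeholders here) should leave a `meeting.txt` next to the recording. Keeping the command builders separate from the function that runs them makes it easy to inspect or log the exact commands before anything executes.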
Option 2: free GUI apps
Aiko (free) and MacWhisper's free tier both transcribe files on-device with a proper interface. Great for occasional use. The limits show up at higher volume: smaller model selections, fewer cleanup tools, and no connection to a dictation workflow.
Option 3: drag-and-drop in Sotto
Sotto is dictation-first, but it also imports audio files: drag an .mp3, .m4a, .wav, or .webm into the app (or press Cmd+Shift+I) and it transcribes using whichever model you've selected — all locally.
A few things make it pleasant for file work:
- Re-transcribe anytime: ran a voice memo through the Tiny model and it botched the names? One click re-runs it through Large V3 Turbo or Parakeet.
- Custom vocabulary: add your product names, jargon, and acronyms so the model gets them right.
- History + search: every transcript is saved locally and full-text searchable.
- Cleanup rules: auto-remove filler words and fix punctuation on the way out.
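For a sense of what a cleanup rule does, here is a minimal Python sketch of filler-word removal and punctuation tidying with regular expressions. It illustrates the general technique only and is not Sotto's implementation; the filler list and the `clean_transcript` name are hypothetical.

```python
import re

# Hypothetical filler list; extend it to match how you actually speak.
FILLERS = ("um", "uh", "erm", "hmm")

# Match a whole-word filler plus an optional trailing comma and whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    re.IGNORECASE,
)


def clean_transcript(text: str) -> str:
    """Strip filler words, tidy spacing, and capitalise sentence starts."""
    text = _FILLER_RE.sub("", text)
    text = re.sub(r"\s+([,.!?])", r"\1", text)   # no space before punctuation
    text = re.sub(r"\s{2,}", " ", text).strip()  # collapse repeated whitespace
    # Upper-case the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(clean_transcript("um, so the launch is uh next week ."))
# prints: So the launch is next week.
```

The word-boundary anchors keep the rule from touching words that merely contain a filler, such as "umbrella".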
Which local model should you use?
| Model | Size | Best for |
|---|---|---|
| Whisper Tiny / Base | 66–105 MB | Quick notes, drafts |
| Whisper Large V3 Turbo | ~954 MB | Best all-round quality |
| Parakeet v2 (English) | 2.6 GB | Highest English accuracy, very fast |
| Parakeet v3 (Multilingual) | 2.7 GB | Best multilingual accuracy |
Curious how the two model families differ? We compared Whisper vs Parakeet in depth.
Step by step (the easy way)
- Install Sotto and download a model (Large V3 Turbo is the sweet spot).
- Drag your audio file onto the app, or press Cmd+Shift+I.
- Wait — the Neural Engine chews through audio much faster than realtime.
- Copy the transcript, or search it later from History.
That's it. No account, no upload, no per-minute billing. Sotto is a one-time $49 purchase, and your audio and transcripts stay on your Mac.