Whisper is remarkably accurate out of the box, but you can push it even further with the right techniques. Here's how to get the best possible transcriptions.
Understanding Initial Prompts
Whisper accepts an initial prompt that conditions the model. The prompt is not a set of instructions; it is text that looks like the start of the transcript, and Whisper tends to continue in the same style.
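As a rough illustration, here is a minimal sketch of passing an initial prompt. It assumes the open-source openai-whisper Python package, whose transcribe method accepts an initial_prompt argument. The helper name transcribe_with_prompt and the file name meeting.wav are made up for this example.

```python
def transcribe_with_prompt(model, audio_path: str, prompt: str) -> str:
    """Transcribe audio_path while conditioning the model on prompt.

    `model` is assumed to be the object returned by whisper.load_model().
    """
    # The prompt is treated as text that came just before the audio,
    # so write it like a transcript, not like a command.
    result = model.transcribe(audio_path, initial_prompt=prompt)
    return result["text"].strip()


# Usage (assumes `pip install openai-whisper` and a real audio file):
#   import whisper
#   model = whisper.load_model("small")
#   text = transcribe_with_prompt(
#       model,
#       "meeting.wav",  # hypothetical file name
#       "Meeting with John Smith about the Acme Project.",
#   )
#   print(text)
```

The model is passed in rather than loaded inside the helper so that it can be loaded once and reused across many files.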
Prompt Techniques
Proper Nouns and Names
Include names that appear in your audio: "Meeting with John Smith about the Acme Project." Whisper is then more likely to spell these names correctly.
Technical Vocabulary
For technical content, use domain terms: "Discussion of Kubernetes deployments and Docker containerization." This primes Whisper for technical accuracy.
Style and Punctuation
Want specific formatting? Use it in the prompt. Proper capitalization and punctuation in your prompt influence the output style.
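The three techniques above can be combined into one prompt. The sketch below is one possible way to do it, not a prescribed format: the function names, the "Meeting with" and "Discussion of" templates (borrowed from the examples above), and the style sample are all illustrative assumptions.

```python
def join_naturally(items):
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_initial_prompt(names=(), terms=(), style_sample=""):
    """Assemble a transcript-like prompt from names, jargon, and a style sample."""
    parts = []
    if names:
        # Proper nouns, spelled the way they should appear in the output.
        parts.append("Meeting with " + join_naturally(names) + ".")
    if terms:
        # Domain vocabulary that is likely to come up in the audio.
        parts.append("Discussion of " + join_naturally(terms) + ".")
    if style_sample:
        # A sentence written with the capitalization and punctuation you want.
        parts.append(style_sample.strip())
    return " ".join(parts)


prompt = build_initial_prompt(
    names=["John Smith"],
    terms=["Kubernetes deployments", "Docker containerization"],
    style_sample="Okay, let's get started.",
)
print(prompt)
```

Keep the result short and natural-sounding; a prompt that reads like a real transcript opening is more in line with how the model was conditioned during training.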
Model Selection
- Tiny/Base: Quick drafts, clear audio only
- Small: Good balance for most uses
- Medium: Noisy audio or accents
- Large: Maximum accuracy, complex audio
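The list above can be written as a small lookup. This is only a sketch of the rule of thumb: the function name and its flags are hypothetical, while the returned strings are model names accepted by the openai-whisper package.

```python
def pick_model(draft=False, noisy_or_accented=False, max_accuracy=False):
    """Return a Whisper model name following the guidelines above."""
    if max_accuracy:
        return "large"   # maximum accuracy, complex audio
    if noisy_or_accented:
        return "medium"  # noisy recordings or strong accents
    if draft:
        return "base"    # quick drafts of clear audio ("tiny" is faster still)
    return "small"       # good balance for most uses


print(pick_model())                        # small
print(pick_model(noisy_or_accented=True))  # medium
```

Accuracy requirements take priority over speed here, so a noisy recording gets the medium model even when only a draft is needed.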
Audio Quality Tips
- 16 kHz sample rate is optimal: Whisper works on 16 kHz audio, so higher rates add nothing
- Mono audio is fine; stereo doesn't help
- Normalize audio levels if too quiet
- Remove background music if possible
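Most of these steps can be done in one pass with ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH; the function names are illustrative. It converts to 16 kHz mono and optionally applies ffmpeg's loudnorm filter to even out levels. It does not remove background music, which needs a separate source-separation tool.

```python
import subprocess


def build_ffmpeg_command(src, dst, normalize=True):
    """Build an ffmpeg command that writes dst as 16 kHz mono audio."""
    cmd = [
        "ffmpeg", "-y",  # overwrite dst if it already exists
        "-i", src,
        "-ac", "1",      # downmix to one channel (mono)
        "-ar", "16000",  # resample to 16 kHz
    ]
    if normalize:
        cmd += ["-af", "loudnorm"]  # loudness normalization filter
    cmd.append(dst)
    return cmd


def prepare_audio(src, dst, normalize=True):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst, normalize), check=True)
    return dst


print(" ".join(build_ffmpeg_command("interview.mp3", "interview_16k.wav")))
```

Building the command separately from running it makes it easy to inspect or log exactly what will be executed before touching any files.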
Language Settings
If you know the language, specify it. Auto-detection works, but explicit language selection is faster, because it skips the detection step, and slightly more accurate.
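A minimal sketch of setting the language explicitly, again assuming the openai-whisper Python package, whose transcribe method takes a language argument (for example "en" or "es") and returns the language it used alongside the text. The helper name is hypothetical.

```python
def transcribe_in_language(model, audio_path, language=None):
    """Transcribe audio_path; pass a language code to skip auto-detection.

    `model` is assumed to be the object returned by whisper.load_model().
    Returns a (language, text) tuple.
    """
    options = {}
    if language is not None:
        options["language"] = language  # e.g. "en", "es", "de"
    # Without a language option, the model detects the language itself.
    result = model.transcribe(audio_path, **options)
    return result["language"], result["text"]


# Usage (assumes `pip install openai-whisper` and a real audio file):
#   import whisper
#   model = whisper.load_model("small")
#   lang, text = transcribe_in_language(model, "entrevista.wav", "es")
```

Returning the language along with the text makes it easy to check what auto-detection chose when no code was supplied.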
Maximum Accuracy
Sotto uses optimized Whisper with all these techniques built in. $29 one-time purchase.
Get Sotto