How to Improve Whisper Accuracy with Initial Prompts

Whisper is accurate out of the box, but a well-chosen initial prompt, the right model size, clean audio, and an explicit language setting can each improve results further. Here's how to use all four.

Understanding Initial Prompts

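Whisper accepts an initial prompt that conditions the model. The prompt is not a set of instructions; it is text that reads like the start of the transcript, and Whisper tends to continue in the same style and vocabulary. Keep it short: in OpenAI's implementation only roughly the last 224 tokens of the prompt are used.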
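As a minimal sketch, here is how a prompt might be passed in Python, assuming the open-source openai-whisper package, whose transcribe method takes an initial_prompt argument. The helper names, the default model, and the file name in the comment are illustrative and not part of any library.

```python
import re


def clean_prompt(text: str) -> str:
    """Collapse runs of whitespace so the prompt reads like one line of transcript."""
    return re.sub(r"\s+", " ", text).strip()


def transcribe_with_prompt(audio_path: str, prompt: str, model_name: str = "small") -> str:
    """Transcribe audio_path, conditioning Whisper on the given prompt."""
    # Imported lazily so clean_prompt can be used without the package installed.
    # Assumption: installed with `pip install openai-whisper`.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, initial_prompt=clean_prompt(prompt))
    return result["text"]


# Example call ("meeting.mp3" is a placeholder file name):
# text = transcribe_with_prompt("meeting.mp3", "Meeting with John Smith about the Acme Project.")
```

Note that the prompt is written as a sentence that could plausibly open the recording, not as a command such as "Please spell names correctly."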

Prompt Techniques

Proper Nouns and Names

Include names that appear in your audio: "Meeting with John Smith about the Acme Project." Having seen the spelling in the prompt, Whisper is more likely to transcribe these names correctly instead of guessing a similar-sounding word.

Technical Vocabulary

For technical content, use domain terms: "Discussion of Kubernetes deployments and Docker containerization." This makes Whisper more likely to produce those terms with the correct spelling and casing.
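If the names and terms change from recording to recording, the prompt can be assembled from lists. The following is a small sketch; build_vocab_prompt is a hypothetical helper, and the sentence templates simply mirror the two example prompts above.

```python
def build_vocab_prompt(names, terms):
    """Return transcript-like sentences that mention the given names and terms."""
    sentences = []
    if names:
        sentences.append("Meeting with " + ", ".join(names) + ".")
    if terms:
        sentences.append("Discussion of " + ", ".join(terms) + ".")
    return " ".join(sentences)


prompt = build_vocab_prompt(["John Smith"], ["Kubernetes", "Docker"])
print(prompt)  # Meeting with John Smith. Discussion of Kubernetes, Docker.
```

The resulting string is what you would hand to Whisper as the initial prompt. Spell every entry exactly as you want it to appear in the transcript.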

Style and Punctuation

Want specific formatting? Use it in the prompt. Proper capitalization and punctuation in your prompt influence the output style, so write the prompt the way you want the transcript to look.
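To illustrate, the same prompt text can be prepared in two styles, depending on whether you want a punctuated transcript or a bare lowercase one. This is only a sketch: match_style is a hypothetical helper, and Whisper follows the prompt's style as a tendency, not a guarantee.

```python
# Characters treated as sentence punctuation; apostrophes and hyphens are kept.
SENTENCE_PUNCTUATION = ".,!?;:"


def match_style(prompt: str, punctuated: bool = True) -> str:
    """Return the prompt as-is, or lowercased with sentence punctuation removed."""
    if punctuated:
        return prompt
    stripped = prompt.translate(str.maketrans("", "", SENTENCE_PUNCTUATION))
    return " ".join(stripped.lower().split())


print(match_style("Hello, world. How are you?"))
print(match_style("Hello, world. How are you?", punctuated=False))
```

The first call leaves the prompt untouched; the second produces an unpunctuated, lowercase version to nudge the output toward that style.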

Model Selection

Tiny/Base: Quick drafts, clear audio only
Small: Good balance for most uses
Medium: Noisy audio or accents
Large: Maximum accuracy, complex audio
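The guidance above can be written down as a small selection rule. This is a sketch under stated assumptions: pick_model and its flags are hypothetical, and the returned strings are the model names used by the open-source openai-whisper package.

```python
def pick_model(clear_audio: bool, draft_only: bool = False, max_accuracy: bool = False) -> str:
    """Map the recording and the goal to a Whisper model size."""
    if max_accuracy:
        return "large"   # maximum accuracy, complex audio
    if not clear_audio:
        return "medium"  # noisy audio or accents
    if draft_only:
        return "base"    # quick drafts, clear audio only
    return "small"       # good balance for most uses


print(pick_model(clear_audio=True))                    # small
print(pick_model(clear_audio=False))                   # medium
print(pick_model(clear_audio=True, draft_only=True))   # base
```

Larger models are slower and need more memory, so it is usually worth starting with the smallest size that handles your audio.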

Audio Quality Tips

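A 16 kHz sample rate is enough: Whisper resamples its input to 16 kHz, so higher rates add nothing
Mono audio is fine: Whisper works on a single channel, so stereo doesn't help
Normalize audio levels if the recording is too quiet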
Remove background music if possible
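The first three tips can be applied in one pass with ffmpeg. The sketch below assumes ffmpeg is installed and on your PATH; the function names are hypothetical. It uses ffmpeg's -ac and -ar options for channel count and sample rate, and its loudnorm filter for level normalization. It does not remove background music, which needs a separate source-separation tool.

```python
import subprocess


def ffmpeg_command(src: str, dst: str, normalize: bool = False) -> list:
    """Build an ffmpeg command that writes mono 16 kHz audio to dst."""
    cmd = ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000"]
    if normalize:
        # loudnorm applies loudness normalization to quiet recordings.
        cmd += ["-af", "loudnorm"]
    cmd.append(dst)
    return cmd


def prepare_audio(src: str, dst: str, normalize: bool = False) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_command(src, dst, normalize), check=True)


# Example call (file names are placeholders):
# prepare_audio("interview.m4a", "interview_16k.wav", normalize=True)
```

Building the command as a list keeps file names with spaces safe, since no shell parsing is involved.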

Language Settings

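If you know the language, specify it. Auto-detection works, but setting the language explicitly skips the detection step, which makes it faster, and avoids a wrong guess when the recording opens with silence, music, or a few words in another language.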
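In code, this usually means passing the language only when it is known and otherwise leaving auto-detection on. A minimal sketch, assuming the open-source openai-whisper package, whose transcribe method accepts language and initial_prompt keyword arguments; transcribe_options is a hypothetical helper.

```python
def transcribe_options(language=None, prompt=None) -> dict:
    """Collect keyword arguments for transcribe, omitting anything not set."""
    options = {}
    if language:
        options["language"] = language      # e.g. "en"; omit to auto-detect
    if prompt:
        options["initial_prompt"] = prompt
    return options


print(transcribe_options(language="en"))  # {'language': 'en'}

# Example use with a loaded model ("talk.mp3" is a placeholder file name):
# result = model.transcribe("talk.mp3", **transcribe_options(language="en"))
```

Leaving the key out entirely, instead of passing a guess, keeps auto-detection as the fallback for recordings whose language you don't know.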

Maximum Accuracy

Sotto uses optimized Whisper with all of these techniques built in. It is a $49 one-time purchase.

Get Sotto