Sotto
captions, subtitles, video, creators, accessibility

How to Create Captions and Subtitles with Local AI

Generate accurate video captions using local Whisper AI. Learn workflows for SRT files, timing adjustments, and multi-language subtitles.

November 30, 2025 · 7 min read

Video captions improve accessibility and engagement. Local AI puts professional-quality subtitles within reach of any creator.

Why Captions Matter

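  • Accessibility: Captions make video usable for deaf and hard-of-hearing viewers
  • Silent viewing: By a widely cited estimate, about 85% of Facebook video is watched with the sound off
  • SEO: Search engines can't watch video, but they can read caption text
  • Comprehension: Complex content is easier to follow with text on screen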

Local vs Cloud Captioning

Cloud services typically charge per minute of audio and require you to upload your content to their servers. Running Whisper locally gives you unlimited captioning, and your audio never leaves your machine.

Basic Workflow

  1. Extract the audio from your video
  2. Transcribe the audio with Whisper, keeping the segment timestamps
  3. Export the result as an SRT or VTT file
  4. Import the file into your video editor or upload it to your platform
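
The first two steps above might be sketched in Python roughly as follows. This is a minimal sketch, not Sotto's implementation: it assumes ffmpeg is on the PATH and the open-source openai-whisper package is installed, and the helper names, file names, and the "base" model size are illustrative choices.

```python
import subprocess


def build_ffmpeg_command(video_path, audio_path):
    """Build an ffmpeg command that extracts 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", video_path,    # input video
        "-vn",               # drop the video stream
        "-ac", "1",          # downmix to mono
        "-ar", "16000",      # resample to 16 kHz
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Step 1: write the audio track to audio_path (needs ffmpeg)."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


def transcribe(audio_path, model_name="base"):
    """Step 2: return Whisper segments, each with start, end, and text."""
    import whisper  # lazy import; assumes `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["segments"]
```

Calling extract_audio("talk.mp4", "talk.wav") and then transcribe("talk.wav") (hypothetical file names) yields a list of segments whose start and end values are in seconds, which is the input the export step needs.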

SRT Format Basics

1
00:00:01,000 --> 00:00:04,000
Hello and welcome to this video.

2
00:00:04,500 --> 00:00:07,000
Today we're talking about captions.
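
Each SRT cue is an index line, a timing line in HH:MM:SS,mmm form, and the caption text, with a blank line between cues. As an illustration, the sketch below renders a list of segments in that layout. It assumes segments are dictionaries with start and end in seconds plus a text field; the function names are made up for this example.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT text."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Passing the two segments from the example above (1.0 to 4.0 seconds and 4.5 to 7.0 seconds) reproduces that SRT text. WebVTT is close to the same layout: the file starts with a WEBVTT header line and timestamps use a period instead of a comma before the milliseconds.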

Timing Adjustments

  • Keep each segment under 7 seconds
  • Avoid splitting mid-sentence where possible
  • Align segment boundaries with natural speech pauses
  • Leave a gap of at least 0.5 seconds between segments
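
Two of these guidelines, the 7-second limit and the 0.5-second gap, are easy to check automatically. The sketch below is one possible approach, assuming the same segment dictionaries (start, end, text) as a Whisper transcript; the function names are hypothetical. It trims segment ends to open up the gap and only flags overlong segments, since deciding where to split them at a sentence boundary or speech pause is left to a human pass.

```python
MAX_DURATION = 7.0  # seconds, the segment length limit from the list above
MIN_GAP = 0.5       # seconds required between consecutive segments


def enforce_min_gap(segments, min_gap=MIN_GAP):
    """Return copies of the segments with ends pulled back so that each
    following segment starts at least min_gap seconds later."""
    adjusted = [dict(seg) for seg in segments]  # leave the input untouched
    for current, following in zip(adjusted, adjusted[1:]):
        latest_end = following["start"] - min_gap
        if current["end"] > latest_end:
            # Never move the end before the segment's own start.
            current["end"] = max(current["start"], latest_end)
    return adjusted


def find_long_segments(segments, max_duration=MAX_DURATION):
    """Return the indexes of segments longer than max_duration seconds."""
    return [
        i for i, seg in enumerate(segments)
        if seg["end"] - seg["start"] > max_duration
    ]
```

Trimming the earlier segment rather than delaying the later one keeps every caption's start aligned with the moment the speech begins.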

Multi-Language Subtitles

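Whisper can transcribe speech in its original language, and its built-in translation task can output English text directly from non-English audio. English is the only translation target, so for other languages, use the original-language transcript as the base for a separate translation step.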
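
With the open-source openai-whisper package, the two modes are selected through the task option of transcribe. The sketch below is illustrative only: the function name is made up, and the model argument is assumed to be the object returned by whisper.load_model(...).

```python
def transcribe_and_translate(model, audio_path):
    """Run Whisper twice: once in the source language, once into English.

    `model` is assumed to behave like the object returned by
    whisper.load_model(...) from the openai-whisper package.
    """
    original = model.transcribe(audio_path, task="transcribe")
    english = model.transcribe(audio_path, task="translate")
    return {
        "language": original["language"],  # detected language code
        "original": original["segments"],  # source-language segments
        "english": english["segments"],    # English segments
    }
```

Both segment lists carry their own timestamps, so each can be exported as a separate subtitle track. The two runs may segment the audio differently, so the tracks are not guaranteed to line up cue for cue.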

Caption Your Content

Sotto generates timestamped transcripts that are ready for captioning, for a one-time price of $29.


About Kitze

Creator of Sotto and indie developer building tools for productivity. Passionate about local AI and privacy-first software.
