Tags: medical, doctors, privacy, dictation, workflows

Voice-to-Text for Doctors: Private, Local Medical Dictation on a Mac

How physicians and clinicians can dictate notes on a Mac without sending patient audio to the cloud: local AI models, custom medical vocabulary, and a realistic workflow.

June 12, 2026 · 7 min read

Doctors basically invented professional dictation — and then the tools moved to the cloud, dragging patient audio onto third-party servers and adding subscription pricing on top. Local AI quietly reversed that. Here's a realistic look at dictating clinical notes on a Mac with nothing leaving the device.

Why local processing matters in medicine

  • PHI exposure: a recording of you describing a patient is sensitive by definition. If it's processed on-device, there's no third-party vendor handling the audio, no data-processing agreement to negotiate, and no vendor-side breach surface to reason about.
  • No per-minute fees: medical dictation services have historically charged by volume. Local models add no per-minute cost once they're installed.
  • Works anywhere: clinic basement, home office, airplane — no connectivity required.

The obligatory caveat: nothing here is compliance advice. Local-only processing removes the third-party-processor question, but your documentation workflow, storage, and EHR remain your responsibility — loop in your compliance officer.

The setup

Sotto runs Whisper and NVIDIA Parakeet models entirely on Apple Silicon's Neural Engine. A practical configuration for clinical use:

  • Model: Parakeet v2 for English clinical dictation — highest recall, very fast. Whisper Large V3 Turbo if you dictate in other languages.
  • Custom vocabulary: load your dictionary with drug names, procedures, abbreviations, and colleague names. This is the single biggest accuracy lever for specialty terms.
  • Always-on rules: enable smart punctuation and filler-word removal so notes come out clean without manual editing.
  • Push-to-talk: hold a hotkey while speaking, release to insert — directly into your EHR's text field, a note template, or a document.
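The custom-vocabulary and filler-word rules above can be pictured as a text pass over the raw transcript. What follows is a minimal Python sketch of that idea, not Sotto's actual implementation: the `FILLERS` list, the `VOCABULARY` mapping, and the function names are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical filler list. "er" is left out on purpose: a case-insensitive
# match would also delete "ER" (emergency room) from a clinical note.
FILLERS = ("um", "uh", "erm", "hmm")

# Hypothetical dictionary: canonical spelling -> mis-transcriptions to replace.
VOCABULARY = {
    "metoprolol": ("meto prolol", "metropolol"),
    "HbA1c": ("h b a one c",),
}


def remove_fillers(text, fillers=FILLERS):
    """Drop standalone filler words together with the commas around them."""
    alternatives = "|".join(re.escape(f) for f in fillers)
    # \b keeps "um" from matching inside words such as "serum".
    pattern = r",?\s*\b(?:" + alternatives + r")\b,?"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse any double spaces left behind by the removal.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def apply_vocabulary(text, vocabulary=VOCABULARY):
    """Replace known mis-transcriptions with the canonical term."""
    for term, variants in vocabulary.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            # A function replacement avoids backslash handling in `term`.
            text = re.sub(pattern, lambda _m, t=term: t, text,
                          flags=re.IGNORECASE)
    return text


def clean_note(text):
    """Run both passes in order: fillers first, then vocabulary."""
    return apply_vocabulary(remove_fillers(text))


raw = "Patient is, um, stable on meto prolol, uh, 25 mg. h b a one c pending."
print(clean_note(raw))
# Patient is stable on metoprolol 25 mg. HbA1c pending.
```

A plain string-replacement pass like this only fixes errors you have already seen. A dictionary that biases the speech model itself, as the vocabulary feature is described above, can help with terms the model would otherwise miss entirely.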

Three workflows that work

1. Between patients: press the hotkey and dictate the encounter summary straight into whichever field is focused in your EHR or notes app. At ~150 spoken words per minute, a 300-word SOAP note takes about two minutes to dictate, against roughly seven or eight minutes of typing at an assumed 40 words per minute.

2. Batch at end of day: record voice memos throughout the day (phone or recorder), then drag the audio files into Sotto and transcribe them all locally. Every recording stays searchable, and you can re-transcribe any of them with a bigger model if needed.

3. Correspondence: draft referral letters and patient communication by dictation with a "professional tone" rule applied, so you speak casually and paste polished text.
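For readers who like to script things, the batch workflow (number 2 above) amounts to a loop over a folder of recordings. The Python sketch below shows that shape only. It is not Sotto's interface, which the article describes as drag-and-drop. The `transcribe` argument is a hypothetical stand-in for whichever local speech-to-text engine you call, and the list of audio suffixes is an assumption.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed set of voice-memo formats; adjust to what your recorder produces.
AUDIO_SUFFIXES = {".m4a", ".mp3", ".wav"}


def transcribe_folder(folder, transcribe: Callable[[Path], str]) -> Dict[str, str]:
    """Transcribe every audio file in `folder`, oldest name first.

    `transcribe` is a placeholder for a local engine: it receives the path
    of one recording and returns its text. Nothing here touches the network.
    """
    results = {}
    # sorted() materializes the listing before any .txt files are written.
    for audio in sorted(Path(folder).iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        text = transcribe(audio)
        # Write the transcript next to the recording, e.g. visit1.m4a -> visit1.txt
        audio.with_suffix(".txt").write_text(text, encoding="utf-8")
        results[audio.name] = text
    return results
```

Keeping the engine behind a plain callable means a recording can be run again later with a larger model just by passing a different function, which mirrors the re-transcription option mentioned above.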

What about dedicated medical dictation suites?

Tools like Dragon Medical One are deeply integrated with EHRs, include medical language models out of the box, and are sold (at significant subscription cost) with enterprise agreements. If your hospital provides one, use it. The local-Mac approach shines for private practices, telehealth from a home office, researchers, and anyone whose institution doesn't provide tooling — at $49 once instead of four figures a year.

About Kitze

Creator of Sotto and indie developer building tools for productivity. Passionate about local AI and privacy-first software.
