
FlowVoice

Free, offline Wispr Flow alternative for Windows

Built by Rogue AI · System-wide push-to-talk on Windows · Fully offline

First push-to-talk version: January 2026, running whisper.cpp tiny. The F9 hotkey and uIOhook integration followed in February, once global key capture worked reliably across focus contexts. Daily driver since.


The problem

Windows Voice Access is cloud-based, unreliable, and loses context. Third-party dictation tools either require a subscription or send every word you speak to a remote server. For confidential work (client notes, medical, legal, security research), neither is acceptable.

What I built

An Electron tray app that registers a global hotkey (F9 by default). Hold F9, speak, release: the transcribed text appears in the active application, wherever the cursor is. No internet required. No telemetry. No subscription.
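
The hold, speak, release cycle can be sketched as a small state machine. This is a minimal illustration, not FlowVoice's actual code: `Recorder`, `PushToTalk`, and the numeric keycode are hypothetical names and values. It assumes the key hook may deliver repeated key-down events while the key is held (OS auto-repeat), so the controller tracks whether the hotkey is already down.

```typescript
// Hypothetical interfaces for illustration; not FlowVoice's real API.
interface Recorder {
  start(): void;
  stop(): Int16Array; // everything captured since start(), as 16-bit PCM
}

class PushToTalk {
  private held = false;
  private readonly hotkey: number;
  private readonly recorder: Recorder;
  private readonly onAudio: (pcm: Int16Array) => void;

  constructor(
    hotkey: number,
    recorder: Recorder,
    onAudio: (pcm: Int16Array) => void,
  ) {
    this.hotkey = hotkey;
    this.recorder = recorder;
    this.onAudio = onAudio;
  }

  // Call from the global hook's key-down event.
  keyDown(keycode: number): void {
    // Ignore other keys, and ignore auto-repeat while the hotkey is held.
    if (keycode !== this.hotkey || this.held) return;
    this.held = true;
    this.recorder.start();
  }

  // Call from the global hook's key-up event.
  keyUp(keycode: number): void {
    // Ignore other keys and stray releases with no matching press.
    if (keycode !== this.hotkey || !this.held) return;
    this.held = false;
    this.onAudio(this.recorder.stop());
  }
}
```

In a real app, `keyDown` and `keyUp` would be wired to the hook library's key events, and `onAudio` would hand the buffer to the transcription stage.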

Architecture

Tray process: Electron, minimal UI, persistent tray icon, global config

Hotkey hook: uIOhook for true system-global key capture (works even when no window is focused)

Audio capture: Node.js audio input stream, 16 kHz mono PCM, recorded while the hotkey is held

Transcription: whisper.cpp with GPU acceleration (CUDA / Metal) and CPU fallback; configurable model size (tiny/base/small/medium)

Text normalization: punctuation restoration, common-phrase corrections, configurable dictionary

Output: clipboard paste into the active application, or simulated keystrokes for apps that block paste
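
To make the hand-off between audio capture and transcription concrete, here is a hedged sketch of the buffer handling. It assumes the capture stream delivers 16-bit PCM chunks and that the samples go to whisper.cpp's C API, which takes 32-bit floats at 16 kHz. If the app shells out to the whisper.cpp CLI with a WAV file instead, the float conversion is not needed. The function names are illustrative.

```typescript
const SAMPLE_RATE_HZ = 16000; // 16 kHz mono, as in the capture stage above

// Join the chunks collected while the hotkey was held into one buffer.
function concatChunks(chunks: Int16Array[]): Int16Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Int16Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Scale signed 16-bit samples into the [-1, 1) float range.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, (sample) => sample / 32768);
}

// Length of a mono recording in seconds.
function durationSeconds(sampleCount: number): number {
  return sampleCount / SAMPLE_RATE_HZ;
}
```

A duration check like `durationSeconds(pcm.length) < 0.2` is one possible way to discard accidental taps before running the model; the threshold is an arbitrary example.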

Tech stack

Electron · Node.js · TypeScript · whisper.cpp · uIOhook · WASAPI

What broke first

▸ uIOhook native bindings on Windows are a build-chain rabbit hole: node-gyp, the right Windows SDK version, and Python 3.11 (not 3.12, not 3.10). I documented the exact incantation in the README because I forgot it twice.
▸ whisper.cpp 'medium' is the sweet spot for English on a consumer GPU. 'small' is fast but drops technical terms; 'large-v3' is accurate, but the latency makes the UX feel broken. Medium with Q5 quantization on an RTX 4070 is sub-second for short phrases.
▸ Clipboard paste fails silently in Slack and Zoom because they intercept Ctrl+V. I added a fallback that simulates keystrokes when the paste doesn't visibly land within 200 ms.
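
The paste-then-fallback behaviour in the last item could look roughly like the sketch below. How FlowVoice detects that a paste "visibly landed" is not described here, so `pasteLanded` is a hypothetical probe, and every member of `OutputDeps` is an injected placeholder rather than a real Electron or OS call. Only the 200 ms window comes from the text.

```typescript
// All dependencies are injected placeholders; none is a real library call.
interface OutputDeps {
  writeClipboard(text: string): void;
  sendPasteShortcut(): void; // e.g. synthesize Ctrl+V
  pasteLanded(): boolean; // hypothetical probe: did the text show up?
  typeText(text: string): void; // simulated keystrokes
  sleep(ms: number): Promise<void>;
  now(): number; // milliseconds
}

async function deliverText(
  text: string,
  deps: OutputDeps,
  timeoutMs: number = 200,
  pollMs: number = 20,
): Promise<"paste" | "keystrokes"> {
  deps.writeClipboard(text);
  deps.sendPasteShortcut();

  // Poll until the paste is observed or the window expires.
  const deadline = deps.now() + timeoutMs;
  while (deps.now() < deadline) {
    if (deps.pasteLanded()) return "paste";
    await deps.sleep(pollMs);
  }
  // One last check before falling back.
  if (deps.pasteLanded()) return "paste";

  deps.typeText(text);
  return "keystrokes";
}
```

One design caveat with this shape: if a paste lands after the deadline, the text is entered twice, so the timeout trades responsiveness against that risk.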

Outcome

Hold F9 in any application (terminal, browser, email client, document editor), speak, release, and the text appears. No network calls. No cloud. Used daily for dictating technical notes and long-form content. Median transcription latency is under one second per spoken phrase on a consumer GPU.

Honest limits

English-tuned. German dictation works, but punctuation restoration is rough: fine for notes, not for client-facing prose without a once-over. No custom vocabulary training (whisper.cpp doesn't expose that cleanly). Latency is sub-second on a recent GPU but climbs to 2-3 s on CPU-only machines: fine for short phrases, painful for paragraphs.
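
Without vocabulary training, a post-transcription dictionary is one way to patch recurring misrecognitions of technical terms. The sketch below shows the general idea behind a configurable dictionary like the one in the text normalization stage. It is an assumption about the approach, not FlowVoice's implementation, and the entries are invented examples.

```typescript
type Dictionary = Record<string, string>;

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace whole-word, case-insensitive matches of each dictionary key.
// Keys are assumed to start and end with a word character, since \b is used.
function applyDictionary(text: string, dict: Dictionary): string {
  // Longest keys first, so a longer phrase wins over a shorter one inside it.
  const keys = Object.keys(dict).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;

  const lookup = new Map<string, string>();
  for (const key of keys) {
    lookup.set(key.toLowerCase(), dict[key]);
  }

  const pattern = new RegExp(
    `\\b(?:${keys.map(escapeRegExp).join("|")})\\b`,
    "gi",
  );
  // A single pass, so replacement text is never itself re-replaced.
  return text.replace(pattern, (match) => {
    const replacement = lookup.get(match.toLowerCase());
    return replacement === undefined ? match : replacement;
  });
}
```

For example, a dictionary of `{ "whisper cpp": "whisper.cpp", "node gyp": "node-gyp" }` turns "Whisper CPP and node gyp" into "whisper.cpp and node-gyp" while leaving words that merely contain a key, such as "cppcheck" with a key of "cpp", untouched.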