Meeting Copilot
Live transcription + Claude whisper in a side panel
Built by Rogue AI · Desktop meeting assistant · Electron app, Windows-first
First working WASAPI capture: early 2026 in a separate Electron repo. Deepgram streaming STT and Claude Agent SDK whisper layer wired through Q1 2026. Used live in customer calls since.
The problem
Note-taking during customer calls kills presence. Cloud meeting bots are creepy: they record everything and send your conversations to unknown third parties. Most 'AI meeting assistants' are glorified transcribers that dump a summary after the call has ended, too late to be useful.
What I built
A desktop copilot that listens to a live call (system audio + microphone), transcribes in real time, and whispers context-aware coaching in a side panel: talking points, follow-up questions, objection-handlers, and a rolling summary. Runs locally; only the STT stream leaves the machine.
Architecture
Tech stack
What broke first
- WASAPI loopback on Windows is documented; the device-enumeration edge cases aren't. Spent a day on one laptop where the default render device renamed itself between sessions; the fix was to bind by device ID, not name.
- Whisper cadence makes or breaks the product. Too aggressive and the side panel becomes noise during the call; too quiet and you forget it's there. Settled on an 8-second silence trigger plus an end-of-thought heuristic, user-configurable.
- STT diarization fails when host and guest share one laptop mic with no headset. Rewrote the speaker-tag UI to surface the failure ('one speaker — diarization unavailable') instead of mislabeling speakers.
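To make the device-ID fix concrete, here is a minimal sketch of resolving the capture target by a saved ID rather than a friendly name. The `RenderDevice` shape and the `resolveRenderDevice` helper are hypothetical names for illustration; the real enumeration comes from the native WASAPI layer and is not shown.

```typescript
// Hypothetical shape of one enumerated render device (field names are assumptions).
interface RenderDevice {
  id: string;         // stable endpoint ID reported by the OS
  name: string;       // friendly name; may change between sessions
  isDefault: boolean; // true for the current default render device
}

// Prefer the device whose ID was saved earlier; if it is gone,
// fall back to whatever the OS currently marks as default.
function resolveRenderDevice(
  devices: RenderDevice[],
  savedId: string | null,
): RenderDevice | undefined {
  if (savedId !== null) {
    const byId = devices.find((d) => d.id === savedId);
    if (byId) return byId;
  }
  return devices.find((d) => d.isDefault);
}
```

Because the name is never compared, a device that renames itself between sessions still resolves as long as its ID is unchanged.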
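The cadence rule could be sketched roughly as below. Only the 8-second silence threshold comes from the text; the punctuation-based `endsThought` check, the shorter `thoughtPauseMs` pause, and the way the two conditions combine are assumptions standing in for the actual heuristic, which is not described here.

```typescript
interface CadenceConfig {
  silenceMs: number;      // silence that always triggers a whisper (8000 per the text)
  thoughtPauseMs: number; // hypothetical shorter pause accepted after a finished thought
}

interface SpeechState {
  lastSpeechAt: number;          // timestamp (ms) of the last transcribed speech
  lastUtterance: string;         // most recent finalized utterance
  whisperedSinceSpeech: boolean; // true if a whisper already fired for this pause
}

// Stand-in heuristic: treat sentence-final punctuation as "end of thought".
function endsThought(utterance: string): boolean {
  return /[.?!]["')\]]?$/.test(utterance.trim());
}

// Fire at most once per pause: on long silence, or on a shorter
// pause that follows an utterance that looks complete.
function shouldWhisper(state: SpeechState, now: number, cfg: CadenceConfig): boolean {
  if (state.whisperedSinceSpeech) return false;
  const silence = now - state.lastSpeechAt;
  if (silence >= cfg.silenceMs) return true;
  return endsThought(state.lastUtterance) && silence >= cfg.thoughtPauseMs;
}
```

Keeping both thresholds in a config object is one way to expose the cadence as a user setting.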
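The diarization fallback might look like the sketch below. It assumes each transcribed word carries an optional numeric speaker tag, as diarizing STT APIs typically return; `TaggedWord`, `diarizationStatus`, and `speakerLabel` are hypothetical names, and the UI is assumed to map the `"unavailable"` status to the notice quoted above.

```typescript
interface TaggedWord {
  text: string;
  speaker?: number; // speaker index from the STT provider, if any
}

type DiarizationStatus = "pending" | "unavailable" | "ok";

// Report "ok" only when at least two distinct speakers have been seen.
function diarizationStatus(words: TaggedWord[]): DiarizationStatus {
  if (words.length === 0) return "pending";
  const speakers = new Set<number>();
  for (const w of words) {
    if (w.speaker !== undefined) speakers.add(w.speaker);
  }
  return speakers.size >= 2 ? "ok" : "unavailable";
}

// Return a speaker label only when diarization is trustworthy;
// null tells the UI to show the failure notice instead of a guess.
function speakerLabel(word: TaggedWord, status: DiarizationStatus): string | null {
  if (status !== "ok" || word.speaker === undefined) return null;
  return `Speaker ${word.speaker + 1}`;
}
```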
Outcome
Real-time meeting coach that runs on the operator's machine. Transcript and summary stay local; only the STT stream goes to Deepgram. Used during customer discovery calls, internal reviews, and practice sessions.
Honest limits
Windows-only today; the macOS path is real but ~2 weeks of work I haven't done. Deepgram is a cloud dependency for the audio stream, so the product is not fully local, contrary to what a quick read might suggest. Long meetings (>90 min) hit a transcript-buffer trim I'm still tuning to keep the agent context coherent.
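For illustration, the simplest form of a transcript-buffer trim is a newest-first window under a size budget. This is a sketch of the general idea under assumed names (`Segment`, `trimTranscript`, a character-count budget), not the trim actually used in the app, which is still being tuned.

```typescript
interface Segment {
  speaker: string;
  text: string;
}

// Keep the most recent segments whose combined text length fits the budget.
// Whole segments are kept or dropped so no utterance is cut mid-sentence.
function trimTranscript(segments: Segment[], maxChars: number): Segment[] {
  const kept: Segment[] = [];
  let used = 0;
  for (const seg of [...segments].reverse()) {
    if (used + seg.text.length > maxChars) break;
    kept.push(seg);
    used += seg.text.length;
  }
  return kept.reverse(); // restore chronological order
}
```

A window like this loses early context outright, which is one reason a long meeting can make the agent's view of the call feel less coherent.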
