Meeting Copilot

Live transcription + Claude whisper in a side panel

Built by Rogue AI · Desktop meeting assistant · Electron app, Windows-first

First working WASAPI capture: early 2026 in a separate Electron repo. Deepgram streaming STT and Claude Agent SDK whisper layer wired through Q1 2026. Used live in customer calls since.


The problem

Note-taking during customer calls kills presence. Cloud meeting bots are creepy, record everything, and send your conversations to unknown third parties. Most 'AI meeting assistants' are glorified transcribers that dump a summary after the call has ended, too late to be useful.

What I built

A desktop copilot that listens to a live call (system audio + microphone), transcribes in real time, and whispers context-aware coaching in a side panel: talking points, follow-up questions, objection-handlers, and a rolling summary. Runs locally; only the STT stream leaves the machine.

Architecture

Electron shell

Always-on-top transparent overlay, configurable hotkeys, system tray integration

Audio capture

Windows WASAPI loopback for system audio + default microphone, mixed to a single 16 kHz PCM stream
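A minimal sketch of the mixing step, assuming both sources have already been resampled to 16 kHz mono Float32 buffers. The function name `mixToPcm16` and the choice to average the two sources (rather than sum them) are illustrative assumptions, not taken from the app's code.

```typescript
// Hypothetical helper: mixes two mono Float32 buffers (range -1..1, both
// assumed to be at 16 kHz already) into a single 16-bit PCM buffer.
function mixToPcm16(systemAudio: Float32Array, mic: Float32Array): Int16Array {
  const length = Math.max(systemAudio.length, mic.length);
  const out = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    // Treat the shorter buffer as silence once it runs out.
    const s = systemAudio[i] ?? 0;
    const m = mic[i] ?? 0;
    // Average the sources so two loud signals cannot clip, then clamp.
    const mixed = Math.max(-1, Math.min(1, (s + m) * 0.5));
    // Scale to the asymmetric signed 16-bit range.
    out[i] = mixed < 0 ? Math.round(mixed * 32768) : Math.round(mixed * 32767);
  }
  return out;
}
```

Averaging halves the level of each source; a real mixer might instead sum and apply a limiter, which this sketch does not attempt.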

Streaming STT

Deepgram streaming API with speaker diarization, interim + final results

Transcript buffer

Rolling window with speaker labels, feeds the agent with a configurable look-back
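One way such a rolling window could look, as a sketch only: the names `TranscriptSegment` and `RollingTranscript`, and the choice to measure the look-back in milliseconds from the newest segment, are assumptions made for illustration.

```typescript
// Hypothetical shape of one finalized transcript segment.
interface TranscriptSegment {
  speaker: number; // diarization label from the STT service
  text: string;
  endMs: number;   // end time of the segment, relative to call start
}

class RollingTranscript {
  private segments: TranscriptSegment[] = [];
  private lookBackMs: number;

  constructor(lookBackMs: number) {
    this.lookBackMs = lookBackMs;
  }

  // Append a segment and drop everything that ended before the window.
  push(segment: TranscriptSegment): void {
    this.segments.push(segment);
    const cutoff = segment.endMs - this.lookBackMs;
    this.segments = this.segments.filter((s) => s.endMs >= cutoff);
  }

  // Render the window as speaker-labelled lines for the agent prompt.
  render(): string {
    return this.segments.map((s) => `Speaker ${s.speaker}: ${s.text}`).join("\n");
  }
}
```

For example, with a 10-second look-back, pushing segments that end at 1 s, 5 s, and 12 s leaves only the last two in the rendered window.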

Claude Agent SDK whisper

Tool-using agent that emits structured whispers (talking points, questions, summary) on a configurable cadence
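To make "structured whispers" concrete, here is a sketch of a payload type and a defensive parser. It assumes the agent returns each whisper as a JSON object with `kind` and `text` fields; that wire format, the `WhisperMessage` type, and `parseWhisper` are hypothetical and not part of the Claude Agent SDK.

```typescript
// The three whisper kinds named above; the string values are assumptions.
const WHISPER_KINDS = ["talking_point", "question", "summary"] as const;
type WhisperKind = (typeof WHISPER_KINDS)[number];

interface WhisperMessage {
  kind: WhisperKind;
  text: string;
}

function isWhisperKind(value: unknown): value is WhisperKind {
  return typeof value === "string" && (WHISPER_KINDS as readonly string[]).includes(value);
}

// Returns a validated whisper, or null if the agent output is malformed.
function parseWhisper(raw: string): WhisperMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const { kind, text } = value as { kind?: unknown; text?: unknown };
  if (!isWhisperKind(kind)) return null;
  if (typeof text !== "string" || text.trim() === "") return null;
  return { kind, text };
}
```

Rejecting malformed output keeps the overlay from rendering a half-formed whisper in the middle of a call.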

Overlay UI

React; renders the interim transcript, the current whisper, the rolling summary, and exportable post-call notes

Tech stack

Electron · Node.js · TypeScript · React · Deepgram SDK · Claude Agent SDK · WASAPI

What broke first

▸ WASAPI loopback on Windows is documented; the device-enumeration edge cases aren't. Spent a day on one laptop where the default render device renamed itself between sessions; the fix was to bind by device ID, not name.
▸ Whisper cadence makes or breaks the product. Too aggressive and the side panel becomes noise during the call; too quiet and you forget it's there. Settled on an 8-second silence trigger plus an end-of-thought heuristic, both user-configurable.
▸ STT diarization fails when host and guest share one laptop mic with no headset. Rewrote the speaker-tag UI to surface the failure ('one speaker, diarization unavailable') instead of mislabeling.
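The cadence rule from the second item above could be sketched as follows. Only the 8-second silence threshold comes from the text; how the two conditions combine, the punctuation-based end-of-thought check, the 1.5-second pause, and all names here are assumptions for illustration.

```typescript
interface CadenceInput {
  lastFinalText: string; // most recent final (not interim) transcript segment
  msSinceSpeech: number; // time elapsed since that segment arrived
}

interface CadenceOptions {
  silenceMs: number;           // 8000 matches the default named in the text
  endOfThoughtPauseMs: number; // hypothetical shorter pause
}

const DEFAULT_CADENCE: CadenceOptions = { silenceMs: 8000, endOfThoughtPauseMs: 1500 };

// Assumed heuristic: the segment ends with terminal punctuation.
function looksLikeEndOfThought(text: string): boolean {
  return /[.?!]["')\]]?\s*$/.test(text);
}

// Fire on long silence, or on a shorter pause after a finished sentence.
function shouldWhisper(input: CadenceInput, options: CadenceOptions = DEFAULT_CADENCE): boolean {
  if (input.msSinceSpeech >= options.silenceMs) return true;
  return (
    looksLikeEndOfThought(input.lastFinalText) &&
    input.msSinceSpeech >= options.endOfThoughtPauseMs
  );
}
```

Keeping both thresholds in one options object is what would make the cadence user-configurable without touching the trigger logic.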

Outcome

Real-time meeting coach that runs on the operator's machine. Transcript and summary stay local; only the STT stream goes to Deepgram. Used during customer discovery calls, internal reviews, and practice sessions.

Honest limits

Windows-only today; the macOS path is real but is ~2 weeks of work I haven't done. Deepgram is a cloud dependency for the audio stream, so the app is not fully local, whatever a quick read of the product suggests. Long meetings (>90 min) hit a transcript-buffer trim that I'm still tuning to keep the agent's context coherent.