Audio AI

Meeting Copilot

Live transcription + Claude whisper in a side panel

Built by Rogue AI · Desktop meeting assistant · Electron app, Windows-first

First working WASAPI capture: early 2026 in a separate Electron repo. Deepgram streaming STT and Claude Agent SDK whisper layer wired through Q1 2026. Used live in customer calls since.

The problem

Note-taking during customer calls kills presence. Cloud meeting bots are creepy, record everything, and send your conversations to unknown third parties. Most 'AI meeting assistants' are glorified transcribers that dump a summary after the call has ended, too late to be useful.

What I built

A desktop copilot that listens to a live call (system audio + microphone), transcribes in real time, and whispers context-aware coaching in a side panel: talking points, follow-up questions, objection-handlers, and a rolling summary. Runs locally; only the STT stream leaves the machine.

Architecture

Electron shell: Always-on-top transparent overlay, configurable hotkeys, system tray integration
Audio capture: Windows WASAPI loopback for system audio + default microphone, mixed to a single 16 kHz PCM stream
Streaming STT: Deepgram streaming API with speaker diarization, interim + final results
Transcript buffer: Rolling window with speaker labels, feeds the agent with a configurable look-back
Claude Agent SDK whisper: Tool-using agent that emits structured whispers (talking points, questions, summary) on a configurable cadence
Overlay UI: React, renders interim transcript, current whisper, rolling summary, post-call exportable notes
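
To make the transcript buffer concrete, here is a minimal sketch of a rolling window with speaker labels and a configurable look-back. It is an illustration under stated assumptions, not the project's code: the `Utterance` shape, the `TranscriptBuffer` class, and the choice to measure the look-back in milliseconds are all hypothetical.

```typescript
// Hypothetical types and names; none of this is taken from the project's source.
interface Utterance {
  speaker: string; // diarization label, e.g. "Speaker 0"
  text: string;    // finalized transcript text
  endMs: number;   // end of the utterance, in ms since the call started
}

class TranscriptBuffer {
  private utterances: Utterance[] = [];
  private readonly lookBackMs: number;

  constructor(lookBackMs: number) {
    this.lookBackMs = lookBackMs;
  }

  // Append a finalized utterance, then drop everything that ended
  // before the look-back window measured from the newest utterance.
  push(u: Utterance): void {
    this.utterances.push(u);
    const cutoff = u.endMs - this.lookBackMs;
    this.utterances = this.utterances.filter((x) => x.endMs >= cutoff);
  }

  // Render the current window as speaker-labelled lines for the agent prompt.
  render(): string {
    return this.utterances.map((u) => `${u.speaker}: ${u.text}`).join("\n");
  }
}
```

Only final STT results would go through `push`; interim results change too often to be worth feeding to the agent and can be rendered straight to the overlay instead.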

Tech stack

Electron · Node.js · TypeScript · React · Deepgram SDK · Claude Agent SDK · WASAPI

What broke first

  • WASAPI loopback on Windows is documented; the device-enumeration edge cases aren't. Spent a day on one laptop where the default render device renamed itself between sessions — fix was to bind by device ID, not name.

  • Whisper cadence makes or breaks the product. Too aggressive and the side panel becomes noise during the call; too quiet and you forget it's there. Settled on an 8-second silence trigger plus an end-of-thought heuristic, user-configurable.

  • STT diarization fails when host and guest share one laptop mic with no headset. Rewrote the speaker-tag UI to surface the failure ('one speaker — diarization unavailable') instead of mislabeling.
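
The cadence rule in the second bullet can be sketched as a small pure function. This is a rough illustration, not the shipped logic: the 8-second silence threshold comes from the text, but the `CadenceConfig` and `CadenceState` shapes, the shorter `endOfThoughtPauseMs` pause, and the punctuation-based end-of-thought check are assumptions made for the example.

```typescript
// Hypothetical names throughout; only the 8 s silence trigger comes from the text.
interface CadenceConfig {
  silenceMs: number;           // e.g. 8000, user-configurable
  endOfThoughtPauseMs: number; // assumed shorter pause for the heuristic
}

interface CadenceState {
  lastSpeechMs: number;  // timestamp of the last final STT result
  lastFinalText: string; // text of that result
  lastWhisperMs: number; // timestamp of the last whisper shown
}

// Assumed stand-in for an end-of-thought heuristic: the last final
// result ends in sentence-terminal punctuation.
function looksLikeEndOfThought(text: string): boolean {
  return /[.?!]\s*$/.test(text);
}

function shouldWhisper(state: CadenceState, nowMs: number, cfg: CadenceConfig): boolean {
  // Nothing new has been said since the last whisper: stay quiet.
  if (state.lastSpeechMs <= state.lastWhisperMs) return false;

  const quietForMs = nowMs - state.lastSpeechMs;
  // Silence trigger: the room has been quiet long enough.
  if (quietForMs >= cfg.silenceMs) return true;
  // End-of-thought trigger: a shorter pause after a finished sentence.
  return quietForMs >= cfg.endOfThoughtPauseMs && looksLikeEndOfThought(state.lastFinalText);
}
```

Keeping the decision a pure function of state, clock, and config makes the thresholds easy to expose as user settings and to tune against recorded calls.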

Outcome

Real-time meeting coach that runs on the operator's machine. Transcript and summary stay local; only the STT stream goes to Deepgram. Used during customer discovery calls, internal reviews, and practice sessions.

Honest limits

Windows-only today; the macOS path is real but ~2 weeks of work I haven't done. Deepgram is a cloud dependency for the audio stream — not fully local, contrary to a quick read of the product. Long meetings (>90 min) hit a transcript-buffer trim I'm still tuning to keep the agent context coherent.
