CompliBot
Security questionnaire RAG — Excel in, AI answers out
Built by Rogue AI · Security-questionnaire RAG responder (Excel/CSV in, AI-drafted answers out) · Self-hosted
Built through 2026 as part of the Rogue AI fleet.
The problem
Vendor security questionnaires — SIG Lite, CAIQ, VSA, NIST CSF and the endless bespoke spreadsheets — are a recurring tax on any company that sells software. The answers barely change between deals, yet someone re-types them by hand into a new Excel file every time, hunting through old questionnaires and policy documents for wording they already wrote months ago. The work is repetitive, error-prone, and pulls security people off real work to copy-paste.
What I built
CompliBot is a Next.js 16 / React 19 app that ingests an Excel or CSV questionnaire, parses out the questions, classifies each by security domain, and drafts an answer using a retrieval-augmented pipeline grounded in your own knowledge base. You build that knowledge base by uploading past questionnaires and policies, or through a guided AI interview that turns tacit answers into structured entries. Each draft comes back with a confidence score and its source passages; reviewers approve, edit or reject side-by-side with keyboard shortcuts, then export the completed sheet back to Excel. Around the core loop sit the things a real product needs: multi-tenant organisations with role-based access, a template library, bulk upload, an immutable audit trail, gap analysis across 19 security domains, and Stripe-backed plan tiers.
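The review step can be pictured as a small gate on each draft. The sketch below is illustrative only: the type names, field names and the `gateDraft` helper are hypothetical and not taken from CompliBot's code, the confidence score is assumed to be normalised to 0..1, and a draft sitting exactly on the threshold is assumed to count as approved.

```typescript
// Hypothetical shapes: none of these names come from CompliBot's codebase.
type ReviewStatus = "auto_approved" | "needs_review";

interface SourcePassage {
  documentId: string; // knowledge-base document the passage came from
  excerpt: string;    // text shown to the reviewer next to the draft
}

interface DraftAnswer {
  questionId: string;
  domain: string;     // security domain assigned by the classifier
  answer: string;
  confidence: number; // assumed to be normalised to the range 0..1
  sources: SourcePassage[];
}

// Assumed rule: at or above the org-level threshold the draft is
// auto-approved, anything below it is flagged for a human reviewer.
function gateDraft(draft: DraftAnswer, orgThreshold: number): ReviewStatus {
  return draft.confidence >= orgThreshold ? "auto_approved" : "needs_review";
}

// Split a batch of drafts into the two queues a reviewer would see.
function partitionDrafts(drafts: DraftAnswer[], orgThreshold: number) {
  const approved: DraftAnswer[] = [];
  const flagged: DraftAnswer[] = [];
  for (const draft of drafts) {
    if (gateDraft(draft, orgThreshold) === "auto_approved") {
      approved.push(draft);
    } else {
      flagged.push(draft);
    }
  }
  return { approved, flagged };
}
```

Keeping the gate as a pure function of the draft and one threshold makes it simple to test and to adjust per organisation; in this sketch a flagged draft is never dropped, it just lands in the human review queue.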
Architecture
Tech stack
What broke first
- Retrieval quality decides answer quality. Pure vector search misses exact control IDs and acronyms (SOC 2, ISO 27001, CAIQ), so a hybrid of pgvector cosine similarity plus keyword matching, fused with reciprocal rank fusion, beat either approach alone for questionnaire text.
- A confidence score is only useful if it gates a human. Surfacing a per-answer confidence and an org-level threshold (auto-approve above, flag for review below) turned the AI into a drafting assistant rather than an unaccountable autoresponder, which is the only posture a security team will accept.
- Keeping the model and the embeddings on a self-hosted Ollama instance means the knowledge base (past answers, policies, internal control language) never leaves the box. For a tool whose entire input is sensitive vendor-security material, that data-residency property is the feature, not a footnote.
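The fusion step in the first point can be sketched as follows. This is a minimal, generic reciprocal rank fusion over two ranked lists of passage IDs, not CompliBot's actual retrieval code: the function name, the passage IDs and the constant `k = 60` (a commonly used default) are assumptions, and the two input lists stand in for whatever the pgvector query and the keyword matcher return.

```typescript
// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank starts at 1. k = 60 is a common default, assumed here.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical passage IDs, best match first in each list.
const vectorHits = ["kb-12", "kb-07", "kb-31"];  // stand-in for a cosine-similarity result
const keywordHits = ["kb-07", "kb-44", "kb-12"]; // stand-in for a keyword-match result

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
// → ["kb-07", "kb-12", "kb-44", "kb-31"]
```

Because the fusion uses only ranks, the cosine similarities and keyword scores never need to be put on a common scale; a passage that places well in both lists (here `kb-07`) rises above one that tops only a single list.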
Outcome
CompliBot works end-to-end as a portfolio demonstration: upload a real questionnaire spreadsheet, watch it parse and classify the questions, get grounded draft answers with sources and a confidence score, review them, and export a finished sheet, all on self-hosted infrastructure where the data never leaves the machine. It shows a security-domain RAG pipeline, confidence-gated human review, and a hardened multi-tenant Docker deployment actually running, rather than a slideware concept.
Honest limits
Self-hosted and built solo as a portfolio demo. It runs as a local Docker lab — the earlier public-demo VPS was retired — so there is no large production user base behind it, and answer quality depends entirely on the quality and coverage of the knowledge base you load. The default local model (qwen2.5:7b via Ollama) drafts plausible answers but still needs human review before anything ships to a customer; the Stripe billing, plan tiers and multi-tenant scaffolding exist to make it a realistic product, not because real money flows through it.
