Construction DocAI
Safety, permits, FIDIC contracts
Built by Rogue AI · Vertical-tuned document extraction for construction paperwork · Self-hosted · Local lab
Built solo over a focused stretch of evenings; iterated as new document types surfaced.
The problem
A construction project generates a constant stream of paperwork that someone has to read carefully: risk assessments and method statements, building and planning permits, FIDIC/JCT/NEC contracts with their variation and retention clauses, and a pile of subcontractor insurance certificates and trade licences with expiry dates buried in the fine print. The reading is slow and tempting to skim, and the things that bite later (a lapsed insurance certificate, a liquidated-damages clause, an expired permit) are exactly the details a tired reviewer misses. I wanted a tool that does the boring first read and surfaces the dates, clauses and risks so a person can check the parts that matter.
What I built
A focused web app: you upload a construction document and pick what kind of document it is, then a self-hosted vision-language model extracts the compliance-relevant data and flags risks, returning structured results you can export. Rather than one catch-all analyser, it has four modes (safety documents, building permits, construction contracts, and subcontractor compliance), each driven by its own prompt tuned to that document type. There is deliberately no database: each upload is an in-memory session with a 60-minute TTL, so documents aren't retained.
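The mode selection described above can be sketched as a small lookup from document type to prompt. This is a minimal illustration, not the app's actual code: the `Mode` enum, the `PROMPTS` table, the `build_request` helper and all of the prompt wording are hypothetical, since the real prompts are not shown here.

```python
from enum import Enum


class Mode(str, Enum):
    """The four document modes named in the write-up (identifiers are assumed)."""
    SAFETY = "safety"
    PERMIT = "permit"
    CONTRACT = "contract"
    SUBCONTRACTOR = "subcontractor"


# Hypothetical prompt text, one entry per mode; the real prompts are not published.
PROMPTS: dict[Mode, str] = {
    Mode.SAFETY: "Extract hazards, control measures and responsible persons "
                 "from this risk assessment or method statement.",
    Mode.PERMIT: "Extract permit number, issuing authority, validity dates "
                 "and conditions from this permit.",
    Mode.CONTRACT: "Extract variation, retention and liquidated-damages "
                   "clauses from this FIDIC/JCT/NEC contract.",
    Mode.SUBCONTRACTOR: "Extract insurer, policy type, licence numbers and "
                        "expiry dates from this certificate.",
}


def build_request(mode: str, page_count: int) -> dict:
    """Pick the prompt for the chosen mode and reject unknown modes early."""
    try:
        selected = Mode(mode)
    except ValueError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return {"mode": selected.value, "prompt": PROMPTS[selected], "pages": page_count}
```

Keeping the prompts in one table means adding a document type is a data change (a new enum member and a new prompt) rather than a new code path.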
Architecture
Tech stack
What broke first
- A single generic 'read this document' prompt is mediocre at everything. The accuracy jump came from splitting the work into mode-specific prompts: one tuned for FIDIC/JCT/NEC contract clauses, one for safety RAMS and method statements, one for permit validity and zoning. The model didn't get smarter; the instructions got narrower.
- Construction paperwork is visually messy: stamped permits, scanned method statements, tables of insurance expiry dates. A vision-capable model that looks at the page layout beats plain text extraction on these, but the win is uneven: clean digital PDFs extract cleanly; faxed-and-rescanned certificates do not.
- Holding sessions in memory with a short TTL instead of a database removed a whole class of work: no schema, no migrations, no retained documents sitting on disk. For a tool where each upload is a throwaway analysis, that was the right trade, not a shortcut.
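The in-memory session idea from the last point can be sketched as a dictionary keyed by session ID with an expiry timestamp per entry. This is an assumed shape, not the app's real implementation: `SessionStore`, its method names and the injectable `clock` parameter are hypothetical, and only the 60-minute TTL comes from the text above.

```python
import time
import uuid
from typing import Any, Callable, Optional

SESSION_TTL_SECONDS = 60 * 60  # 60-minute TTL, as stated in the write-up


class SessionStore:
    """Hypothetical in-memory store; nothing is written to disk."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions: dict[str, tuple[float, Any]] = {}

    def create(self, payload: Any) -> str:
        """Store a payload and return a fresh session ID."""
        self._purge_expired()
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (self._clock() + self._ttl, payload)
        return session_id

    def get(self, session_id: str) -> Optional[Any]:
        """Return the payload, or None if the session is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]
            return None
        return payload

    def _purge_expired(self) -> None:
        """Drop every session whose expiry time has passed."""
        now = self._clock()
        expired = [sid for sid, (exp, _) in self._sessions.items() if now >= exp]
        for sid in expired:
            del self._sessions[sid]
```

This sketch is single-process and not thread-safe; a server handling concurrent requests would need a lock around the dictionary. Restarting the process discards every session, which matches the no-retention goal.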
Outcome
A working demonstrator that turns a slow manual read of construction paperwork into a fast structured first pass, pulling out clauses, dates and risks per document type so a human can focus on review rather than transcription. It proves the pattern I care about: vertical-tuned prompts on a self-hosted model, fully local, with no documents persisted or sent to a third party. The honest limits: it's a portfolio-grade local demo on a small model, accuracy depends on document quality, and it's a reviewer's assistant, not an authority.
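As an illustration of what flagging a date risk in that first pass might look like, here is a minimal sketch that classifies an extracted expiry date. The `flag_expiry` function, the status labels and the 30-day warning window are all assumptions for illustration; the write-up does not describe how the app actually flags dates.

```python
from datetime import date, timedelta


def flag_expiry(expiry: date, today: date, warn_days: int = 30) -> str:
    """Classify an extracted expiry date for a human reviewer.

    Assumption: a document expiring today is still valid, so it is
    reported as "expiring_soon" rather than "expired".
    """
    if expiry < today:
        return "expired"
    if expiry <= today + timedelta(days=warn_days):
        return "expiring_soon"
    return "ok"
```

Passing `today` explicitly keeps the check deterministic and easy to test. A flag like this only says where to look; the reviewer still confirms the date against the source document.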
Honest limits
This is a self-hosted portfolio demo, built solo and running in a local lab; the earlier public VPS was retired. It uses a self-hosted vision-language model rather than a commercial API, so quality tracks what a 7B-class local model can do, not a frontier model. Extraction accuracy varies a lot by document layout: a clean digital permit reads well; a stamped, scanned, low-contrast certificate can drop or misread fields. There is no database: sessions live in memory with a 60-minute TTL, so nothing is persisted between visits. Treat its output as a fast first pass for a human reviewer, never as a compliance sign-off.
