Construction DocAI

Safety, permits, FIDIC contracts

Built by Rogue AI · Vertical-tuned document extraction for construction paperwork · Self-hosted · Local lab

Built solo over a focused stretch of evenings; iterated as new document types surfaced.


The problem

A construction project generates a constant stream of paperwork that someone has to read carefully: risk assessments and method statements, building and planning permits, FIDIC/JCT/NEC contracts with their variation and retention clauses, and a pile of subcontractor insurance certificates and trade licences with expiry dates buried in the fine print. The reading is slow and easy to skim, and the things that bite later (a lapsed insurance cert, a liquidated-damages clause, an expired permit) are exactly the details a tired reviewer misses. I wanted a tool that does the boring first read and surfaces the dates, clauses and risks so a person can check the parts that matter.

What I built

A focused web app: you upload a construction document and pick what kind of document it is, then a self-hosted vision-language model extracts the compliance-relevant data and flags risks, returning structured results you can export. Rather than one catch-all analyser, it has four modes (safety documents, building permits, construction contracts, and subcontractor compliance), each driven by its own prompt tuned to that document type. There is deliberately no database: each upload is an in-memory session with a 60-minute TTL, so documents aren't retained.

Architecture

Four vertical analysis modes, not one generic reader

Separate routes for safety documents (RAMS, risk assessments, method statements, safety certs), building permits (validity, zoning compliance), construction contracts (FIDIC/JCT/NEC variations, retention, liquidated damages), and subcontractor compliance (insurance certs, trade licences, expiry tracking). Each mode carries its own extraction prompt, the single biggest lever on output quality.
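As a rough illustration of what "each mode carries its own prompt" can look like, here is a minimal sketch. The mode identifiers, the prompt wording and the `buildPrompt` helper are all hypothetical; the real prompts are not reproduced here.

```typescript
// Hypothetical mode identifiers; the real route names may differ.
type Mode = "safety" | "permits" | "contracts" | "subcontractors";

// Illustrative prompt text only. Each mode gets narrow, document-specific instructions.
const MODE_PROMPTS: Record<Mode, string> = {
  safety:
    "This is a construction safety document (RAMS, risk assessment, method statement or safety certificate). Extract hazards, control measures, responsible persons and review dates.",
  permits:
    "This is a building or planning permit. Extract the permit reference, issue and expiry dates, conditions, and any zoning compliance notes.",
  contracts:
    "This is a construction contract (FIDIC, JCT or NEC). Extract variation, retention and liquidated-damages clauses with their clause numbers.",
  subcontractors:
    "This is a subcontractor compliance document (insurance certificate or trade licence). Extract the holder, cover or licence type, and expiry date.",
};

// Shared output rules appended to every mode prompt (also an assumption).
const OUTPUT_RULES =
  "Return JSON with the keys fields, dates and risks. Use null for anything you cannot read.";

// Type guard so an unknown mode string is rejected before it reaches the model.
function isMode(value: string): value is Mode {
  return Object.prototype.hasOwnProperty.call(MODE_PROMPTS, value);
}

function buildPrompt(mode: string): string {
  if (!isMode(mode)) {
    throw new Error(`Unknown analysis mode: ${mode}`);
  }
  return `${MODE_PROMPTS[mode]}\n\n${OUTPUT_RULES}`;
}
```

Keeping the prompts in one typed map means adding a fifth document type is a single new key, and the compiler flags any mode that lacks a prompt.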

Self-hosted vision-language model via Ollama

All inference runs against a local Ollama instance over an internal Docker network: no commercial AI API, and no documents leaving the host. A vision-capable model reads the page as laid out, which matters for stamped permits and tabular certificate data that plain text extraction mangles.
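A minimal sketch of how a page image might be sent to Ollama's `/api/generate` endpoint, which accepts base64-encoded images alongside the prompt. The `buildOllamaRequest` helper, the `http://ollama:11434` service address and the choice to send one page per request are assumptions for illustration, not the app's actual code.

```typescript
// Shape of the request, kept separate from the network call so it can be inspected.
interface OllamaRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds a non-streaming generate request for one page image.
function buildOllamaRequest(
  baseUrl: string,
  prompt: string,
  pageImageBase64: string,
  model: string = "llava",
): OllamaRequest {
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        prompt,
        images: [pageImageBase64], // base64 image data, no data-URL prefix
        stream: false, // ask for a single JSON reply instead of a token stream
      }),
    },
  };
}

// "ollama" is an assumed Docker service name; 11434 is Ollama's default port.
const exampleRequest = buildOllamaRequest(
  "http://ollama:11434/",
  "Extract the permit expiry date.",
  "QUJD",
);
```

The request can then be passed to `fetch(exampleRequest.url, exampleRequest.init)`; with `stream: false` the generated text comes back in the `response` field of the JSON reply.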

Stateless, in-memory sessions, no database

There is no persistence layer at all. Each analysis is a session held in memory with a 60-minute TTL, then it's gone. No schema, no migrations, nothing retained on disk: a deliberate fit for a throwaway-analysis tool, and one less thing to secure.
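The session behaviour described above can be sketched as a small TTL map. The `SessionStore` class, its method names and the injectable clock are hypothetical; only the 60-minute TTL comes from the description.

```typescript
const SESSION_TTL_MS = 60 * 60 * 1000; // 60 minutes

interface SessionEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class SessionStore<T> {
  private readonly sessions = new Map<string, SessionEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting an hour.
  constructor(ttlMs: number = SESSION_TTL_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(id: string, value: T): void {
    this.sessions.set(id, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Expired entries are deleted lazily on read.
  get(id: string): T | undefined {
    const entry = this.sessions.get(id);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.sessions.delete(id);
      return undefined;
    }
    return entry.value;
  }

  // Removes every expired entry; returns how many were dropped.
  sweep(): number {
    const cutoff = this.now();
    let removed = 0;
    this.sessions.forEach((entry, id) => {
      if (entry.expiresAt <= cutoff) {
        this.sessions.delete(id);
        removed += 1;
      }
    });
    return removed;
  }
}
```

Lazy deletion on read keeps expired results from being served; calling `sweep()` on a timer would additionally free memory for sessions nobody returns to. Because the map lives in process memory, a container restart clears everything.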

Hardened, locked-down container

Runs as a non-root, read-only container with all Linux capabilities dropped and no-new-privileges set, writable scratch space confined to small tmpfs mounts, and explicit memory, CPU and PID limits. Every port is bound to loopback only; the app talks to the model over a dedicated internal network.

Pluggable inference backend

The same app can route its LLM calls through a shared host bridge instead of local Ollama, switched by a single environment variable. The default stays on the self-hosted model to keep the offline, no-external-API behaviour; the bridge is an opt-in for heavier reasoning when it's wanted.
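A minimal sketch of an environment-variable switch of this kind. The variable name `INFERENCE_BACKEND`, its `"bridge"` value, both URLs and the `selectBackend` helper are hypothetical stand-ins; the actual variable is not named here.

```typescript
interface Backend {
  kind: "ollama" | "bridge";
  baseUrl: string;
}

// Assumed addresses, for illustration only.
const LOCAL_OLLAMA_URL = "http://ollama:11434";
const HOST_BRIDGE_URL = "http://host.docker.internal:8080";

// The environment is passed in rather than read globally, which keeps the function testable.
function selectBackend(env: Record<string, string | undefined>): Backend {
  const choice = (env["INFERENCE_BACKEND"] ?? "").trim().toLowerCase();
  if (choice === "bridge") {
    return { kind: "bridge", baseUrl: HOST_BRIDGE_URL };
  }
  // Anything else, including an unset variable, keeps the offline default.
  return { kind: "ollama", baseUrl: LOCAL_OLLAMA_URL };
}
```

In a Node process this would be called as `selectBackend(process.env)`. Defaulting to the local model on any unrecognised value means a typo in the variable cannot silently route documents off the self-hosted path.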

Tech stack

Next.js · Ollama · llava · Docker

What broke first

▸ A single generic 'read this document' prompt is mediocre at everything. The accuracy jump came from splitting the work into mode-specific prompts: one tuned for FIDIC/JCT/NEC contract clauses, one for safety RAMS and method statements, one for permit validity and zoning. The model didn't get smarter; the instructions got narrower.
▸ Construction paperwork is visually messy: stamped permits, scanned method statements, tables of insurance expiry dates. A vision-capable model that looks at the page layout beats plain text extraction on these, but the win is uneven: clean digital PDFs extract cleanly, faxed-and-rescanned certificates do not.
▸ Holding sessions in memory with a short TTL instead of a database removed a whole class of work: no schema, no migrations, no retained documents sitting on disk. For a tool where each upload is a throwaway analysis, that was the right trade, not a shortcut.

Outcome

A working demonstrator that turns a slow manual read of construction paperwork into a fast structured first pass, pulling out clauses, dates and risks per document type so a human can focus on review rather than transcription. It proves the pattern I care about: vertical-tuned prompts on a self-hosted model, fully local, with no documents persisted or sent to a third party. The honest limits: it's a portfolio-grade local demo on a small model, accuracy depends on document quality, and it's a reviewer's assistant, not an authority.

Honest limits

This is a self-hosted portfolio demo, built solo and running in a local lab; the earlier public VPS was retired. It uses a self-hosted vision-language model rather than a commercial API, so quality tracks what a 7B-class local model can do, not a frontier model. Extraction accuracy varies a lot by document layout: a clean digital permit reads well; a stamped, scanned, low-contrast certificate can drop or misread fields. There is no database: sessions live in memory with a 60-minute TTL, so nothing is persisted between visits. Treat its output as a fast first pass for a human reviewer, never as a compliance sign-off.