Maritime DocAI

Charter parties, crew certs, PSC reports

Built by Rogue AI · Local multimodal OCR, zero-cloud · Self-hosted · Local lab

Built as an internal portfolio project, then refined in iterations against real maritime documents.

The problem

Maritime operators drown in unstructured paper: charter parties, crew certificates, Port State Control (PSC) reports, bills of lading. The facts that matter (laytime clauses, certificate expiry dates, deficiency codes) are buried in PDFs of wildly inconsistent quality, from clean digital documents to scanned, stamped, faxed pages. The obvious fix is a cloud document-AI API, but shipping documents carry commercially sensitive terms and personal data about crew, and feeding every page to a third-party API raises both cost and data-handling questions that operators would rather avoid.

What I built

A single self-hosted web app: upload a shipping PDF and get back structured fields (document type, key clauses, certificate and expiry data, PSC deficiencies), with compliance-relevant items flagged and everything exportable as structured data. Every step runs on the local machine: a local multimodal model performs OCR on document images, and a local LLM extracts the fields. No page ever leaves the box. Uploads live in short-lived in-memory sessions rather than a database, so the tool processes, exports, and then forgets.
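Compliance flagging for certificates can be as plain as comparing each extracted expiry date against today. A minimal sketch, assuming the extractor has already produced ISO dates; the `CertificateRecord` shape, the flag names and the 90-day warning window are all hypothetical, not taken from the app:

```typescript
// Hypothetical record for one certificate extracted from a crew-cert PDF.
interface CertificateRecord {
  holder: string;
  certificate: string;   // e.g. "STCW Basic Safety Training"
  expiry: string | null; // ISO date "YYYY-MM-DD" as extracted, or null if not found
}

type ComplianceFlag = "expired" | "expiring_soon" | "expiry_unknown" | "ok";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Classifies one certificate against a reference date.
// warnDays = 90 is an assumed threshold, not a value from the app.
function flagCertificate(cert: CertificateRecord, today: Date, warnDays = 90): ComplianceFlag {
  if (cert.expiry === null) return "expiry_unknown";
  const expiry = new Date(`${cert.expiry}T00:00:00Z`); // treat date-only values as UTC midnight
  if (Number.isNaN(expiry.getTime())) return "expiry_unknown"; // unparseable date: never guess
  const daysLeft = Math.floor((expiry.getTime() - today.getTime()) / MS_PER_DAY);
  if (daysLeft < 0) return "expired";
  if (daysLeft <= warnDays) return "expiring_soon";
  return "ok";
}

// Attaches a flag to every record so the export carries it next to the extracted data.
function flagAll(certs: CertificateRecord[], today: Date) {
  return certs.map((cert) => ({ ...cert, flag: flagCertificate(cert, today) }));
}
```

Missing or unreadable dates get their own flag rather than a default, so an unknown expiry stays visibly unknown in the export.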

Architecture

Next.js 16 + React 19 front end and API routes

A single TypeScript/Tailwind app serves the upload UI and hosts the API routes that drive extraction. There is no separate backend service: the same Next.js process handles upload, orchestration, and export, which keeps the deployment to one container.
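Because Next.js route handlers are just exported functions that take a Web `Request` and return a `Response`, the upload endpoint can live in the same app as the UI. A sketch, assuming a hypothetical `app/api/extract/route.ts` file, a `file` form field and a 20 MB cap; none of those names or limits come from the project:

```typescript
// Hypothetical file: app/api/extract/route.ts (a Next.js App Router route handler).
// Field name, size cap and response shape are assumptions for illustration.

const MAX_BYTES = 20 * 1024 * 1024; // assumed 20 MB upload cap

// Pure validation, kept separate so it can be checked without a running server.
function validateUpload(name: string, type: string, size: number): string | null {
  if (type !== "application/pdf" && !name.toLowerCase().endsWith(".pdf")) {
    return "only PDF uploads are accepted";
  }
  if (size === 0) return "empty file";
  if (size > MAX_BYTES) return "file too large";
  return null;
}

function json(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Next.js calls this for POST requests to the route; no separate backend is involved.
export async function POST(request: Request): Promise<Response> {
  const form = await request.formData();
  const file = form.get("file"); // assumed multipart field name
  if (file === null || typeof file === "string") {
    return json({ error: "missing file field" }, 400);
  }
  const problem = validateUpload(file.name, file.type, file.size);
  if (problem !== null) return json({ error: problem }, 400);

  const bytes = new Uint8Array(await file.arrayBuffer());
  // OCR, extraction and session storage would run here, in the same process.
  return json({ received: file.name, bytes: bytes.byteLength });
}
```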

Local multimodal OCR with llava on Ollama

Scanned and image-based pages are read by a llava multimodal model served from a shared Ollama instance on the local network (port 11434). Because the model sees the page as an image, it handles layout and stamps that brittle text-only OCR would miss, at the cost of being only as good as the scan in front of it.
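Ollama exposes a `/api/generate` endpoint that accepts base64-encoded images alongside the prompt for multimodal models such as llava. A minimal sketch of one OCR call, assuming the PDF page has already been rasterised to a PNG and the shared server is reachable at a hypothetical `http://ollama:11434`; the prompt wording is illustrative:

```typescript
// Assumed address of the shared Ollama instance; only the port (11434) comes from the write-up.
const OLLAMA_URL = "http://ollama:11434";

interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded page images, without a data: URL prefix
  stream: false;    // ask for one JSON reply instead of a token stream
}

// Builds the body for Ollama's /api/generate endpoint with one page image attached.
function buildOcrRequest(pagePngBase64: string): GenerateRequest {
  return {
    model: "llava",
    prompt:
      "Transcribe all text on this shipping document page exactly as written. " +
      "Include stamps and handwritten notes, and write [illegible] where you cannot read something.",
    images: [pagePngBase64],
    stream: false,
  };
}

// Sends one page to llava and returns the transcription from the `response` field.
async function ocrPage(pagePngBase64: string): Promise<string> {
  const res = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildOcrRequest(pagePngBase64)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Asking the model to mark illegible regions rather than guess is one cheap way to keep poor scans from turning into confident-looking text.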

Local LLM field extraction, zero cloud

Extraction and compliance flagging run against a local Ollama model; no external API call leaves the host. The app can optionally route LLM calls through a shared local CLI bridge instead, but the default is fully local Ollama, so the tool keeps working offline and no document data is sent to a third party.
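The backend choice can be a single function that falls back to local Ollama unless the bridge is explicitly and fully configured. A sketch under assumptions: the environment variable names and the default URL are hypothetical, and `llama3.1` is a placeholder because the write-up does not name the extraction model; `format: "json"` and `stream: false` are real Ollama `/api/generate` options:

```typescript
type LlmBackend =
  | { kind: "ollama"; baseUrl: string; model: string }
  | { kind: "cli-bridge"; baseUrl: string };

type Env = Record<string, string | undefined>;

// Local Ollama is the default; the bridge is used only when explicitly and fully configured.
// Variable names and defaults are hypothetical.
function resolveBackend(env: Env): LlmBackend {
  const bridgeUrl = env["CLI_BRIDGE_URL"];
  if (env["LLM_BACKEND"] === "cli-bridge" && bridgeUrl) {
    return { kind: "cli-bridge", baseUrl: bridgeUrl };
  }
  // Anything else, including a half-configured bridge, falls back to local Ollama.
  return {
    kind: "ollama",
    baseUrl: env["OLLAMA_URL"] ?? "http://localhost:11434",
    model: env["EXTRACTION_MODEL"] ?? "llama3.1", // placeholder model tag
  };
}

// Request body for Ollama's /api/generate; format "json" constrains the reply to valid JSON.
function buildExtractionRequest(model: string, documentText: string) {
  return {
    model,
    prompt:
      "From the shipping document text below, extract document_type, certificates " +
      "(name and expiry date) and PSC deficiency codes. Reply with one JSON object " +
      "and use null for any field that does not appear in the text.\n\n" +
      documentText,
    format: "json" as const,
    stream: false as const,
  };
}
```

Failing closed to the local model means a typo in the bridge configuration cannot silently send documents anywhere else.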

Stateless, in-memory sessions, no database

There is no database. Uploaded documents and their extracted fields live in in-memory sessions with a short TTL (around 60 minutes) and are then discarded. This is a deliberate trade: it minimises data at rest and simplifies the threat model, but it also means the app is an extraction aid, not a system of record.
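A store like this needs little more than a `Map` and a timestamp per entry. A minimal sketch, assuming string session IDs generated elsewhere; the class name and the injectable clock are hypothetical, and the 60-minute default mirrors the TTL described above:

```typescript
// Minimal in-memory session store with a fixed TTL; nothing is written to disk.
interface Session<T> {
  data: T;
  createdAt: number; // ms timestamp from the clock below
}

class SessionStore<T> {
  private readonly sessions = new Map<string, Session<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL of 60 minutes; the clock is injectable so expiry can be tested.
  constructor(ttlMs = 60 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  create(id: string, data: T): void {
    this.sessions.set(id, { data, createdAt: this.now() });
  }

  // Returns undefined for unknown or expired sessions, deleting expired ones on read.
  get(id: string): T | undefined {
    const session = this.sessions.get(id);
    if (session === undefined) return undefined;
    if (this.now() - session.createdAt >= this.ttlMs) {
      this.sessions.delete(id);
      return undefined;
    }
    return session.data;
  }

  // Removes every expired session and returns how many were dropped.
  // Intended to run on a timer, e.g. setInterval(() => store.sweep(), 60_000).
  sweep(): number {
    const now = this.now();
    let removed = 0;
    this.sessions.forEach((session, id) => {
      if (now - session.createdAt >= this.ttlMs) {
        this.sessions.delete(id);
        removed++;
      }
    });
    return removed;
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Expired sessions are dropped lazily on read and eagerly by `sweep()`, so an abandoned upload does not sit in memory until the process restarts. Because the data lives in one process, a restart also discards every session, which fits the extraction-aid role.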

Hardened single-container Docker deployment

The app ships as one Docker container bound to localhost only, run read-only with all Linux capabilities dropped, no-new-privileges set, tmpfs for scratch space, and conservative memory and PID limits. It attaches to its own isolated network plus the shared Ollama network, so the only thing it talks to is the local model server.
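The hardening above maps onto standard `docker run` flags (`--read-only`, `--tmpfs`, `--cap-drop`, `--security-opt no-new-privileges`, `--memory`, `--pids-limit`). To keep all examples in one language, this sketch assembles them as a TypeScript argument list; the image, container and network names, the port and the limits are placeholders, not the project's real values:

```typescript
// Sketch of the container hardening expressed as Docker CLI arguments.
// Image, container and network names, ports and limits are assumptions.
interface RunOptions {
  image: string;
  name: string;
  appNetwork: string;    // isolated network for this app
  ollamaNetwork: string; // shared network where the Ollama server lives
  hostPort: number;
  memory: string;        // e.g. "1g"
  pidsLimit: number;
}

function hardenedCommands(o: RunOptions): [string[], string[]] {
  const run = [
    "run", "--detach", "--name", o.name,
    "--publish", `127.0.0.1:${o.hostPort}:3000`, // localhost only; 3000 is Next.js's default port
    "--read-only",                               // read-only root filesystem
    "--tmpfs", "/tmp",                           // in-memory scratch space
    "--cap-drop", "ALL",                         // drop every Linux capability
    "--security-opt", "no-new-privileges",       // block privilege gain via setuid binaries
    "--memory", o.memory,
    "--pids-limit", String(o.pidsLimit),
    "--network", o.appNetwork,
    o.image,
  ];
  // The second network is joined after start; older Docker releases accept only one --network on `docker run`.
  const connect = ["network", "connect", o.ollamaNetwork, o.name];
  return [run, connect];
}
```

In a Compose file the same settings correspond to `read_only`, `tmpfs`, `cap_drop`, `security_opt`, `mem_limit` and `pids_limit`, with both networks listed under `networks`.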

Tech stack

Next.js · Ollama · llava · Docker

What broke first

▸
Multimodal models read what they can see, not what you wish was there: a clean charter party in native PDF text extracts almost perfectly, while a faxed, stamped, hand-annotated PSC report degrades gracefully at best. Layout and scan quality matter more than model size.
▸
Keeping the whole pipeline local removes a category of problems you no longer have to argue about: no third-party data-processing agreement, no per-page API bill, no question of where a crew member's passport scan ends up. That trade is paid in GPU time and slower throughput on a single box.
▸
Field extraction is only useful if it admits uncertainty. Returning a confident value for a field the model never actually found is worse than returning nothing; the honest output is the field, the value, and a flag when the source is ambiguous.
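One way to enforce that rule is to accept a model-proposed value only when it can be found in the source text, and to flag the field when several different candidates are present. A sketch of that grounding check, which is an illustrative technique rather than the app's documented logic; `ExtractedField` and `groundField` are hypothetical names:

```typescript
// One extracted field as it might be exported: the value plus an explicit status.
interface ExtractedField {
  name: string;
  value: string | null;
  status: "found" | "ambiguous" | "not_found";
  candidates: string[]; // every distinct value that was actually located in the source
}

// Lower-cases and collapses whitespace so OCR line breaks and spacing don't defeat matching.
function normalise(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// Keeps only model-proposed values that occur in the source text.
// No grounded candidate -> null; one -> found; several distinct -> ambiguous.
function groundField(name: string, proposed: string[], sourceText: string): ExtractedField {
  const haystack = normalise(sourceText);
  const grounded = new Map<string, string>(); // normalised form -> first original spelling
  for (const candidate of proposed) {
    const key = normalise(candidate);
    if (key !== "" && haystack.includes(key) && !grounded.has(key)) {
      grounded.set(key, candidate.trim());
    }
  }
  const values = Array.from(grounded.values());
  if (values.length === 0) {
    return { name, value: null, status: "not_found", candidates: [] };
  }
  return {
    name,
    value: values[0],
    status: values.length === 1 ? "found" : "ambiguous",
    candidates: values,
  };
}
```

The substring match is deliberately naive: short values such as a single digit will match almost anywhere, so a stricter check would compare against the specific region of the page the value came from.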

Outcome

The result is a working demonstration that maritime document extraction (charter parties, crew certificates, PSC reports) can run entirely on self-hosted infrastructure, with no cloud dependency and no document data leaving the machine. It is candidly a portfolio-scale build: built solo, running in a local lab, and accurate mainly on clean documents while degrading on poor scans. What it proves is the architecture, not a production track record: local multimodal OCR plus local LLM extraction is a viable, data-sovereign alternative to cloud document-AI in a domain where the documents are sensitive and the layouts are messy.

Honest limits

This is a self-hosted tool, built solo and running as a local-lab portfolio demo. The old public-demo VPS has been retired, so there is no large production user base behind it. Document-AI accuracy varies a lot with layout and scan quality: native-text PDFs extract cleanly, while low-resolution scans, stamps, and handwriting are unreliable and need a human to verify them. Sessions are in-memory with a short TTL and there is no database, so nothing here is a system of record; it is an extraction aid, not a compliance authority.