Case Study

How We Built a Maritime Document AI System in 4 Weeks

Rogue AI · 12 min read

Maritime companies deal with thousands of compliance documents every year — certificates of inspection, port state control reports, ISM audits, classification society surveys, and regulatory filings that span multiple jurisdictions and languages. A single missed clause or expired certificate can ground a vessel, trigger fines, or block port entry. We built a document AI system that reads, analyzes, and cross-references these documents in minutes instead of hours. Here is exactly how we did it, what worked, and what we learned along the way.

The Problem: Drowning in Paper

Our client, a mid-sized European maritime operator managing a fleet of 30+ vessels, was spending an average of two hours per document on manual compliance reviews. Their operations team would receive inspection reports, classification surveys, and regulatory updates in PDF format — often scanned, sometimes handwritten, frequently in mixed languages (English, Greek, German, and occasionally Norwegian).

The review process was painful: an operations officer would open the PDF, manually check each finding against applicable regulations, cross-reference with previous inspection reports to track recurring deficiencies, and compile a summary for the compliance manager. This happened dozens of times per week across the fleet.

Three specific pain points drove the project:

  • Missed deadlines: Certificate renewals and corrective action deadlines were tracked in spreadsheets. Things slipped through. Twice in the previous year, a vessel was detained at port due to expired documentation.
  • Inconsistent reviews: Different officers flagged different things. There was no standardized checklist or scoring system — quality depended entirely on who was reviewing that day.
  • No historical context: When reviewing a new inspection report, officers had no quick way to see whether a deficiency was recurring or new. That context matters enormously for prioritization.

What We Built: Four Analysis Modes

We designed the system around four distinct analysis modes, each addressing a different part of the compliance workflow. The user uploads a document (or set of documents), selects the analysis mode, and receives structured output within minutes.

Mode 1: Compliance Checking

The core mode. Upload an inspection report or survey, and the system checks every finding against the applicable regulatory framework — SOLAS, MARPOL, MLC, ISM Code, and classification society rules. It flags non-compliances, categorizes them by severity (critical, major, minor, observation), and generates a prioritized action list with deadlines based on the regulation's required response timeframe.

This is where the system delivers the most value. A two-hour manual review becomes a three-minute automated analysis with higher consistency. The AI does not get tired at 4 PM on a Friday or skip a section because it looks routine.
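The severity-to-deadline logic can be sketched as follows. This is a minimal illustration, not the production code: the deadline values here are placeholder defaults, whereas the real system derives each deadline from the specific regulation's required response timeframe.

```python
from dataclasses import dataclass

# Hypothetical default deadlines per severity band (days). The real system
# looks up the applicable regulation's required response timeframe instead.
SEVERITY_DEADLINE_DAYS = {
    "critical": 0,        # must be resolved before departure
    "major": 14,
    "minor": 90,
    "observation": None,  # tracked, but no hard deadline
}

SEVERITY_ORDER = ["critical", "major", "minor", "observation"]

@dataclass
class Finding:
    description: str
    severity: str

def prioritized_action_list(findings):
    """Sort findings by severity and attach a default deadline in days."""
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
    return [
        {
            "finding": f.description,
            "severity": f.severity,
            "deadline_days": SEVERITY_DEADLINE_DAYS[f.severity],
        }
        for f in ranked
    ]
```

The point of the structure is that the output is sortable and machine-checkable, which is what makes the later certificate-tracking phase possible.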

Mode 2: Risk Assessment

This mode goes beyond individual document compliance. It aggregates findings across multiple documents for the same vessel and calculates a risk score based on deficiency frequency, severity trends, and time-to-resolution history. Vessels with recurring engine room deficiencies or a pattern of delayed corrective actions get flagged for management attention before a detention happens.
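A simplified version of that scoring idea, assuming made-up weights and penalties purely for illustration (the production formula combines frequency, severity trends, and resolution history in more detail):

```python
# Illustrative severity weights; not the production values.
SEVERITY_WEIGHTS = {"critical": 10, "major": 5, "minor": 2, "observation": 1}

def vessel_risk_score(severities, recurring_count, avg_resolution_days):
    """Aggregate findings across a vessel's documents into a 0-100 score.

    severities: severity strings from all recent findings for the vessel
    recurring_count: deficiencies seen in more than one report
    avg_resolution_days: mean time to close corrective actions
    """
    base = sum(SEVERITY_WEIGHTS.get(s, 0) for s in severities)
    recurrence_penalty = recurring_count * 5            # repeat issues weigh extra
    delay_penalty = max(0, avg_resolution_days - 30) * 0.5  # slow follow-up
    return min(100.0, base + recurrence_penalty + delay_penalty)
```

Any vessel crossing a management-defined threshold gets surfaced on the dashboard before a port state control inspector finds the pattern first.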

Mode 3: Operational Insights

Upload any operational document — crew reports, maintenance logs, voyage reports — and the system extracts actionable insights. It identifies patterns humans miss: correlations between weather routing decisions and equipment failures, crew fatigue indicators in shift logs, and fuel consumption anomalies tied to hull condition. Think of it as an analyst that reads every document and never forgets a detail.

Mode 4: Document Comparison

Upload two versions of any document — an old and a new inspection report, two versions of a safety management manual, or a regulation update alongside the previous version — and the system produces a structured diff. It highlights what changed, what was added, what was removed, and what the implications are. This is particularly useful when classification societies update their rules or when comparing pre-survey and post-survey conditions.
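The comparison step can be sketched with Python's standard `difflib` at paragraph granularity. This is a minimal version of the idea; the production engine additionally asks the LLM to summarize the implications of each change.

```python
import difflib

def structured_diff(old_text: str, new_text: str) -> dict:
    """Classify paragraph-level changes between two document versions."""
    old_paras = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new_paras = [p.strip() for p in new_text.split("\n\n") if p.strip()]
    sm = difflib.SequenceMatcher(a=old_paras, b=new_paras)
    result = {"added": [], "removed": [], "changed": []}
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "insert":
            result["added"].extend(new_paras[j1:j2])
        elif tag == "delete":
            result["removed"].extend(old_paras[i1:i2])
        elif tag == "replace":
            # Paragraphs rewritten in place: keep both versions side by side.
            result["changed"].append((old_paras[i1:i2], new_paras[j1:j2]))
    return result
```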

Architecture: How It Works

The system runs entirely on the client's own infrastructure. No document data ever leaves their servers. This was a hard requirement — maritime operators handle commercially sensitive cargo manifests, crew personal data, and proprietary operational information. Cloud AI APIs were not an option.

System Architecture Overview

  • Frontend: Next.js application with a clean upload interface, analysis mode selector, and results dashboard. Responsive design for use on office workstations and tablets aboard vessels (when in port with connectivity).
  • Document Parser: A multi-stage pipeline that handles PDFs (both digital and scanned), Word documents, and plain text. For scanned documents, OCR preprocessing extracts text before analysis. The parser normalizes formatting, identifies document structure (headers, tables, lists, paragraphs), and segments content into analyzable chunks.
  • LLM Backend: Ollama running locally on the server, serving a fine-tuned model optimized for maritime terminology and regulatory language. The model processes document chunks with context from the regulatory knowledge base.
  • Knowledge Base: A structured repository of maritime regulations, classification rules, and historical findings used as reference context for analysis. Updated quarterly when major regulatory changes are published.
  • Results Engine: Post-processing layer that structures the LLM output into consistent formats — compliance tables, risk scores, action items with deadlines, and comparison matrices.
  • Infrastructure: Docker containers on a dedicated server. PostgreSQL for storing analysis history and document metadata. Redis for caching frequently referenced regulatory passages. Everything behind the company's existing VPN.

The data flow is straightforward: document upload triggers the parsing pipeline, which feeds normalized text chunks to the LLM with mode-specific prompts and relevant regulatory context. The LLM produces structured JSON output, which the results engine formats for display and stores for historical analysis. Average end-to-end processing time is under three minutes for a standard 20-page inspection report.
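The middle of that flow, in outline, looks like this. The chunk sizes, prompt template, and output schema below are illustrative stand-ins for the real ones:

```python
import json

def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200):
    """Split normalized document text into overlapping chunks for the LLM.

    Overlap keeps findings that straddle a chunk boundary visible in both.
    """
    chunks, start = [], 0
    while start < len(text):
        end = min(len(text), start + max_chars)
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

def build_prompt(chunk: str, mode: str, regulatory_context: str) -> str:
    """Assemble a mode-specific prompt; this template is a simplified sketch."""
    return (
        f"Analysis mode: {mode}\n"
        f"Relevant regulations:\n{regulatory_context}\n\n"
        f"Document excerpt:\n{chunk}\n\n"
        "Respond with a JSON object containing 'findings' and 'severity'."
    )

def parse_llm_output(raw: str) -> dict:
    """Parse the LLM's JSON reply, flagging malformed output for review."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "unparseable", "raw": raw}
```

Treating malformed model output as a flagged failure, rather than retrying silently, mirrors the same honesty principle applied to OCR quality below.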

The Hard Parts: What Made This Challenging

Challenge 1: Maritime Terminology

Maritime English is its own dialect. Terms like "scheduled drydocking," "class notation," "condition of class," and "port state control detention" carry very specific regulatory meanings that differ from their plain English interpretations. General-purpose language models frequently misinterpret these terms. A "condition of class" is not a vague description — it is a formal requirement imposed by a classification society that must be resolved within a specified timeframe.

We addressed this through prompt engineering and a maritime-specific glossary injected as context. For the most critical terms, we built explicit parsing rules that identify and tag them before the LLM processes the text. This hybrid approach — rules for high-stakes terminology, LLM for natural language understanding — gave us the best accuracy.
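The rule-based tagging step can be sketched like this. The glossary excerpt and tag format are illustrative; the real glossary is far larger and the definitions more precise.

```python
import re

# Tiny excerpt of the maritime glossary, for illustration only.
CRITICAL_TERMS = {
    "condition of class": "Formal classification society requirement with a deadline",
    "port state control detention": "Vessel detained by a port authority",
    "class notation": "Classification society designation of vessel capabilities",
}

def tag_critical_terms(text: str) -> str:
    """Wrap high-stakes terminology in explicit tags before LLM processing,
    so the model cannot misread a term as plain English."""
    for term, meaning in CRITICAL_TERMS.items():
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        text = pattern.sub(f'<TERM meaning="{meaning}">{term}</TERM>', text)
    return text
```

The LLM then sees the regulatory meaning inline with the text, which removes most of the misinterpretation we observed with untagged input.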

Challenge 2: Multi-Language Documents

Inspection reports from Greek ports arrive in Greek. German classification society reports mix German and English. Norwegian maritime authority communications use Norwegian for narrative sections and English for technical findings. We needed the system to handle all of these without the user specifying the language upfront.

The solution was a language detection step in the parsing pipeline. The system identifies the primary language of each document section and routes it through language-appropriate processing. For analysis output, everything is normalized to English (the working language of the operations team). This works well for European languages. We have not tested it with Asian-language documents yet — that is a future expansion.
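In spirit, the detection step works like the crude stopword heuristic below. This is purely illustrative; the production pipeline uses a proper language-identification model rather than hand-picked word lists.

```python
# Hand-picked stopword samples for illustration only.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "vessel"},
    "de": {"der", "die", "und", "ist", "nicht"},
    "el": {"και", "της", "το", "είναι", "που"},
    "no": {"og", "er", "det", "ikke", "som"},
}

def detect_language(section: str) -> str:
    """Guess the dominant language of a document section by stopword hits."""
    words = set(section.lower().split())
    scores = {lang: len(words & sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Each section is routed independently, which is what makes mixed-language documents (German narrative, English findings) work without user intervention.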

Challenge 3: PDF Parsing Quality

This was the single biggest source of errors. Maritime documents come in every format imaginable. Some are well-structured digital PDFs with clean text layers. Others are scanned copies of faxed documents (yes, fax is still alive in maritime). Some have stamps, wet signatures, and handwritten annotations overlaid on printed text.

We built a three-tier parsing strategy:

  • Tier 1 — Clean digital PDFs: Direct text extraction using standard PDF libraries. Fast, accurate, handles 60% of documents.
  • Tier 2 — Scanned but legible: OCR processing with post-correction heuristics for common maritime terms. Handles 30% of documents with acceptable accuracy.
  • Tier 3 — Poor quality scans: Enhanced OCR with image preprocessing (contrast adjustment, deskewing, noise removal) followed by a confidence scoring step. If confidence falls below a threshold, the system flags the document for manual review rather than producing unreliable analysis. About 10% of documents hit this tier.

Being honest about the limits of OCR quality was a deliberate design choice. An analysis system that confidently produces wrong results is worse than one that says "I cannot reliably read this document, please review manually."
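The tier-3 confidence gate amounts to a small routing decision. The threshold value here is illustrative; the real one was tuned against the client's document archive.

```python
OCR_CONFIDENCE_THRESHOLD = 0.75  # illustrative; tuned in production

def route_ocr_result(mean_confidence: float, page_texts: list) -> dict:
    """Accept OCR output for analysis, or flag the document for manual review."""
    if mean_confidence >= OCR_CONFIDENCE_THRESHOLD:
        return {"status": "ok", "text": "\n".join(page_texts)}
    return {
        "status": "manual_review",
        "reason": (
            f"OCR confidence {mean_confidence:.2f} below "
            f"{OCR_CONFIDENCE_THRESHOLD:.2f}; analysis would be unreliable"
        ),
    }
```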

Results: What We Measured

After four weeks of development and two weeks of parallel testing (running the AI alongside manual reviews to validate accuracy), we measured the following:

| Metric | Before (Manual) | After (AI-Assisted) |
| --- | --- | --- |
| Average review time per document | ~2 hours | ~3 minutes (AI) + 15 min (human verification) |
| Compliance check accuracy | Varies by reviewer | 85%+ (verified against senior reviewer baseline) |
| Deficiency categorization accuracy | ~70% consistency between reviewers | 92% agreement with senior reviewer |
| Missed critical findings | 2-3 per quarter (discovered later) | 0 in first quarter of operation |
| Documents processed per day | 8-12 | 40+ (limited by upload volume, not processing) |

Important caveat on accuracy

85% accuracy on compliance checks means the system catches the vast majority of issues, but human review is still necessary. We designed the workflow as AI-first, human-verified — not AI-only. The system drafts the analysis; a human confirms it. This hybrid approach is faster than fully manual review and more reliable than fully automated analysis.

Timeline: Four Weeks, Start to Finish

We committed to a four-week delivery timeline. Here is how the work broke down:

  • Week 1 — Discovery and document analysis: Received sample documents from the client (50+ real inspection reports and surveys). Analyzed document types, formats, and language distributions. Defined the four analysis modes based on the operations team's actual workflow. Set up the development infrastructure.
  • Week 2 — Parsing pipeline and LLM integration: Built the document parsing pipeline with the three-tier strategy. Configured Ollama with the selected model. Developed and iterated on prompts for each analysis mode. Built the maritime terminology glossary and regulatory knowledge base.
  • Week 3 — Frontend and results engine: Built the Next.js frontend with upload, mode selection, and results display. Developed the post-processing layer that structures LLM output into consistent formats. Connected everything end-to-end. First full-system tests with real documents.
  • Week 4 — Testing, refinement, and deployment: Ran the system against 200+ documents from the client's archive. Fixed edge cases in parsing and analysis. Deployed to the client's Docker infrastructure. Trained the operations team (two half-day sessions). Documented the system for their IT team.

Four weeks is fast for a production system, but it is achievable when you constrain scope aggressively. We said no to several feature requests during development — email integration, automated certificate tracking, and fleet-wide dashboards — and scheduled them for a follow-up phase. Ship the core, prove the value, then expand.

Why Self-Hosted Mattered

Every document this system processes contains commercially sensitive information — cargo details, vessel conditions, crew data, and operational patterns. Sending this to a cloud API was not acceptable to the client, and frankly, we agree with that position for this use case.

Running Ollama locally on a dedicated server means: no data leaves the network, no per-query API costs, no vendor lock-in, and no dependency on external service availability. The tradeoff is that local models are not as capable as the largest cloud models — but for this structured, domain-specific task, the accuracy is more than sufficient. Maritime compliance checking is not creative writing. It is pattern matching against known rules, which smaller, well-prompted models handle well.
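Talking to a local Ollama server needs nothing beyond the standard library. The model name below is a hypothetical placeholder; the endpoint and payload shape follow Ollama's `/api/generate` REST API, with streaming disabled so the full reply arrives as a single JSON object. This assumes an Ollama server is already running on its default port.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint, streaming disabled."""
    return {"model": model, "prompt": prompt, "stream": False}

def query_local_llm(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server. No data leaves the machine."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```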

Lessons Learned

  • Document quality is the bottleneck, not AI quality. The LLM performed well when given clean text. Most errors traced back to poor OCR on degraded scans. Invest in your parsing pipeline before optimizing your prompts.
  • Domain-specific glossaries are non-negotiable. General-purpose models misinterpret industry terminology. A curated glossary injected as context is a simple, high-impact intervention.
  • Design for human-in-the-loop from the start. We never positioned this as replacing human reviewers. It is a tool that makes them faster and more consistent. That framing got immediate buy-in from the operations team, who would have resisted an "AI replacement" pitch.
  • Structured output matters more than raw accuracy. A 90% accurate analysis in a clean, sortable table is more useful than 95% accuracy in a wall of text. Invest in your output formatting.
  • Four weeks is a feature, not a limitation. Tight timelines force scope discipline. The client got a working system fast, validated the approach, and now has a roadmap for expansion. Waiting six months for a "complete" system would have meant six more months of manual reviews.

What Comes Next

The system is live and processing documents daily. The next phase includes automated certificate expiry tracking (pulling dates from analyzed documents into a calendar system), fleet-wide risk dashboards that aggregate vessel scores, and integration with the client's existing fleet management software via API.

If your business processes high volumes of specialized documents — whether in maritime, legal, construction, insurance, or any other regulated industry — the same architecture applies. A domain-specific document AI system, self-hosted for privacy, with structured output and human-in-the-loop verification. That is what we build.

Get in touch to discuss whether a document AI system makes sense for your use case. We will tell you honestly if it does — and if it does not.

Rogue AI • Production Systems