A complete AI assistant over your own documents and systems, running entirely on hardware you control, built from the self-hosted fleet I run today. Data never leaves the building, every answer is sourced and logged, and security is designed in from the start, the way I secure networks, not bolted on afterward.

Regulated and data-sensitive organisations — finance, insurance, healthcare, legal, professional services — want what cloud AI does, but over their own contracts, policies, and records. Sending that data to a US-hosted API is a non-starter under GDPR, NIS2, DORA, and plain client confidentiality.
The usual answers are “wait” or “buy a six-figure appliance.” There is a third option: run it yourself, as software, on infrastructure you already own.
Each layer is real, running code — extracted from a fleet of 20+ self-hosted AI applications I run today, not a slide.
Open-weight models served locally through Ollama, routed via a hardened provider gateway with rate-limiting and metrics. Switch models per task. Nothing calls out to a third party.
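As an illustration only, per-task model routing behind a rate limiter can be sketched in a few lines of Python. This is not the gateway's actual code: the task table, the model tags, and the names `TokenBucket`, `pick_model`, `build_request`, and `generate` are assumptions made for the example. The only external detail relied on is Ollama's local `/api/generate` endpoint on its default port.

```python
import json
import time
import urllib.request

# Hypothetical task-to-model table; the model tags are placeholders.
MODEL_FOR_TASK = {
    "chat": "llama3.1:8b",
    "code": "qwen2.5-coder:7b",
}
DEFAULT_MODEL = "llama3.1:8b"
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def pick_model(task):
    """Choose a model for the task, falling back to the default."""
    return MODEL_FOR_TASK.get(task, DEFAULT_MODEL)


def build_request(task, prompt):
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": pick_model(task), "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(bucket, task, prompt):
    """Reject the call if the bucket is empty, otherwise ask the local model."""
    if not bucket.allow():
        raise RuntimeError("rate limit exceeded")
    with urllib.request.urlopen(build_request(task, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

The request only ever targets `localhost`, which is the point of the layer: the routing decision and the limiter sit in front of the model, and nothing in the path needs an outbound connection.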
RAG over your own documents with a source citation on every answer — auto-chunking, dedup, and reranking built in. The model can't cite a clause it doesn't have, and can't invent one that isn't there.
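To make the chunk, dedup, and cite steps concrete, here is a minimal Python sketch under loud assumptions: plain word overlap stands in for both embedding search and the reranker, and every function name (`chunk`, `fingerprint`, `build_index`, `retrieve`) is invented for the example rather than taken from the product.

```python
import hashlib
import re


def chunk(text, size=40, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def fingerprint(text):
    """Hash of the whitespace- and case-normalized text, used to drop duplicates."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(documents):
    """documents maps source name to text; returns (source, chunk_no, text) tuples."""
    seen, index = set(), []
    for source, text in documents.items():
        for n, piece in enumerate(chunk(text)):
            fp = fingerprint(piece)
            if fp in seen:
                continue  # identical chunk already indexed from another source
            seen.add(fp)
            index.append((source, n, piece))
    return index


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(index, question, top_k=2):
    """Rank chunks by shared words (a stand-in for embeddings plus reranking)."""
    q = tokens(question)
    scored = [(len(q & tokens(piece)), source, n, piece) for source, n, piece in index]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: -s[0])
    return [
        {"citation": f"{source}#chunk{n}", "text": piece}
        for _, source, n, piece in scored[:top_k]
    ]
```

Each returned passage carries its own citation string, so the answer can only reference text that was actually retrieved. When `retrieve` returns an empty list, the sensible behaviour is to decline to answer instead of letting the model improvise.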
A chat assistant grounded in your data ships in the first install. Multi-step agents that take real actions across your systems (observable, resumable, idempotent on retry) already run across the fleet and will be packaged into the product next.
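"Resumable" and "idempotent on retry" can be illustrated with a small step journal. The sketch below is an assumption about one common way to get those properties, not a description of how the fleet's agents are built. `Journal` is an in-memory stand-in for a durable store, and `step_key` and `run_steps` are hypothetical names.

```python
import hashlib
import json


def step_key(run_id, name, args):
    """Deterministic key: the same run, step, and arguments always map to one entry."""
    payload = json.dumps({"run": run_id, "step": name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Journal:
    """In-memory stand-in for a durable store of completed step results."""

    def __init__(self):
        self.results = {}


def run_steps(run_id, steps, journal):
    """Run (name, func, args) steps in order, skipping any already journaled.

    A step's result is recorded only after it succeeds, so calling run_steps
    again after a failure resumes at the first step that has not completed.
    """
    outputs = []
    for name, func, args in steps:
        key = step_key(run_id, name, args)
        if key not in journal.results:
            journal.results[key] = func(**args)
        outputs.append(journal.results[key])
    return outputs
```

A journal on its own does not cover a crash between the side effect and the journal write. For actions against external systems, the same key would also need to be passed downstream so the receiving system can reject a repeat.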
Every question and answer is written to an immutable log. Role-based access, least-privilege database users per workload, secrets in environment only, capabilities dropped at the container.
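The text does not say how the log is made immutable. One widely used technique for making a log tamper-evident is hash chaining, sketched below in Python purely as an assumption. The record fields and the names `entry_hash`, `append`, and `verify` are illustrative, and a real deployment would also need append-only storage underneath.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with this record's canonical JSON."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, user, question, answer):
    """Append a question/answer record that is chained to the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"user": user, "question": question, "answer": answer}
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log):
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the one before it, changing an old answer invalidates every later entry, which an auditor can detect by running `verify` over the stored log.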
Seventeen years hardening DAX-30 enterprise networks sits underneath this stack. Capability-dropped containers, per-app network isolation, prompt-injection defence, and secrets hygiene are the starting point — not a finding in next year's audit. The same discipline that secures a regulated firewall estate, applied to your private AI.
I built it as software you run yourself, rather than a hardware box — for a few deliberate reasons.
The first deployment does one thing end-to-end, properly — before anything is added on top.
Honest status: the core is real and runs today, extracted from a self-hosted fleet I operate myself. The product packaging is early — connectors (SharePoint, core systems), agents, per-task models, and multi-tenant deployment come next. It's ready for a pilot, not a press release.
It matters most where confidential data can't cross the perimeter — regulated, data-sensitive work across the EU and DACH.
If you're solving the same problem — useful AI over data that can't leave the building — get in touch.