Securing Self-Hosted AI: Infrastructure Hardening for Production
Most AI security writing in 2026 stops at prompt injection and content filtering. That's the application layer. If you run AI yourself — Ollama on a VPS, vLLM on a GPU server, anything that touches your own infrastructure — the larger attack surface is everything underneath the model: containers, networks, secrets, the database holding your embeddings, the GPU runtime, the build pipeline that produced your image.
This guide covers how to secure that stack. It comes from operating a 20+ application self-hosted AI fleet — Ollama for local inference, PostgreSQL with pgvector, Redis, custom Next.js front-ends, Caddy as a reverse proxy — across isolated Docker networks on Hetzner. Most of the mistakes below were mine before they were lessons.
The Threat Model You Actually Face
Before any control, agree on what you are defending against. Self-hosted AI typically faces four realistic threat classes:
Internet-exposed services
Any port open to the public web is a target within minutes. Default-bound Ollama (0.0.0.0:11434), exposed Postgres, unprotected Redis — all enumerated and probed by automated scanners continuously.
Container escape and lateral movement
A compromised app container with default Linux capabilities and privileged Docker socket access can pivot to the host or to peer containers. The blast radius is the entire host.
Supply chain compromise
Unpinned base images, unverified Hugging Face weights, npm packages with post-install scripts, Python wheels with native code — every dependency is a trust decision you may not have realised you made.
Data leakage through the model
Prompt logs that capture PII, embeddings that encode confidential text reversibly, training datasets retained on disk, model files shared across tenants. The data is in your environment but not under your access controls.
Container Hardening: The Non-Negotiable Defaults
Docker's defaults are tuned for developer convenience, not production security. Every container I run in production has the following applied in docker-compose.yml — the cost is near-zero and the blast radius reduction is enormous.
Drop all capabilities, then add only what's needed
```yaml
services:
  app:
    cap_drop: [ALL]
    security_opt: [no-new-privileges:true]
    read_only: true
    tmpfs:
      - /tmp:size=100m
      - /app/.next/cache:size=200m
    user: "1001:1001"
```

Most app containers need zero capabilities. Postgres needs CHOWN, DAC_OVERRIDE, FOWNER, SETGID, SETUID. Redis on Alpine needs CHOWN, DAC_OVERRIDE, SETGID, SETUID for user-switching at startup. Anything else means you're cargo-culting.
Run as a non-root user. The Dockerfile creates a numeric user (uid 1001) and drops to it before the entrypoint. USER root in the final stage is a flag for review. If you cannot avoid it (some Python ML images), pin the capability set explicitly and audit periodically.
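A minimal sketch of that Dockerfile pattern, assuming an Alpine base (the image and user names are illustrative):

```dockerfile
FROM node:22-alpine3.21
# Create a numeric, unprivileged user and switch to it before the entrypoint
RUN addgroup -g 1001 -S app && adduser -S -u 1001 -G app app
USER 1001:1001
```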
Make the filesystem read-only. read_only: true plus tmpfs for the few writable paths (cache, /tmp). A compromised process can't drop a webshell or persist across restarts. The first time you turn this on, you'll find your app was writing to /app/logs or similar — fix the app, don't loosen the constraint.
Bound resources matter. A runaway LLM call can OOM the host. Set deploy.resources.limits.memory and deploy.resources.limits.cpus on every service. Set pids_limit: 200 to stop fork bombs.
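In compose, that looks like the following (the numbers are illustrative; size them per service):

```yaml
services:
  app:
    pids_limit: 200          # stops fork bombs
    deploy:
      resources:
        limits:
          memory: 2G         # hard ceiling; the OOM killer hits the container, not the host
          cpus: "1.5"
```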
Network Isolation: Each App on Its Own Network
The default Docker bridge puts every container in the same broadcast domain. This means a compromised container can talk to every other container on the host. Don't do this.
Per-app dedicated network
Each application stack gets its own /24 subnet. App, database, queue, and worker are on the same network; nothing else is. A compromise in one app cannot reach another.
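A sketch of the shape in compose (the subnet is illustrative):

```yaml
networks:
  myapp-network:
    ipam:
      config:
        - subnet: 172.28.5.0/24   # one /24 per application stack

services:
  app:
    networks: [myapp-network]
  db:
    networks: [myapp-network]
  worker:
    networks: [myapp-network]
```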
Bind to 127.0.0.1, never 0.0.0.0
Published ports go to 127.0.0.1:PORT. The reverse proxy (Caddy or nginx) listens on the public interface, terminates TLS, and proxies to localhost. Postgres, Redis, Ollama are never directly internet-reachable.
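The compose side of that rule (the port is illustrative):

```yaml
services:
  app:
    ports:
      - "127.0.0.1:3000:3000"   # only the reverse proxy on this host can reach it
```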
Shared services on a known shared network
One Ollama instance for the fleet sits on a dedicated ailab-network. Apps that need it explicitly join that network. The ACL becomes "is this service on the AI lab network?" — simple, auditable, enforceable at iptables.
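A sketch, assuming the shared network was created once out-of-band with `docker network create ailab-network`:

```yaml
networks:
  myapp-network: {}
  ailab-network:
    external: true        # managed outside this stack

services:
  app:
    networks:
      - myapp-network     # the app's private network
      - ailab-network     # explicit opt-in to the shared Ollama
```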
At the host level, run a firewall that defaults to deny — ufw with rules for SSH, 80, 443. Test it from outside, not from a shell on the box. I've shipped firewall rules that looked right and weren't.
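The baseline is a few lines:

```sh
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp     # SSH; restrict the source range if you can
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable
```

Bear in mind that ports published by Docker bypass ufw entirely (Docker writes its own iptables rules), which is one more reason to publish only on 127.0.0.1.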
Ollama-Specific Risks
Ollama is tremendously useful and has security defaults that need active management:
Default bind address
Older versions bound to 0.0.0.0. Set OLLAMA_HOST=127.0.0.1:11434 explicitly. If running in Docker, ensure the published port maps only to localhost.
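For the Docker case, the host-side mapping is what keeps Ollama off the public interface. A minimal sketch:

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "127.0.0.1:11434:11434"   # never a bare "11434:11434"
```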
No authentication
Ollama itself has no auth model. Authentication has to be enforced in the layer above — your app issues all requests, never the user directly. Anyone reaching :11434 can run any model.
Model trust
ollama pull from arbitrary registries is a supply-chain decision. GGUF files can contain malicious metadata triggering parser bugs in older runtimes. Pin to specific upstream tags from sources you trust, mirror them locally, and verify checksums.
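One way to make that concrete, assuming Ollama's default blob layout (the tag is illustrative):

```sh
ollama pull llama3.1:8b                              # explicit tag, not :latest
sha256sum ~/.ollama/models/blobs/* > models.sha256   # record a baseline after the first pull
sha256sum -c models.sha256                           # re-verify later, or on mirrored hosts
```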
Database Security for AI Workloads
Vector databases hold something more sensitive than ordinary user data: embeddings of all your documents. With the right model, those embeddings are partially reversible. Treat them as a copy of your source corpus.
Least-privilege users per app
Never run application queries as the Postgres superuser. Each app gets its own role with grants only on its schema. Migrations run as a separate, more privileged role gated behind manual deployment steps.
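A minimal sketch in SQL (the role and schema names are illustrative):

```sql
-- Runtime role: data access only, no DDL, no superuser
CREATE ROLE myapp_rw LOGIN PASSWORD '...';
GRANT USAGE ON SCHEMA myapp TO myapp_rw;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA myapp TO myapp_rw;
ALTER DEFAULT PRIVILEGES IN SCHEMA myapp
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO myapp_rw;
```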
Row-level security for multi-tenant retrieval
Multi-tenant RAG is where row-level security earns its keep. The tenant id is set as a session variable; a Postgres RLS policy filters every SELECT against the embeddings table. A forgotten WHERE clause in application code can no longer leak data.
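A sketch of that policy (the table and setting names are illustrative):

```sql
ALTER TABLE embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE embeddings FORCE ROW LEVEL SECURITY;   -- applies to the table owner too

CREATE POLICY tenant_isolation ON embeddings
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- Per request, inside a transaction, with the tenant id as a bind parameter:
SELECT set_config('app.tenant_id', $1, true);
```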
Encryption at rest is the host disk's job
Full-disk encryption on the host (LUKS) covers the database files. Postgres-level transparent encryption adds operational pain without meaningful benefit: if your threat model is hosting-provider compromise, the provider holds the host's encryption key in memory anyway.
Secrets Management Without a Vault
Most self-hosted AI fleets run too small to justify HashiCorp Vault. The fallback is a few simple disciplines that go a long way:
.env files outside the build context
Never bake secrets into images. Pass them at runtime via env_file in compose. Add .env to .dockerignore. Audit your image with docker history to confirm nothing leaked into a layer.
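In compose:

```yaml
services:
  app:
    env_file: .env   # injected at container start, never into an image layer
```

with `.env` listed in `.dockerignore` so it can never enter the build context.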
Required-variable syntax
Use ${VAR:?error message} in compose, never ${VAR:-default} for credentials. The container should refuse to start with missing secrets, not run with a default.
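For example:

```yaml
services:
  app:
    environment:
      DATABASE_URL: ${DATABASE_URL:?DATABASE_URL is not set}
```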
Rotate database passwords on incident, not on schedule
Calendar rotation theatre is mostly compliance signalling. Rotation on actual incidents (a compromised laptop, a leaked dump) is what matters. Maintain a runbook so rotation can be done in minutes.
Logging and Audit
AI applications generate three log streams that need different treatment: HTTP access logs, application logs (decisions, errors), and prompt logs (inputs and outputs). Each is sensitive in a different way.
Correlation ids on every request
Generate a UUID at the edge, propagate it through every internal call. When something breaks at 3am, you grep one id across all services and have the full story.
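A minimal sketch as Next.js middleware, since the front-ends here are Next.js (the header name is a convention, not a standard):

```typescript
// middleware.ts: attach a correlation id at the edge of every request
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const headers = new Headers(request.headers);
  if (!headers.get("x-request-id")) {
    headers.set("x-request-id", crypto.randomUUID());
  }
  // The modified headers propagate to route handlers and downstream fetches
  return NextResponse.next({ request: { headers } });
}
```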
Redact prompts that contain PII
Logging full prompts is convenient but turns your log store into a shadow PII repository. Either redact at write time, or store prompt logs in a separately controlled location with the same access-control model as the source data.
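A sketch of write-time redaction (the patterns are illustrative; regexes catch the obvious cases, not all PII):

```typescript
// Strip obvious identifiers before a prompt reaches the log sink
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

export function redactPrompt(prompt: string): string {
  return prompt.replace(EMAIL, "[email]").replace(PHONE, "[phone]");
}
```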
Log rotation and bounded disk
Docker's default logging fills disks. Set logging.driver: json-file, options.max-size: 10m, options.max-file: 3. Database and Redis can go higher (50m / 5).
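In compose:

```yaml
services:
  app:
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: "3"
```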
Image Hygiene and the Supply Chain
The AI ecosystem moves fast and pulls a lot of code. The supply chain is the highest-risk surface most teams ignore:
- Pin base images to patch tags. `node:22-alpine3.21`, not `node:latest`. Schedule a monthly rebuild that picks up newer patches.
- Multi-stage builds. Build tools (gcc, native deps, npm with devDependencies) live in a builder stage. The runner is a minimal layer with the production artefact only (see the Dockerfile sketch after this list).
- Run a vulnerability scanner. Trivy or Grype on every built image; fail the build on CRITICAL. Triage HIGH weekly. The signal-to-noise on AI-related CVEs is poor; you will need to make context calls.
- Lockfiles are mandatory. `npm ci`, not `npm install`; `poetry install` (which honours the lockfile), not a bare `pip install -r requirements.txt`. Reproducible builds are a security control.
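A skeletal multi-stage Dockerfile pulling these together, assuming a Next.js app with standalone output (stage details are illustrative):

```dockerfile
FROM node:22-alpine3.21 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci                        # lockfile-only install
COPY . .
RUN npm run build

FROM node:22-alpine3.21 AS runner
WORKDIR /app
RUN addgroup -g 1001 -S app && adduser -S -u 1001 -G app app
COPY --from=builder --chown=1001:1001 /app/.next/standalone ./
USER 1001:1001
CMD ["node", "server.js"]
```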
What I Don't Do (And Why)
Honest list — these are common recommendations I have considered and skipped at this scale, with the reasoning:
Runtime exploit prevention agents (Falco, Sysdig)
Excellent tools at scale; operational overhead and false-positive tuning are not justifiable for a sub-100-container fleet. The container hardening above closes most of the same surface.
Service mesh with mTLS between containers
Network-isolated apps with one front-door each don't benefit. mTLS is the right answer when you have lateral service-to-service traffic you can't physically isolate.
Confidential computing / TEEs
Useful when the threat model is "the hosting provider itself is adversarial". For most EU self-hosting on Hetzner / OVH, the legal framework does that work and TEE adds complexity without proportionate benefit.
The Build Order
If you're standing up a self-hosted AI stack and have to sequence the security work, this is the order I would do it in — each step is independently valuable and shouldn't block the next:
- Network isolation: per-app networks, ports bound to localhost, host firewall default deny.
- Container hardening defaults across every service.
- Non-root user and read-only filesystem on every app container.
- Least-privilege database users; row-level security where multi-tenant.
- Secrets out of images; required-variable syntax in compose.
- Pinned base images; lockfile-only installs; Trivy in CI.
- Correlation ids and bounded log volumes.
- Optional: vault, mesh, runtime EDR — only if your scale justifies it.
Closing
AI security is not a different discipline from infrastructure security. It's the same discipline applied to a stack with new components and a few new failure modes. If your container, network, and database hygiene is good, the AI-specific work — prompt-injection defenses, RAG access control, output filtering — sits on solid ground. If the foundation is weak, no amount of guardrail prompting compensates.
For related reading, see securing RAG pipelines against prompt injection, production Docker patterns for AI applications, and self-hosted AI vs cloud APIs.