Security Guide

Securing RAG Pipelines: Prompt Injection, Data Leakage, and Access Controls

Nicholas Falshaw
12 min read

Most RAG security advice stops at "sanitize your prompts." That is the least interesting part of securing a RAG system. Production RAG has a much larger attack surface: injected instructions hidden inside retrieved documents, row-level access leaks through shared vector stores, PII surfacing in generated responses to unauthorized users, and audit logs that cannot reconstruct who saw what. This guide covers the controls that actually close those gaps — the ones we implement on every production RAG deployment.

If you are still building your first RAG pipeline, start with the production RAG architecture guide. This post assumes you already have a working pipeline and need to harden it for sensitive data, regulated industries, or multi-tenant deployments.

The RAG threat model

A RAG pipeline has five components that can be attacked independently: the ingestion layer, the vector store, the retriever, the LLM, and the response pipeline. Treating "the LLM" as the whole system is the most common mistake in RAG security reviews.

  • Ingestion. Attacker uploads or plants a document containing hidden instructions ("indirect prompt injection"). The document is chunked, embedded, stored, and eventually retrieved. The LLM follows the instructions.
  • Vector store. A shared store without row-level access controls returns chunks that belong to another tenant, another department, or another security clearance.
  • Retriever. Query patterns leak sensitive context to third-party embedding APIs. Query logs expose what users are searching for.
  • LLM. Generated output includes PII or credentials that existed in the retrieved context. Response is sent to an unauthorized user.
  • Response pipeline. No record of which source chunks produced which response. Audit logs cannot answer "what did this user actually see?"

Defense 1: Treat retrieved documents as untrusted input

The retrieved document is the biggest blind spot. Teams spend weeks hardening the user input field and ship a system that happily follows any instruction embedded in a retrieved PDF. In the real world, those PDFs can come from HR uploads, customer-submitted tickets, scraped web pages, and email attachments — every one of those is a vector for indirect prompt injection.

Three controls that work:

  1. Strict delimiting. Wrap every retrieved chunk in clearly labeled, machine-readable delimiters (e.g. <source_doc_N>). Instruct the model explicitly that content inside delimiters is reference material, not instructions. This is not a guarantee — but it measurably reduces injection success rates.
  2. Content classification before storage. Run a lightweight classifier at ingestion to flag documents that contain instruction-like patterns ("ignore previous instructions", "you are now", "system prompt"). Flagged documents go to a review queue, not the vector store.
  3. Output action constraints. If the LLM can trigger tool calls, constrain those tools to a strict allowlist and require deterministic validation of arguments before execution. An LLM that cannot take destructive actions cannot be tricked into taking them.
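The first two controls can be sketched in a few lines of Python. The delimiter format follows the `<source_doc_N>` convention above; the pattern list is an illustrative assumption, not an exhaustive detector:

```python
import re

# Instruction-like patterns to flag at ingestion (illustrative, not exhaustive).
INSTRUCTION_PATTERNS = re.compile(
    r"ignore (all )?previous instructions|you are now|system prompt",
    re.IGNORECASE,
)

def wrap_chunks(chunks: list[str]) -> str:
    """Wrap each retrieved chunk in labeled delimiters so the model can
    distinguish reference material from instructions."""
    return "\n".join(
        f"<source_doc_{i}>\n{chunk}\n</source_doc_{i}>"
        for i, chunk in enumerate(chunks, start=1)
    )

def flag_for_review(chunk: str) -> bool:
    """Return True if an ingested chunk should go to the review queue
    instead of the vector store."""
    return bool(INSTRUCTION_PATTERNS.search(chunk))
```

The system prompt then states explicitly that anything inside `<source_doc_N>` tags is reference material only, never instructions.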

Defense 2: Enforce access control at retrieval, not after

The naive pattern — retrieve top-k, filter results by ACL afterward — fails in two ways. First, it leaks data through similarity patterns (an attacker can observe which queries return empty results and infer what exists). Second, it is brittle: any missed filter path leaks the full chunk to the LLM, which then surfaces it verbatim.

Enforce at retrieval. In pgvector, this means a WHERE clause on the ACL column as part of the nearest-neighbor query, not a post-filter. Row-level security policies in Postgres make this enforceable at the database layer, so an application bug cannot bypass it:

-- Enable row-level security so the database enforces tenant isolation
-- even when the application layer has a bug.
ALTER TABLE document_chunks ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON document_chunks
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- Per request, the retrieval service sets the tenant inside the transaction:
-- SET LOCAL app.tenant_id = '<tenant-uuid>';

-- Every query is then automatically scoped to the caller's tenant.
SELECT chunk_text FROM document_chunks
ORDER BY embedding <=> $1
LIMIT 10;

For per-document ACLs (not just tenant isolation), add an allowed_principals array column and filter with the @> containment operator. The filtering cost is negligible relative to the nearest-neighbor search, and the security model becomes enforceable and auditable.
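A sketch of how the retrieval service might assemble that query for a psycopg-style driver. The table and column names follow the example above; the `allowed_principals text[]` column and the `user:alice` principal format are assumptions for illustration:

```python
def acl_scoped_query(
    principal: str, query_embedding: list[float], k: int = 10
) -> tuple[str, tuple]:
    """Assemble a pgvector nearest-neighbor query with the ACL predicate
    inside the query itself — never applied as a post-filter.

    allowed_principals is assumed to be a text[] column; the @> operator
    checks that the caller's principal is contained in that array.
    """
    sql = (
        "SELECT chunk_text FROM document_chunks "
        "WHERE allowed_principals @> ARRAY[%s] "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # pgvector accepts the embedding in its text form, e.g. '[0.1,0.2]'
    return sql, (principal, str(query_embedding).replace(" ", ""), k)
```

Because the WHERE clause travels with the nearest-neighbor search, there is no code path in which an unfiltered chunk reaches the LLM.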

Defense 3: Treat embeddings as data, not hashes

Embeddings are not hashes. Recent research has demonstrated practical inversion attacks against common embedding models — meaning a leaked embedding vector can reconstruct a plausible version of the source text. Treat embeddings with the same sensitivity as the underlying documents.

  • Encrypt embeddings at rest. In Postgres, use pgcrypto or application-layer encryption for vector columns containing sensitive corpora.
  • Restrict vector-store read access to the retrieval service. No direct user queries to the vector table.
  • Rotate embedding models for sensitive corpora — inversion attacks are model-specific.

Defense 4: Redact PII at ingestion, not at response

Post-hoc PII redaction on the generated response is a last-resort control. It fails under paraphrase, it fails under translation, and it fails when the user pastes the response into a system log. The defensible pattern is to redact at ingestion, before embedding.

A Microsoft Presidio or equivalent NER pipeline runs over every ingested chunk, replacing detected PII with tokens. The original PII and its token are stored in a separate, access-controlled table. When an authorized user queries, the retriever fetches the redacted chunks, the LLM generates a redacted response, and a final de-tokenization step replaces tokens with PII only for users whose ACL allows it.
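The tokenization flow above can be sketched as follows. A regex stands in for the Presidio/NER detector, and a plain dict stands in for the access-controlled vault table — both are simplifications for illustration:

```python
import re

# Stand-in for a Presidio/NER detector (illustrative: emails only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(chunk: str, vault: dict[str, str]) -> str:
    """Replace detected PII with stable tokens before embedding.
    Originals go to a separate, access-controlled vault
    (a dict here for illustration)."""
    def _token(match: re.Match) -> str:
        token = f"<PII_{len(vault)}>"
        vault[token] = match.group(0)
        return token
    return EMAIL.sub(_token, chunk)

def detokenize(text: str, vault: dict[str, str], authorized: bool) -> str:
    """Restore PII in the generated response only for callers
    whose ACL allows it."""
    if not authorized:
        return text
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

Retrieval and generation operate entirely on the redacted text; de-tokenization is the last step before the response leaves the pipeline.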

This pattern has one major benefit beyond security: the redacted corpus can be used for evaluation, debugging, and ML training without exposing real data.

Defense 5: Log every retrieval with chunk-level attribution

The hardest compliance question for RAG systems is "what did this user actually see?" Without per-query chunk-level logging, the answer is "we do not know" — and that answer fails GDPR, HIPAA, and SOC 2 audits equally.

The minimum audit record per query:

  • User / service principal
  • Query text (or hash, if the query itself is sensitive)
  • Retrieved chunk IDs with their source document IDs
  • LLM model + version + system prompt hash
  • Generated response ID (not the full response — store separately with retention policy)
  • Timestamp and request ID

Store the audit log in an append-only store (Postgres with REVOKE DELETE, or a dedicated append-only log service). The goal is that a forensic investigation six months later can reconstruct exactly what context the LLM saw when it produced a specific response.
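The minimum audit record can be captured in a small schema like the following sketch; field names are illustrative, and the JSON-lines serialization assumes an append-only sink downstream:

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

def hash_text(text: str) -> str:
    """Hash sensitive free text so the log never stores it verbatim."""
    return hashlib.sha256(text.encode()).hexdigest()

@dataclass(frozen=True)
class RetrievalAuditRecord:
    """Minimum per-query audit record (field names are illustrative)."""
    principal: str                  # user or service principal
    query_hash: str                 # hash, since the query may be sensitive
    chunk_ids: tuple[str, ...]      # retrieved chunk IDs
    source_doc_ids: tuple[str, ...]
    model: str                      # LLM model + version
    system_prompt_hash: str
    response_id: str                # full response stored separately
    request_id: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_log_line(record: RetrievalAuditRecord) -> str:
    """Serialize one record per line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

With chunk IDs and prompt hashes in every record, a later investigation can join back to the chunk table and reconstruct exactly what context the model saw.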

What about prompt injection detection?

Commercial "prompt injection firewalls" exist. They help as defense in depth, but they are not primary controls. Every published detector has a documented bypass within weeks of release. Rely on the five controls above, treat injection detection as a layer on top, and never make a security decision based on "the detector did not flag this."

Mapping to compliance frameworks

| Framework | Relevant control | RAG defense |
|---|---|---|
| EU AI Act | Art. 10 (data governance), Art. 12 (logging) | Defense 4, 5 |
| GDPR | Art. 25 (privacy by design), Art. 32 | Defense 2, 3, 4 |
| ISO 27001:2022 | A.8.24 (crypto), A.8.12 (data leakage) | Defense 3, 4 |
| SOC 2 | CC6.1 (logical access), CC7.2 (monitoring) | Defense 2, 5 |
| HIPAA | §164.312(a)(1), §164.312(b) | Defense 2, 4, 5 |

The minimum bar for production RAG

A RAG pipeline is ready for sensitive production use when it can answer all five of these questions with evidence:

  1. If an attacker plants a malicious document, what stops it?
  2. If a user without access queries a topic, what prevents leakage?
  3. If the embedding store is exfiltrated, what is the blast radius?
  4. If a user receives PII they should not have, how is it detected?
  5. For any past response, can you reconstruct what the LLM saw?

If any of those answers is "we are trusting the LLM to do the right thing," the pipeline is not production-ready — regardless of how good the demo looked.

Need a RAG security review?

If you are running a RAG system with sensitive data, regulated workloads, or multi-tenant boundaries, a targeted security review can typically close the biggest gaps in a week. We do these engagements against the threat model above, produce a remediation plan with priorities, and hand back a rulebook your team can use on the next deployment. Book a discovery call or read the AI security consulting page for scope and pricing.

For related reading, see the production RAG architecture guide, EU AI Act compliance for AI builders, and why 90% of AI projects fail before production.

Rogue AI • Production Systems