Security Guide

Your AI Coding Agent Will rm -rf Your Life. Leash It.

Rogue AI · 10 min read
[Image: an AI coding agent confidently running a recursive delete across a developer's home directory while a folder of family photos vanishes]

Your AI coding agent will eventually do something catastrophic, and it will do it with total confidence. In the last few months, one agent asked to “organise” a desktop deleted fifteen years of a family’s photos. Another, told to clean up a repository, ran a recursive delete with a stray home-directory path and wiped a developer’s entire Mac. On a third machine, an agent ran the same kind of command starting from the filesystem root. None of these agents was malicious. Every one of them was confident, and nothing stood between its decision and your shell.

That gap is the whole story, and the good news is that it is an engineering problem with an engineering answer. This is what is actually going wrong, why a better model will not fix it, and how to put the agent on a leash short enough that its worst day costs you a few minutes instead of fifteen years of memories.

Two ways the agent hurts you

The failures split cleanly into the loud kind and the quiet kind, and the quiet kind is worse.

The loud kind is the destructive command. An agent decides the fastest way to satisfy your request is to delete something, and it is right that deleting is fast and wrong about what to delete. These make the headlines because the damage is total and instant: a wiped home directory, a dropped database, a force-pushed branch. They are terrifying and they are also, in a sense, the easy case, because you find out immediately.

The quiet kind is code that is subtly worse than a human would write, merged because it looked fine. A December 2025 study by CodeRabbit compared hundreds of AI-co-authored pull requests against human-only ones and found AI code shipped 1.7 times more issues overall and 2.74 times more security issues specifically. AI-authored changes were roughly 1.9 times more likely to mishandle passwords, introduce an insecure direct object reference, or add unsafe deserialisation, with logic and correctness errors up around 75 percent. The destructive command costs you an afternoon. The quiet defect costs you a breach six months later.

Why this keeps happening, and why a better model will not fix it

The root cause is not that the model is dumb. It is architectural. As the Docker team put it after cataloguing these incidents, the agent runs as you, on your filesystem, with your credentials, and nothing sits between the model’s decision and the shell’s execution. It is a confident junior engineer who has been handed root on your laptop, your production credentials, and no code review, and then told to move fast.

That is why waiting for a smarter model misses the point. The CodeRabbit numbers are across current frontier models, not some weak outlier, and the destructive-command incidents involve the best agents available. The model will occasionally be wrong no matter how good it gets, the same way a brilliant junior occasionally runs the wrong command. The fix is not a better junior. It is the boundary you put around any junior with that much power. We made the same argument about agents in general in “why the prompt-injection number never reaches zero”: design for the mistake, do not pray it away.
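To put rough numbers on “occasionally wrong”, here is a back-of-the-envelope sketch. The per-command risk and the command counts are illustrative assumptions, not figures from any study, and the calculation assumes each command fails independently. The only point is that a tiny per-command risk compounds once an agent runs thousands of commands.

```python
def chance_of_at_least_one_disaster(per_command_risk: float, commands_run: int) -> float:
    """Probability that at least one of `commands_run` independent commands
    goes catastrophically wrong, given the same small risk on each one."""
    if not 0.0 <= per_command_risk <= 1.0:
        raise ValueError("per_command_risk must be between 0 and 1")
    if commands_run < 0:
        raise ValueError("commands_run must not be negative")
    # P(at least one) = 1 - P(none) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_command_risk) ** commands_run


if __name__ == "__main__":
    # Hypothetical risk: a 1-in-10,000 chance that any single command is a disaster.
    for commands in (100, 5_000, 20_000):
        p = chance_of_at_least_one_disaster(1e-4, commands)
        print(f"{commands:>6} commands: {p:.1%} chance of at least one disaster")
```

Under that assumed 1-in-10,000 risk, 100 commands carry about a 1 percent chance of a disaster and 20,000 commands carry roughly 86 percent. A better model lowers the per-command risk, but only a boundary changes what a disaster can reach.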

The leash: containment for coding agents

Everything that actually works is about shrinking what the agent can reach and reverse, not about trusting it more.

  • Run it in a sandbox, not on your host. Give the agent a container or VM whose filesystem is the project and nothing else. Your home directory, your SSH keys, your real cloud credentials, and your photos are simply not mounted, so they cannot be deleted or leaked. This is the same hardening discipline we describe in “securing self-hosted AI infrastructure.”
  • Never use the “skip all permissions” flag on your real machine. The auto-approve mode that makes demos smooth is exactly what removes the last gate before a destructive command. If you must run it, run it inside the sandbox.
  • Gate the irreversible. Require explicit confirmation for deletes, force-pushes, and database migrations. A dry-run-by-default policy turns “it already happened” into “it asked first.”
  • Treat AI-written code as untrusted input. The 2.74x figure means AI code needs more review, not less. Run it through the same linters, SAST, tests, and evaluation gates you would demand of any contributor, the way we lay out in “testing AI systems before production.”
  • Keep version control and backups honest. Committed work and a recent backup are what turn a catastrophic delete into an annoyance. The fifteen-years-of-photos case hurt because there was no copy.
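The “gate the irreversible” rule can be sketched as a small check that sits between the agent's proposed command and the shell. This is a minimal illustration, not any agent's real permission system: the rule lists, the `Verdict` type, and the `gate` function are all hypothetical names, and the patterns are deliberately incomplete.

```python
import os
import shlex
from dataclasses import dataclass
from typing import Callable, List

# Illustrative and deliberately incomplete: extend these for your own stack.
DESTRUCTIVE_PROGRAMS = {"rm", "rmdir", "shred", "dd", "mkfs", "truncate"}
FORCE_PUSH_FLAGS = {"-f", "--force", "--force-with-lease"}
DESTRUCTIVE_SQL = ("drop table", "drop database", "truncate table")


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def is_irreversible(argv: List[str]) -> bool:
    """Return True if the parsed command matches one of the rules above."""
    if not argv:
        return False
    program = os.path.basename(argv[0])
    args = argv[1:]
    if program in DESTRUCTIVE_PROGRAMS:
        return True
    if program == "git" and args:
        if args[0] == "push" and FORCE_PUSH_FLAGS.intersection(args):
            return True
        if args[0] == "reset" and "--hard" in args:
            return True
        if args[0] == "clean":
            return True
    joined = " ".join(args).lower()
    return any(statement in joined for statement in DESTRUCTIVE_SQL)


def deny_by_default(command: str) -> bool:
    """Default confirmer: nobody was asked, so the answer is no."""
    return False


def gate(command: str, confirm: Callable[[str], bool] = deny_by_default) -> Verdict:
    """Decide whether a proposed command may run. Never executes anything."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return Verdict(False, "could not parse command; refusing")
    if is_irreversible(argv):
        if confirm(command):
            return Verdict(True, "irreversible, confirmed by a human")
        return Verdict(False, "irreversible; needs explicit confirmation")
    return Verdict(True, "no irreversible pattern matched")
```

With the default confirmer, `gate("rm -rf ~/")` is refused and `gate("ls -la")` passes; passing a `confirm` callback that prompts a human turns the refusal into a question. String matching like this is easy to bypass (a command wrapped in `sh -c`, or a script that deletes files), so treat it as a seatbelt on top of the sandbox, not a replacement for it.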

Loose agent versus leashed agent

| Failure | Loose agent (on your host) | Leashed agent (sandboxed) |
| --- | --- | --- |
| Recursive delete | Wipes your home directory, photos, keys | Deletes inside the throwaway container only |
| Credential reach | Reads your real SSH keys and cloud tokens | Sees only scoped, disposable test credentials |
| Subtly insecure code | Merged because it looked fine | Caught by the same SAST and eval gate as any PR |
| A bad command | Irreversible, no copy | Reversible from git and a recent backup |
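As a rough sketch of the leashed setup, the function below builds a `docker run` argument list that mounts the project directory and nothing else, and passes in only the environment variables you list. The function name, the `/workspace` mount point, and the image name in the usage note are assumptions for illustration; the Docker flags themselves (`--rm`, `--cap-drop`, `--volume`, `--workdir`, `--network`, `--env`) are standard.

```python
from pathlib import Path
from typing import Dict, List, Optional


def sandbox_argv(
    project_dir: str,
    image: str,
    agent_command: List[str],
    scoped_env: Optional[Dict[str, str]] = None,
    allow_network: bool = False,
) -> List[str]:
    """Build a `docker run` argument list that exposes only the project."""
    project = Path(project_dir).expanduser().resolve()
    home = Path.home().resolve()
    # Refuse to mount the home directory itself or any directory that contains it.
    if project == home or project in home.parents:
        raise ValueError(f"refusing to mount {project}: it contains your home directory")

    argv = [
        "docker", "run",
        "--rm",                               # throw the container away afterwards
        "--cap-drop", "ALL",                  # drop Linux capabilities
        "--volume", f"{project}:/workspace",  # the only host path that is mounted
        "--workdir", "/workspace",
    ]
    if not allow_network:
        argv += ["--network", "none"]
    # Only variables listed here are passed in; docker run does not inherit
    # the host environment, so real cloud tokens stay outside.
    for name, value in sorted((scoped_env or {}).items()):
        argv += ["--env", f"{name}={value}"]
    return argv + [image, *agent_command]
```

A call such as `sandbox_argv("~/code/myproject", "agent-sandbox:latest", ["agent", "--task", "fix tests"], {"API_TOKEN": "test-only"})` returns a list you can hand to `subprocess.run`. Most agents need network access to reach their model API, so in practice `allow_network=True` is likely, ideally paired with an egress allowlist outside the scope of this sketch.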

Do not ban it, contain it

The productivity is real, and abstinence is not the answer any more than banning power tools is. The answer is the leash. An AI coding agent is the most capable junior you have ever hired and the least supervised, and the trust model that ships safely treats it exactly that way: powerful, useful, and never given unsupervised root. We make the same case for putting agents into production deliberately rather than hopefully in “what breaks in agent orchestration at production scale” and “building LLM features that survive production.” The same supply-chain caution applies to the tools the agent itself pulls in, which we covered in “how the agent skill marketplace got poisoned.”

You would not give a new hire unsupervised root on your laptop on day one. Do not give it to the agent either. Sandbox it, scope it, review its output, keep backups, and then let it run as fast as it likes, because the blast radius is now a container instead of your life’s work.

Related reading: “securing self-hosted AI infrastructure,” “how to test AI systems before production,” and “what breaks in agent orchestration at production scale.”

Quick Reference

Coding-agent failure modes and the leash for each

| Failure mode | What it costs | The leash |
| --- | --- | --- |
| Recursive delete on your host | Home directory, keys, photos, all gone | Run the agent in a container that only mounts the project |
| Reaches real credentials | SSH keys and cloud tokens exfiltrated or used | Scoped, disposable test credentials only |
| Subtly insecure code merged | A breach months later, 2.74x more likely | Same SAST, tests, and eval gate as any pull request |
| Irreversible command | No copy to restore from | Commit often, keep a recent backup, dry-run deletes |
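For the last row, one simple way to make sure a copy exists is to snapshot the project before each agent session. The sketch below uses only the Python standard library; the `snapshot` name and the archive naming scheme are assumptions, and it complements version control, it does not replace it.

```python
import tarfile
import time
from pathlib import Path


def snapshot(project_dir: str, backup_dir: str) -> Path:
    """Write a timestamped .tar.gz of the project into backup_dir and return its path."""
    project = Path(project_dir).resolve()
    backups = Path(backup_dir).resolve()
    # A backup stored inside the project can be deleted along with it.
    if backups == project or project in backups.parents:
        raise ValueError("keep backups outside the project the agent can touch")
    backups.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backups / f"{project.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the project's own folder name.
        tar.add(project, arcname=project.name)
    return archive
```

Run it before launching the agent and keep `backup_dir` somewhere the sandbox does not mount, so a recursive delete inside the container cannot take the archive with it.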

Frequently Asked Questions

Why do AI coding agents delete files they should not?

Because the agent runs as you, on your filesystem, with your credentials, and nothing sits between the model's decision and the shell. When it concludes that deleting something is the fastest way to finish a task, it executes the command directly. Documented cases include an agent that erased about fifteen years of a family's photos while “organising” a desktop and others that ran a recursive delete from the home directory or filesystem root. The agent is not malicious; it is confident and unsupervised.

Is AI-generated code less secure than human code?

On the evidence, yes, measurably. A December 2025 CodeRabbit study of hundreds of pull requests found AI-co-authored code shipped 1.7 times more issues overall and 2.74 times more security issues specifically, including markedly higher rates of improper password handling, insecure direct object references, and unsafe deserialisation. The practical takeaway is not to ban AI code but to review it more, treating it as untrusted input that must pass the same SAST, tests, and evaluation gates as any contribution.

How do I safely run an AI coding agent?

Put it on a leash that shrinks its blast radius. Run it in a sandboxed container or VM whose filesystem is the project and nothing else, so your home directory, SSH keys, and real credentials are not even reachable. Never enable the skip-all-permissions mode on your real machine, require explicit confirmation for destructive or irreversible commands, keep work in version control with recent backups, and run everything it writes through your normal review and CI gates.

Should I stop using AI coding agents because of these risks?

No. The productivity gain is real, and abstinence is no more the answer than banning power tools. Treat the agent as the most capable and least supervised junior you have ever hired: powerful and useful, but never given unsupervised root. Sandbox it, scope its permissions, gate the irreversible, and review its output, and you keep the speed while making its worst day a container problem instead of a catastrophe.
