Your AI Coding Agent Will rm -rf Your Life. Leash It.

Your AI coding agent will eventually do something catastrophic, and it will do it with total confidence. In the last few months, one agent asked to “organise” a desktop deleted fifteen years of a family’s photos. Another, told to clean up a repository, ran a recursive delete with a stray home-directory path and wiped a developer’s entire Mac. On a third machine, an agent ran the same command from the filesystem root. None of these agents was malicious. Every one of them was confident, and nothing stood between its decision and your shell.
That gap is the whole story, and the good news is that it is an engineering problem with an engineering answer. This is what is actually going wrong, why a better model will not fix it, and how to put the agent on a leash short enough that its worst day costs you a few minutes instead of fifteen years of memories.
Two ways the agent hurts you
The failures split cleanly into the loud kind and the quiet kind, and the quiet kind is worse.
The loud kind is the destructive command. An agent decides the fastest way to satisfy your request is to delete something, and it is right that deleting is fast and wrong about what to delete. These make the headlines because the damage is total and instant: a wiped home directory, a dropped database, a force-pushed branch. They are terrifying and they are also, in a sense, the easy case, because you find out immediately.
The quiet kind is code that is subtly worse than a human would write, merged because it looked fine. A December 2025 study by CodeRabbit compared hundreds of AI-co-authored pull requests against human-only ones and found that AI code shipped 1.7 times more issues overall and 2.74 times more security issues specifically. AI-authored changes were roughly 1.9 times more likely to mishandle passwords, introduce an insecure direct object reference, or add unsafe deserialisation, and logic and correctness errors were up by around 75 percent. The destructive command costs you an afternoon. The quiet defect costs you a breach six months later.
Why this keeps happening, and why a better model will not fix it
The root cause is not that the model is dumb. It is architectural. As the Docker team put it after cataloguing these incidents, the agent runs as you, on your filesystem, with your credentials, and nothing sits between the model’s decision and the shell’s execution. It is a confident junior engineer who has been handed root on your laptop, your production credentials, and no code review, and then told to move fast.
That is why waiting for a smarter model misses the point. The CodeRabbit numbers span current frontier models, not some weak outlier, and the destructive-command incidents involve the best agents available. The model will occasionally be wrong no matter how good it gets, the same way a brilliant junior occasionally runs the wrong command. The fix is not a better junior. It is the boundary you put around any junior with that much power. We made the same argument about agents in general in why the prompt-injection number never reaches zero: design for the mistake, do not pray it away.
The leash: containment for coding agents
Everything that actually works is about shrinking what the agent can reach and reverse, not about trusting it more.
- Run it in a sandbox, not on your host. Give the agent a container or VM whose filesystem is the project and nothing else. Your home directory, your SSH keys, your real cloud credentials, and your photos are simply not mounted, so they cannot be deleted or leaked. This is the same hardening discipline we describe in securing self-hosted AI infrastructure.
- Never use the “skip all permissions” flag on your real machine. The auto-approve mode that makes demos smooth is exactly what removes the last gate before a destructive command. If you must run it, run it inside the sandbox.
- Gate the irreversible. Require explicit confirmation for deletes, force-pushes, and database migrations. A dry-run-by-default policy turns “it already happened” into “it asked first.”
- Treat AI-written code as untrusted input. The 2.74x figure means AI code needs more review, not less. Run it through the same linters, SAST, tests, and evaluation gates you would demand of any contributor, the way we lay out in testing AI systems before production.
- Keep version control and backups honest. Committed work and a recent backup are what turn a catastrophic delete into an annoyance. The fifteen-years-of-photos case hurt because there was no copy.
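To make “gate the irreversible” concrete, here is a minimal sketch of a confirmation gate that an agent harness could call before handing a command to the shell. Everything in it is an assumption for illustration: the deny-list, the function names `needs_confirmation` and `gate`, and the `confirm` callback are hypothetical and do not come from any particular agent product. It does not cover every destructive action (database migrations, for example, are not detected).

```python
import shlex

# Hypothetical deny-list, for illustration only. A real policy would be broader.
DESTRUCTIVE_PROGRAMS = {"rm", "rmdir", "dd", "shred", "dropdb"}
GIT_FORCE_FLAGS = {"--force", "-f", "--force-with-lease"}


def tokenize(command: str) -> list[str]:
    """Split a shell command into words, separating operators such as ; and &&."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # scan text after '#' too, so the gate fails closed
    return list(lexer)


def needs_confirmation(command: str) -> bool:
    """Return True if the command looks irreversible and should be gated."""
    try:
        tokens = tokenize(command)
    except ValueError:
        # Unparseable input (for example unbalanced quotes) is treated as risky.
        return True
    # Compare on the basename so /bin/rm is caught as well as rm.
    names = [token.rsplit("/", 1)[-1] for token in tokens]
    if any(name in DESTRUCTIVE_PROGRAMS for name in names):
        return True
    if "git" in names:
        if "push" in names and GIT_FORCE_FLAGS & set(tokens):
            return True
        if "reset" in names and "--hard" in tokens:
            return True
    return False


def gate(command: str, confirm) -> bool:
    """Allow the command only if it looks safe or the human approves it."""
    if needs_confirmation(command):
        return bool(confirm(command))
    return True
```

A harness would call `gate(cmd, confirm=ask_the_human)` and run the command only when it returns `True`. A deny-list like this is easy to evade (a script that deletes files will pass straight through), so it is best read as a speed bump layered on top of the sandbox, not a replacement for it.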
Loose agent versus leashed agent
| Failure | Loose agent (on your host) | Leashed agent (sandboxed) |
|---|---|---|
| Recursive delete | Wipes your home directory, photos, keys | Deletes inside the throwaway container only |
| Credential reach | Reads your real SSH keys and cloud tokens | Sees only scoped, disposable test credentials |
| Subtly insecure code | Merged because it looked fine | Caught by the same SAST and eval gate as any PR |
| A bad command | Irreversible, no copy | Reversible from git and a recent backup |
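The “leashed” column can be sketched as a small launcher that builds a `docker run` command mounting the project directory and nothing else. This is an illustrative sketch under stated assumptions, not a hardened setup: the function name `sandbox_argv`, the `/workspace` mount point, and whatever image name you pass in are hypothetical. It only constructs the argument list, so you can inspect it before running anything.

```python
from pathlib import Path


def sandbox_argv(project_dir: str, image: str, agent_cmd: list[str]) -> list[str]:
    """Build a `docker run` argv that exposes only the project to the agent."""
    project = Path(project_dir).expanduser().resolve()
    home = Path.home().resolve()
    too_broad = (
        project == home                      # the home directory itself
        or project in home.parents           # anything above it, such as /home
        or project == Path(project.anchor)   # the filesystem root
    )
    if too_broad:
        raise ValueError(f"refusing to mount {project}: too broad for a sandbox")
    return [
        "docker", "run", "--rm",             # throwaway container, removed on exit
        "--volume", f"{project}:/workspace", # the only host path the agent can see
        "--workdir", "/workspace",
        image,
        *agent_cmd,
    ]
```

Pass the result to `subprocess.run(argv, check=True)` once it looks right. Because no home directory, SSH keys, or cloud credential files are mounted, a recursive delete inside the container can only reach the project, which should already be committed and backed up.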
Do not ban it, contain it
The productivity is real, and abstinence is not the answer any more than banning power tools is. The answer is the leash. An AI coding agent is the most capable junior you have ever hired and the least supervised, and the trust model that ships safely treats it exactly that way: powerful, useful, and never given unsupervised root. We make the same case for putting agents into production deliberately rather than hopefully in what breaks in agent orchestration at production scale and building LLM features that survive production. The same supply-chain caution applies to the tools the agent itself pulls in, which we covered in how the agent skill marketplace got poisoned.
You would not give a new hire unsupervised root on your laptop on day one. Do not give it to the agent either. Sandbox it, scope it, review its output, keep backups, and then let it run as fast as it likes, because the blast radius is now a container instead of your life’s work.
Related reading: see securing self-hosted AI infrastructure, how to test AI systems before production, and what breaks in agent orchestration at production scale.
Quick Reference
Coding-agent failure modes and the leash for each
| Failure mode | What it costs | The leash |
|---|---|---|
| Recursive delete on your host | Home directory, keys, photos, all gone | Run the agent in a container that only mounts the project |
| Reaches real credentials | SSH keys and cloud tokens exfiltrated or used | Scoped, disposable test credentials only |
| Subtly insecure code merged | A breach months later, 2.74x more likely | Same SAST, tests, and eval gate as any pull request |
| Irreversible command | No copy to restore from | Commit often, keep a recent backup, dry-run deletes |
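The “dry-run deletes” entry in the last row can be illustrated with a small helper that reports what a delete would remove and only removes it when explicitly told to. This is a sketch under assumptions: `delete_tree` and its `root` and `dry_run` parameters are hypothetical names, and a real tool would likely add logging and a trash directory.

```python
import shutil
from pathlib import Path


def delete_tree(target: str, root: str, dry_run: bool = True) -> list[Path]:
    """Return the paths a delete would remove; delete only if dry_run is False."""
    root_path = Path(root).resolve()
    target_path = Path(target).resolve()
    # Refuse the project root itself and anything outside it.
    if root_path not in target_path.parents:
        raise ValueError(f"{target_path} is not strictly inside {root_path}")
    if not target_path.exists():
        return []
    doomed = sorted(target_path.rglob("*")) if target_path.is_dir() else []
    doomed.append(target_path)
    if not dry_run:
        if target_path.is_dir():
            shutil.rmtree(target_path)
        else:
            target_path.unlink()
    return doomed
```

Calling `delete_tree("build", root=".")` only lists the affected paths. The caller has to pass `dry_run=False` to remove anything, which is the “it asked first” behaviour described earlier.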
Frequently Asked Questions
Why do AI coding agents delete files they should not?
Because the agent runs as you, on your filesystem, with your credentials, and nothing sits between the model's decision and the shell. When it concludes that deleting something is the fastest way to finish a task, it executes the command directly. Documented cases include an agent that erased about fifteen years of a family's photos while “organising” a desktop and others that ran a recursive delete from the home directory or filesystem root. The agent is not malicious. It is confident and unsupervised.
Is AI-generated code less secure than human code?
On the evidence, yes, measurably. A December 2025 CodeRabbit study of hundreds of pull requests found AI-co-authored code shipped 1.7 times more issues overall and 2.74 times more security issues specifically, including markedly higher rates of improper password handling, insecure direct object references, and unsafe deserialisation. The practical takeaway is not to ban AI code but to review it more, treating it as untrusted input that must pass the same SAST, tests, and evaluation gates as any contribution.
How do I safely run an AI coding agent?
Put it on a leash that shrinks its blast radius. Run it in a sandboxed container or VM whose filesystem is the project and nothing else, so your home directory, SSH keys, and real credentials are not even reachable. Never enable the skip-all-permissions mode on your real machine, require explicit confirmation for destructive or irreversible commands, keep work in version control with recent backups, and run everything it writes through your normal review and CI gates.
Should I stop using AI coding agents because of these risks?
No. The productivity gain is real, and abstinence is no more the answer than banning power tools. Treat the agent as the most capable and least supervised junior you have ever hired: powerful and useful, but never given unsupervised root. Sandbox it, scope its permissions, gate the irreversible, and review its output, and you keep the speed while making its worst day a container problem instead of a catastrophe.