Skip to main content
Home / Blog / Security Guide
Security Guide

The AI Skill Marketplace Is the New npm. It Got Poisoned.

RRogue AI··12 min read
A glowing AI-agent skill store shelf with one poisoned package cracked open, dark malware leaking from it into the registry

The AI-agent skill marketplace is the new npm registry, and it just got poisoned. ClawHavoc planted over a thousand malicious skills in OpenClaw’s ClawHub before anyone noticed, and the only thing it needed to do that was a GitHub account one week old. We are repeating, one layer up, every supply-chain mistake we already made.

I have watched this exact film before. npm taught us that anyone can publish, that names are squatted, that one popular package quietly turns malicious overnight. PyPI taught us the same with typosquats and post-install scripts. Browser extension stores taught us that a five-star tool with a glowing description can sell out its user base the day it gets acquired. Now we are handing autonomous agents a marketplace of community “skills” with read access to files, network calls, and shell, and we are surprised when it bleeds.

What Was the ClawHavoc Attack?

ClawHavoc was a coordinated supply-chain campaign that planted 1,184 malicious skills in ClawHub, the open skill marketplace for the viral agent platform OpenClaw. The first malicious skill landed on 27 January 2026, the campaign surged on 31 January, and Koi Security named it on 1 February. Repello AI estimates the campaign reached around 300,000 agent users.

The skills were not crude. They were disguised as popular, high-demand tools and embedded their payloads three ways: staged downloads that pulled extra malware after install, reverse shells opened through Python system calls, and direct data-grab payloads that ran on first use. A 16 February follow-up scan reported by eSecurity Planet still found 824 confirmed-malicious skills sitting inside a registry that had grown past 10,700 entries. On macOS the payload was a variant of Atomic macOS Stealer, which lifted browser credentials, keychains, Telegram data, SSH keys, and crypto wallets, then compressed and exfiltrated them.

Why Is the Marketplace Trust Model Broken?

The trust model is broken because it asks the user to judge code by its description. To publish on ClawHub at the time of ClawHavoc you needed only a GitHub account at least a week old. No automated static analysis, no code review, no signing requirement. Nothing connected the friendly skill description to what the code actually did once an agent ran it.

That gap is the whole attack. A human installing an npm package can at least read the source and the install scripts. An autonomous agent picking a skill from a marketplace reads the natural-language description, decides it matches the task, and runs it. The description is attacker-controlled marketing copy. The behaviour is whatever the author wrote. When those two diverge, the agent has no way to tell, and neither does the user who delegated the task to it.

Identity is free

A week-old GitHub account is not an identity. It is a throwaway. Reputation that costs nothing to create is reputation an attacker can mint by the hundred, which is exactly what ClawHavoc did.

Descriptions are trusted, code is not inspected

The selection signal is the skill’s self-description and its star count. Both are forgeable. No layer in the chain verifies that the code matches the claim.

Permissions are ambient

A skill inherits whatever the agent can do: read the home directory, make outbound calls, run shell. There is no per-skill permission boundary, so a note-taking skill can reach your SSH keys.

No provenance

Nothing proves the published artefact was built from the source you can see, by the author named, without tampering in between. The registry serves bytes, not a verifiable chain of custody.

We Already Solved This Once. Here Is the Map.

Every defence the package-registry world bolted on after its own breaches maps directly onto the skill marketplace. The agent layer is not facing novel attacks. It is facing 2018-era npm attacks against software that has not yet grown the immune system npm spent years building. This is the comparison that matters most.

Failure modenpm / PyPI / extensionsAI skill marketplace (today)The defence that worked
Anyone can publishEmail-verified accountWeek-old GitHub accountVerified publishers, namespace ownership, 2FA on publish
Install-time code executionpostinstall scripts, setup.pyStaged downloads, reverse shells on first runSandboxed install, no implicit script execution
Trust by popularityDownload counts, starsSkill description, star countProvenance attestation tied to the source build
Tampered artefactHijacked maintainer pushes a bad versionNo link between source and served bytesSigstore signing, SLSA build provenance
Excess capabilityA package can read your whole environmentA skill inherits all agent permissionsLeast-privilege scopes, capability manifests, egress allow-lists

What Actually Defends Against This?

Four controls actually defend an agent against a poisoned marketplace: cryptographic provenance so you trust builds not descriptions, least-privilege permissions so a skill cannot reach what it has no business touching, runtime egress control so stolen data has nowhere to go, and behavioural review instead of reading the marketing copy. The order matters because the last two work even when the first two fail.

Provenance and signing, not descriptions

Trust the build, not the blurb. A skill should ship with a signature you can verify and a provenance attestation that links the served artefact back to a specific source commit and build. Sigstore keyless signing and SLSA provenance already do this for container images and packages. Until a marketplace requires them, pin skills to a reviewed commit hash and treat an unsigned skill the way you would treat an unsigned binary from a stranger.

Least-privilege agent permissions

A skill that summarises documents has no reason to read your SSH keys or open an outbound socket to an unknown host. Give each skill the narrowest capability set the task needs, declared up front, enforced by the runtime. This is the same lesson container hardening teaches. I wrote about the runtime version of it in securing self-hosted AI infrastructure, and the principle is identical: drop everything by default, add back only what is justified.

Runtime egress control

The AMOS payload had to phone home to be worth anything. Stolen keychains and wallets are useless until they leave the machine. A default-deny egress policy at the network layer means a compromised skill can run and still exfiltrate nothing, because the connection to the attacker’s server is refused. This is the control that saves you when provenance and permissions both fail, and it is cheap to put in place.

Do not trust the skill description

The description is attacker-controlled. Treat it as a marketing claim, not a specification. Pin to specific reviewed versions, read the actual code or a trusted review of it, prefer skills with verifiable provenance, and never let an agent auto-install a skill it discovered mid-task. The convenience of “the agent found a tool and used it” is exactly the convenience ClawHavoc weaponised.

This Is Systemic, Not a One-Off

ClawHavoc is the loudest case, not the only one. A cluster of related attacks in the same window shows the agent ecosystem’s soft underbelly is structural. MemoryTrap poisoned a coding agent’s persistent memory so the compromise spread across sessions. SymJack used symlink hijacking to land remote code execution and broke six AI coding agents at once. TrustFall reached Claude Code, Cursor, Gemini CLI, and GitHub Copilot through a regressed trust dialog with a single click.

Underneath all of it sits prompt injection, which OWASP ranks as LLM01, the number-one risk for LLM applications in 2025 and 2026. A January 2026 review of 78 studies found every major coding agent fell to prompt injection, with adaptive attacks landing more than 85% of the time. A marketplace is just a delivery mechanism for that underlying weakness. If you are designing an agent that selects its own tools, read how this connects to what breaks in agent orchestration at production scale.

The Trade-Offs Nobody Likes to State

Every defence here costs the thing that made these marketplaces viral: frictionless self-service. There is a real tension, and pretending otherwise is how you end up with security theatre instead of security.

Signing requirements slow publishing

Mandatory provenance raises the bar for honest contributors too. The answer is not to skip it but to make the tooling invisible, the way Sigstore made keyless signing close to free.

Least privilege breaks lazy skills

Some skills genuinely need broad access. A capability manifest forces authors to declare and justify it, which is the point. The friction is the security.

Egress control needs network plumbing

Default-deny egress is trivial in a self-hosted Docker fleet and awkward on a developer laptop. That asymmetry is itself an argument for running agents in controlled environments rather than directly on endpoints.

What I Would Do This Week

If you run agents that pull skills from a marketplace, here is the sequence I would work through, each step independently useful:

  1. Inventory every skill your agents can install, and pin each one to a reviewed commit, not a floating “latest” tag.
  2. Turn off auto-discovery and auto-install. An agent should never run a skill a human has not approved.
  3. Put a default-deny egress policy in front of the agent runtime, with an explicit allow-list of endpoints it actually needs.
  4. Scope agent capabilities to the task. A document agent does not need shell or SSH-key access.
  5. Prefer skills with verifiable provenance, and treat unsigned skills as untrusted by default.
  6. Log what skills run and what they reach, so the next ClawHavoc shows up in your telemetry instead of a breach report.

Closing

The painful part is that none of this is new. We learned it the hard way with npm and PyPI, wrote the playbook, and built the tooling. The agent ecosystem skipped straight to a marketplace without the immune system, because shipping fast beat shipping safe. ClawHavoc is the bill for that choice. The fix is not a smarter model. It is provenance, least privilege, and egress control, applied with the same seriousness we eventually applied to the package registries we already poisoned once.

Related reading: see securing RAG pipelines against prompt injection, why AI projects fail before production, EU AI Act compliance for builders, and the private-AI approach behind Vaultic. If you are wiring an LLM into real systems, start with why the model is rarely the problem.

Frequently Asked Questions

What was the ClawHavoc attack?

ClawHavoc was a coordinated supply-chain campaign that planted 1,184 malicious skills in ClawHub, the open skill marketplace for the OpenClaw agent platform. The first malicious skill appeared on 27 January 2026, the campaign surged on 31 January, and Koi Security named it on 1 February. It reached roughly 300,000 agent users. Skills disguised as popular tools delivered staged-download malware, reverse shells, and direct data-grab payloads, including an Atomic macOS Stealer variant that lifted credentials, keychains, SSH keys, and crypto wallets.

Why is the AI skill marketplace trust model broken?

Because it asks users and agents to judge code by its description. Publishing on ClawHub required only a GitHub account at least a week old, with no static analysis, code review, or signing. An autonomous agent selects a skill from its natural-language description and runs it, but the description is attacker-controlled marketing while the behaviour is whatever the author wrote. When those diverge, neither the agent nor the user can tell.

How is this like the npm and PyPI supply-chain attacks?

It is the same class of attack one layer up. npm and PyPI suffered from anyone being able to publish, install-time code execution, trust by popularity, hijacked maintainers, and packages with excess capability. The skill marketplace repeats every one of these. The package world solved them with verified publishers, sandboxed installs, provenance attestation, Sigstore signing, and least-privilege scopes. The agent ecosystem has not adopted that immune system yet.

What actually defends an AI agent against a poisoned marketplace?

Four controls: cryptographic provenance and signing so you trust the build rather than the description, least-privilege permissions so a skill cannot reach files or hosts it has no business touching, runtime default-deny egress control so stolen data has nowhere to exfiltrate, and behavioural review instead of trusting the skill description. The last two work even when provenance and permissions fail, which is why egress control is the control that saves you in a breach.

Should an AI agent auto-install skills it finds during a task?

No. Auto-discovery and auto-install are exactly the convenience ClawHavoc weaponised. Pin every skill to a reviewed commit hash, disable auto-install, and require a human to approve any new skill before an agent runs it. Treat unsigned skills as untrusted by default, the same way you would treat an unsigned binary from a stranger.

Related Articles

Security Guide

Securing RAG Pipelines: Prompt Injection & Access Controls

12 min read

Security Guide

Securing Self-Hosted AI: Infrastructure Hardening

13 min read

← All articles