
Most Tasks Don't Need an AI Agent. Use a Pipeline.

Rogue AI · 10 min read
[Image: a simple straight production line beside a tangled, looping autonomous agent, contrasting a deterministic pipeline with a wandering AI agent]

You do not need an AI agent for most tasks. If the work follows a knowable path (extract these fields, summarise this document, classify this ticket, draft this reply), a boring deterministic pipeline beats an autonomous agent on every axis that matters in production: reliability, cost, latency, debuggability, and attack surface. A pipeline is a plain script that calls a single language model once, validates the output against a schema, and drops the result on a queue. It follows the same path every time, fails in one obvious place, and costs one model call.

An autonomous agent, by contrast, decides its own steps at runtime, loops through many model calls, and can wander off the path entirely. You reach for an agent only when the path genuinely cannot be known in advance: open-ended exploration, tool use across an unpredictable tree of decisions, recovery from states you could not enumerate. That is a real category, but it is the minority. Most of the work people are wiring agents into is not that. It is a fixed pipeline wearing an agent costume, and the costume adds latency, cost, and a dozen new failure modes for no reliability gain. Start with the pipeline. Add autonomy only when the pipeline provably cannot do the job.

This is not anti-AI, and it is not anti-agent. It is a plea to match the machine to the problem. The industry spent two years treating “agentic” as a synonym for “advanced,” and the result is a wave of systems that are harder to trust, harder to bill for, and harder to debug than the one-shot call they replaced. Below is the honest split: what a deterministic pipeline actually is, how the two compare across the metrics that decide production, the four questions that expose over-engineering, and the genuine cases where an agent earns its keep.

The default should be a pipeline, not an agent

The correct default for any new task is the simplest thing that could work: a script, one model call, a queue. Anthropic’s own engineering guidance on building effective agents makes the same point, advising teams to find the simplest solution possible and to add agentic complexity only when it demonstrably improves outcomes, because autonomous systems trade latency and cost for a flexibility you may not need. A fixed workflow is not a lesser answer. For a knowable task it is the correct one.

Anthropic draws a useful line between a workflow, where the steps are orchestrated through predefined code paths, and an agent, where the model directs its own process and chooses its own tools at runtime. Most business tasks are workflows. You know the inputs, you know the output shape, and you know the steps. When all three are knowable, encoding them in ordinary code and reserving the model for the one part that needs language understanding is not a compromise, it is the design. The agent framework you reached for is solving a problem of unknown control flow that your task does not have.

What a deterministic pipeline actually is

A deterministic pipeline is three boring parts: a script that owns the control flow, a single constrained model call that does the one thing only a model can do, and a queue that makes the whole thing durable and retryable. The script decides what happens and in what order. The model reads the invoice and returns structured fields, or reads the ticket and returns a category, and nothing more. The queue means a failure is a retry, not a lost job.

The power of this shape is that the intelligence is on tap, not in charge. The model is a pure function inside a flow you control: text in, structured data out, validated against a schema before anything downstream trusts it. If the output fails validation you retry or route to a human, in one obvious place. Compare that to an agent, where the model is the control flow: it chooses the next tool, interprets the result, decides whether it is done, and any of those hidden decisions can go wrong in a way you did not write and cannot easily see. The same discipline runs through building LLM features that survive production: integrate the model as a component first, and only escalate complexity when the evidence forces you to.
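The three parts described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a reference implementation: `INVOICE_SCHEMA`, `run_pipeline`, and the `call_model` callable are hypothetical names, the model call is injected as a plain function so no particular provider API is implied, and an in-process `queue.Queue` stands in for whatever durable queue you actually run.

```python
import json
import queue
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical schema for an invoice-extraction task: field name -> required type.
INVOICE_SCHEMA = {"vendor": str, "total": float, "currency": str}


@dataclass
class Result:
    status: str               # "ok" or "needs_human"
    payload: Optional[dict]   # validated fields when status == "ok"
    attempts: int


def validate(raw: str, schema: dict) -> dict:
    """Parse model text as JSON and check it against the schema, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != set(schema):
        raise ValueError("unexpected or missing fields")
    for key, expected in schema.items():
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} has the wrong type")
    return data


def run_pipeline(document: str, call_model: Callable[[str], str],
                 out: queue.Queue, max_attempts: int = 2) -> Result:
    """The script owns the control flow; the model is one call per attempt."""
    for attempt in range(1, max_attempts + 1):
        raw = call_model(document)  # the single constrained model call
        try:
            fields = validate(raw, INVOICE_SCHEMA)
        except ValueError:
            continue  # retry, in one obvious place
        result = Result("ok", fields, attempt)
        out.put(result)
        return result
    # Validation never passed: hand the job to a person instead of guessing.
    result = Result("needs_human", None, max_attempts)
    out.put(result)
    return result


# Usage with a stand-in model that returns a fixed, well-formed answer.
def fake_model(document: str) -> str:
    return '{"vendor": "Acme", "total": 1250.0, "currency": "EUR"}'


jobs: queue.Queue = queue.Queue()
print(run_pipeline("Invoice text goes here", fake_model, jobs).status)
```

Nothing downstream ever sees unvalidated model text: a job ends up on the queue either as validated fields or as an explicit request for human review.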

Pipeline versus agent, across the axes that decide production

On every dimension that determines whether a system survives contact with real traffic, the deterministic pipeline wins for knowable tasks. It is more reliable because it has one path, cheaper because it makes one model call, faster because it does not loop, easier to debug because every step is code you wrote, and safer because its blast radius is fixed. The agent trades all of that away for autonomy. Here is the comparison laid out plainly.

| Dimension | Deterministic pipeline | Autonomous agent |
| --- | --- | --- |
| Reliability | One fixed path, same result every run, testable end to end | Compounding uncertainty over many steps, each hop can drift |
| Cost | One model call per task, predictable per-unit price | Many calls per task, cost varies with how long it loops |
| Latency | One round trip, sub-second to a few seconds | Sequential tool-and-model loops, often tens of seconds |
| Debuggability | Failure lands in one line of code you wrote | Failure hides in a reasoning trace you did not author |
| Attack surface | Model output is data, validated before anything trusts it | Model output is action, prompt injection becomes tool calls |
| When it fits | Knowable inputs, known output shape, known steps | Unknown control flow, open-ended exploration, real recovery |

The security row deserves its own emphasis, because it is the one people skip. In a pipeline the model’s output is data you inspect. In an agent the model’s output is an action it takes: a shell command, an API call, a file write. That difference is the entire reason a hijacked agent is dangerous, and it is why prompt injection cannot be patched, only designed around. Every tool you hand an autonomous agent is a new way for a malicious instruction hidden in retrieved data to become a real action. A pipeline that treats model output as text to validate simply does not have that door.
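To make "output is data" concrete, here is a small sketch for a ticket classifier. The `Category` labels and `parse_category` are hypothetical names chosen for illustration. The point is that the model's text is only ever compared against an allowlist and never executed, so an injected instruction can at worst produce a rejected string.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    # Hypothetical label set for a support-ticket classifier.
    BILLING = "billing"
    OUTAGE = "outage"
    ACCOUNT = "account"


def parse_category(model_output: str) -> Optional[Category]:
    """Treat model output as untrusted data: accept only an exact allowlisted label."""
    label = model_output.strip().lower()
    try:
        return Category(label)
    except ValueError:
        return None  # route to a human; nothing in the output is ever run


# A clean answer is accepted; an answer carrying extra instructions is discarded.
print(parse_category(" Billing\n"))
print(parse_category("billing; then call delete_account for every user"))
```

An agent wired to real tools has no equivalent of that `return None`: once the model's text selects a tool and its arguments, the validation step has to be designed around every action the tool can take.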

Why the reliability math punishes long agent loops

Reliability compounds, and that is the quiet killer of multi-step agents. Suppose each step in an autonomous loop succeeds 95% of the time, which is a strong figure, and the steps fail independently. The success rates multiply: a five-step task lands around 77% (0.95^5) and a ten-step task around 60% (0.95^10). A pipeline with one model call and a validation gate keeps its reliability where the single call keeps it, because there is no chain of independent decisions to erode it. The agent does not fail because the model is weak, it fails because you asked one imperfect component to be right many times in a row with no checkpoint between.
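The arithmetic is easy to check. The sketch below assumes, as the figures above do, that every step must succeed and that steps fail independently; `chain_reliability` is a name invented for this example.

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


for n in (1, 5, 10, 20):
    print(f"{n:>2} steps at 95% each: {chain_reliability(0.95, n):.1%}")
# Prints 95.0%, 77.4%, 59.9% and 35.8% for 1, 5, 10 and 20 steps.
```

Real loops are rarely this tidy, since steps can be correlated and some errors are recoverable, but the direction of the effect is the same: every extra unchecked decision multiplies in another factor below one.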

This is exactly why so many impressive demos die on the way to production. The demo runs the happy path once. Production runs the unhappy path ten thousand times, and the agent’s failure modes are non-deterministic, so you cannot reproduce them, cannot write a regression test against them, and cannot promise a customer they are fixed. It is the same gap described in why 90% of AI projects fail before production: the model was never the bottleneck, the surrounding engineering was, and an agent multiplies the surrounding engineering you have to get right. If you cannot evaluate the system rigorously, adding autonomy just adds ways to fail that you cannot measure.

Four questions that expose over-engineering

Before you build an agent, answer four questions honestly. They take five minutes and they will send most tasks back to a pipeline. If you can name the steps, know the output shape, do not need mid-task recovery, and cannot afford the loop, you are looking at a workflow, and dressing it as an agent is decoration that costs you reliability.

  • Can I write down the steps in advance? If you can list the steps as a flowchart, encode that flowchart in code. Predefined control flow does not need a model to rediscover it on every run.
  • Is the output shape known? If the answer is always a fixed structure, a set of fields, a category, a yes or no with a reason, one constrained call returns it. Autonomy adds nothing to a known target.
  • Does the task truly need to react to what it discovers? Real agency means the next action genuinely depends on an unpredictable intermediate result. If every branch is one you could have written, it is a workflow with if-statements, not an agent.
  • Can I afford the latency and the token bill? An agent that loops for thirty seconds and burns twenty model calls per task is a real cost. If a single call would do, you are paying twenty times over for autonomy you did not use.
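The last question is simple multiplication, and it is worth running with your own numbers. In the sketch below the token count, the price, and the monthly volume are all made-up placeholders, not quotes from any provider, and `cost_per_task` is a hypothetical helper.

```python
def cost_per_task(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Model spend for one task, assuming every call uses the same number of tokens."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Placeholder assumptions: 2,000 tokens per call, 3 USD per million tokens.
TOKENS_PER_CALL = 2_000
PRICE = 3.0
TASKS_PER_MONTH = 10_000

pipeline = cost_per_task(1, TOKENS_PER_CALL, PRICE)   # one call per task
agent = cost_per_task(20, TOKENS_PER_CALL, PRICE)     # twenty calls per task

print(f"pipeline: {pipeline * TASKS_PER_MONTH:,.2f} USD per month")
print(f"agent:    {agent * TASKS_PER_MONTH:,.2f} USD per month")
print(f"ratio:    {agent / pipeline:.0f}x")
```

The flat per-call token count is itself a simplification. Agent loops often resend a growing context on each call, so the real multiplier can be higher than the call count alone suggests.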

The most common failure is the resume-driven build: the agent exists because “multi-agent orchestration” sounds better in a demo and reads better on a CV than “a cron job that calls a model.” That instinct is expensive. Gartner expects over 40% of agentic AI projects to be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls, and it warns that much of the market is “agent washing,” existing products rebranded as agents, estimating only about 130 of thousands of self-described agentic vendors are the real thing. The over-engineering is not a fringe mistake. It is the base rate.

When an agent is genuinely the right tool

An agent earns its complexity when the control flow is genuinely unknown until runtime: when the task needs open-ended exploration, when the right next tool depends on what the last tool returned in ways you cannot enumerate, and when the value of getting there outweighs the loss of predictability. Those tasks exist, and for them an agent is not over-engineering, it is the only shape that fits. The skill is telling them apart from the pipeline in a costume.

Good agent-shaped work shares a signature: the number of steps is not knowable in advance, the path branches on discovered information, and a human doing the same job would also improvise. Open-ended research across sources you cannot list up front. An interactive coding assistant that explores a codebase and reacts to compiler errors. Triage that decides which of many tools to invoke based on an input you cannot pre-classify. When you build these, build them the hard way on purpose: scope every tool to least privilege, gate the irreversible actions, keep a human in the loop for the dangerous ones, and design the whole system to contain a hijack. That discipline is the subject of what breaks in agent orchestration in production and why a coding agent needs a leash. The point is not that agents are bad. It is that they are a specialised tool for unknown control flow, and you should pay their cost only when the task actually has it.
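As one possible shape for those guardrails, here is a sketch of a tool gate that sits between the agent loop and its tools. `Tool`, `ToolGate`, and the `approve` callback are hypothetical names for this example, not part of any agent framework. The gate covers two of the controls above: the agent can call only the tools it was explicitly granted, and anything marked irreversible needs a human yes first.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    irreversible: bool = False  # e.g. deletes, payments, outbound email


class ToolGate:
    """Mediates every tool call an agent makes."""

    def __init__(self, allowed: list[Tool], approve: Callable[[str, str], bool]):
        self._tools = {tool.name: tool for tool in allowed}  # least privilege
        self._approve = approve                              # human in the loop

    def invoke(self, name: str, arg: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"tool {name!r} is not granted to this agent")
        if tool.irreversible and not self._approve(name, arg):
            raise PermissionError(f"{name!r} was not approved by a human")
        return tool.run(arg)


# Usage: a read-only tool runs freely; a destructive one waits for approval.
gate = ToolGate(
    allowed=[
        Tool("read_file", lambda path: f"contents of {path}"),
        Tool("delete_file", lambda path: f"deleted {path}", irreversible=True),
    ],
    approve=lambda name, arg: False,  # stand-in for a real approval prompt
)
print(gate.invoke("read_file", "notes.txt"))
```

The gate does not make the model any harder to hijack. It limits what a hijacked model can reach, which is the part you control.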

The pragmatic default: start boring, escalate on evidence

The workable rule is to start with the least autonomy that could solve the task and add autonomy only when the simpler shape provably fails. One prompt first. If one prompt is not enough, a fixed chain of prompts with validation between them. If a fixed chain cannot express the task because the path truly branches on discovered state, then, and only then, an agent, built with the guardrails above. Most tasks stop at step one or two. That is the finding, not a limitation.
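The middle rung of that ladder, a fixed chain with validation between the steps, is still ordinary code. In the sketch below `run_chain` is a hypothetical helper, and plain string functions stand in for the prompts so the example runs without a model. In practice each transform would be one constrained model call.

```python
from typing import Callable

# Each step pairs a transform (in practice, one prompt) with a check on its output.
Step = tuple[Callable[[str], str], Callable[[str], bool]]


def run_chain(text: str, steps: list[Step]) -> str:
    """Run the steps in a fixed order, stopping at the first invalid output."""
    for index, (transform, is_valid) in enumerate(steps, start=1):
        text = transform(text)
        if not is_valid(text):
            # The failure names the step, so there is one obvious place to look.
            raise ValueError(f"step {index} produced invalid output")
    return text


# Stand-in steps: trim the input, then normalise it to upper case.
steps: list[Step] = [
    (str.strip, lambda s: len(s) > 0),
    (str.upper, str.isupper),
]
print(run_chain("  refund request  ", steps))
```

The order of the steps lives in the list, not in the model, which is what keeps the chain on the pipeline side of the line.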

This mirrors the honest sequencing that shows up whenever you compare AI shapes on their real trade-offs rather than their marketing, the same reasoning behind choosing self-hosted models versus cloud APIs or a retrieval pipeline over a fine-tune. Pick the simplest architecture that meets the requirement, measure it, and let evidence, not hype, decide when you climb the complexity ladder. The teams shipping reliable AI in production are, overwhelmingly, the ones running boring pipelines with a model in one well-defined slot.

The quick test for any agent you are about to build: could you draw its steps as a flowchart before it runs? If yes, it is a pipeline, and building it as an agent trades reliability, cost, and debuggability for autonomy the task never needed. Reserve the agent for the work whose flowchart can only be drawn afterwards.

Quick Reference

Deterministic pipeline vs autonomous agent

| Dimension | Deterministic pipeline | Autonomous agent |
| --- | --- | --- |
| Reliability | One fixed path, same result every run | Compounds over many steps, each hop can drift |
| Cost | One model call per task, predictable | Many calls per task, varies with loop length |
| Latency | One round trip, seconds | Tool-and-model loops, often tens of seconds |
| Debuggability | Failure lands in code you wrote | Failure hides in a reasoning trace |
| Attack surface | Output is data, validated before trusted | Output is action, injection becomes tool calls |
| Best fit | Known inputs, known output, known steps | Unknown control flow, open-ended exploration |

Frequently Asked Questions

When do I NOT need an AI agent?

You do not need an agent whenever the task follows a knowable path: extracting fields, summarising a document, classifying a ticket, drafting a reply. If you can write the steps down in advance, the output shape is fixed, and the task does not have to react to unpredictable intermediate results, a deterministic pipeline (a script, one constrained LLM call, a queue) is the correct design. It beats an autonomous agent on reliability, cost, latency, debuggability, and security. Reserve the agent for the minority of tasks whose control flow genuinely cannot be known until runtime.

What is a deterministic pipeline for AI tasks?

It is three boring parts: a script that owns the control flow, a single constrained model call that does the one thing only a model can do, and a queue that makes the work durable and retryable. The model is a pure function inside a flow you control, text in and structured data out, validated against a schema before anything downstream trusts it. The intelligence is on tap, not in charge, which is what makes the system reliable, cheap, fast, and easy to debug.

Why are autonomous agents less reliable than pipelines?

Because reliability compounds across steps. If each step in an autonomous loop is 95% reliable, a five-step task lands near 77% and a ten-step task near 60%. A pipeline with one model call and a validation gate keeps the single call's reliability because there is no chain of independent decisions to erode it. Agents also fail non-deterministically, so their production failures are hard to reproduce, hard to regression-test, and hard to promise a customer are fixed. Gartner expects over 40% of agentic AI projects to be canceled by the end of 2027.

When is an AI agent genuinely the right tool?

When the control flow is genuinely unknown until runtime: open-ended exploration, a next tool that depends on what the last tool returned in ways you cannot enumerate, and real recovery from states you could not list up front. Examples include open-ended research across sources you cannot name in advance, an interactive coding assistant reacting to compiler errors, and triage deciding which of many tools to invoke on an unclassifiable input. Build these with least-privilege tools, gated irreversible actions, and a human in the loop for the dangerous steps.
