The honest answer first

If you’ve been told you need “an AI agent,” here’s the plain-English version I’d give you on a call.

An AI agent is a large language model (the thing behind ChatGPT or Claude) wired up so it can do things, not just talk. You give it a goal. It thinks, picks a tool — a web search, your database, an API, a code runner — uses it, looks at what came back, and decides what to do next. It keeps going in that loop until the job is done or it hits a stop condition. Anthropic, whose December 2024 “Building Effective Agents” guide is the reference most of us build against, defines agents as “systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.”

The cleanest one-liner I’ve found: chatbots handle conversations; agents handle work. A chatbot is read-only and reactive — it answers. An agent reads, writes, and acts.

I build these for a living in Python and Go through Nixbly, my company, and I’ll be straight with you about the part most vendors skip: most businesses asking for “an agent” actually need a tightly-scoped workflow with two or three tools and a clear metric — not an autonomous swarm. Knowing the difference is the whole game. Let me make it concrete.

The four-level ladder: LLM call vs chatbot vs workflow vs agent

Almost everything sold as “AI” sits on one of four rungs. The confusion in the market comes from calling all four “an agent.”

Level	What it is	Tools?	Memory?	A loop?	Example
1. Simple LLM call	One prompt in, one answer out	No	No	No	“Summarize this email”
2. Chatbot	Multi-turn conversation, reactive	Usually no	Conversation only	No	FAQ bot, support deflection
3. Workflow	LLM + tools on rails you hardcode	Yes, fixed	Often	No (fixed path)	“Read invoice → extract fields → write to DB”
4. Agent	LLM picks its own path in a loop	Yes, its choice	Yes	Yes	“Investigate why checkout is failing and fix it”

The dividing line between a workflow and an agent is who decides the steps. In a workflow, you wrote the steps in code — predictable, cheap, easy to debug. Anthropic calls these “systems where LLMs and tools are orchestrated through predefined code paths.” In an agent, the model decides the steps at runtime, because you couldn’t have predicted them.
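To make that dividing line concrete, here is a minimal Python sketch of the two shapes side by side. Everything in it is a stand-in I made up for illustration: `read_invoice`, `extract_fields`, `write_to_db`, and the `choose_next` callable that plays the role of the model are hypothetical, not a real library.

```python
from typing import Callable

# Hypothetical stand-ins for real integrations; names are illustrative only.
def read_invoice(path: str) -> str:
    return f"raw text of {path}"

def extract_fields(text: str) -> dict:
    # In a real workflow this step would be an LLM call with a fixed prompt.
    return {"source": text, "total": 120.0}

def write_to_db(record: dict, db: list) -> None:
    db.append(record)

def invoice_workflow(path: str, db: list) -> dict:
    """Workflow: the order of the steps is written here, in code."""
    text = read_invoice(path)
    record = extract_fields(text)
    write_to_db(record, db)
    return record

def run_agent(goal: str, choose_next: Callable[[str, list], str],
              tools: dict, max_steps: int = 5) -> list:
    """Agent: `choose_next` (the model) picks each step at runtime."""
    history = []
    for _ in range(max_steps):
        name = choose_next(goal, history)
        if name == "done":
            break
        history.append((name, tools[name]()))
    return history
```

In the workflow, you can read the path off the function body. In the agent, the path only exists after the run, in `history`, which is exactly why agents are harder to predict and debug.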

Here’s the part the hype skips: most production “AI” today is workflows, not agents. Single-agent (or workflow) systems made up 59% of 2025 revenue — most real deployments are one tightly-scoped thing, not a swarm of autonomous bots. Workflows dominate production because they’re cheaper and you can actually debug them.

Is ChatGPT an AI agent? Plain ChatGPT answering a question is a chatbot / LLM call — rungs 1 and 2. The moment it browses the web, runs code, or calls a tool and acts on the result in a loop, that specific feature is agentic. Same model, different wiring.

From LLM call to agent: the four levels

1. LLM call: one prompt in, one answer out. No memory, no tools.
2. Chatbot: a conversation. It remembers the chat but still just talks.
3. Workflow: fixed, pre-defined steps the LLM fills in.
4. Agent: decides its own steps, planning, calling tools, checking results, and looping.

Most “AI agents” you’ll be sold are really levels 1–3.

The moving parts (no CS degree required)

Every agent is built from four pieces. Strip away the jargon and it’s this:

1. The LLM (the brain). The model that reasons and decides. On its own it can only produce text. Anthropic calls the base building block the “augmented LLM” — a model “enhanced with augmentations such as retrieval, tools, and memory.” Everything below is an augmentation.

2. Tools / function calling (the hands). This is what turns a chatbot into an agent. You hand the model a list of tools — each with a name, a description, and a JSON schema of its inputs. The model reads that list, picks a tool, fills in the parameters, and issues the call. “Tool use” and “function calling” mean the same thing. A tool can be a Google search, a get_customer(id) lookup, a Stripe refund, or a code executor.

3. Memory (the notebook). Three tiers, in plain terms: working memory is the context window — what the model can “see” right now. Episodic memory is recent task history — what happened earlier in this job. Semantic memory is long-term facts it can look up — your docs, your product catalog, past tickets. An agent without memory forgets everything between steps; with it, it can carry a task forward.

4. The loop (the heartbeat). This is the defining feature. Anthropic’s description is exactly right: agents “are typically just LLMs using tools based on environmental feedback in a loop.” The model reasons about the goal, takes an action (a tool call), observes the result, then decides the next step — over and over until done. Each step gets “ground truth from the environment” (a real tool result, real code output), so the agent can self-correct instead of hallucinating its way forward.
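Here is how the four pieces can fit together in one small Python sketch. Treat it as an illustration under assumptions, not a reference implementation: the `input_schema` field name, the `get_customer` data, the `Memory` layout, and the `scripted_llm` function (a canned stand-in for a real model call) are all hypothetical choices of mine.

```python
import json
from collections import deque
from typing import Callable

# 2. Tools: a name, a description, a JSON Schema for inputs, and the function.
TOOLS = {
    "get_customer": {
        "description": "Look up a customer record by numeric id.",
        "input_schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}},
            "required": ["id"],
        },
        "func": lambda id: {"id": id, "name": "Ada", "plan": "pro"},  # fake data
    },
}

def call_tool(name: str, args: dict):
    """Check required inputs against the schema, then run the tool."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    missing = [k for k in TOOLS[name]["input_schema"]["required"] if k not in args]
    if missing:
        return f"error: missing arguments {missing}"
    return TOOLS[name]["func"](**args)

# 3. Memory: working (bounded window), episodic (task log), semantic (facts).
class Memory:
    def __init__(self, window: int = 4):
        self.working = deque(maxlen=window)
        self.episodic = []
        self.semantic = {"refund": "Refunds over $100 need approval."}

    def record(self, event: str) -> None:
        self.working.append(event)
        self.episodic.append(event)

    def context(self, goal: str) -> list:
        facts = [f for key, f in self.semantic.items() if key in goal.lower()]
        return facts + list(self.working)

# 4. The loop: reason, act, observe, repeat, with a hard stop condition.
def run_agent(goal: str, llm: Callable[[str, list], dict], max_steps: int = 5):
    memory = Memory()
    for _ in range(max_steps):
        decision = llm(goal, memory.context(goal))               # reason
        if "answer" in decision:
            return decision["answer"], memory
        result = call_tool(decision["tool"], decision["args"])   # act
        memory.record(f"{decision['tool']} -> {json.dumps(result)}")  # observe
    return None, memory

# 1. The LLM: a scripted stand-in so the sketch runs without an API key.
def scripted_llm(goal: str, context: list) -> dict:
    if not any(line.startswith("get_customer") for line in context):
        return {"tool": "get_customer", "args": {"id": 42}}
    return {"answer": "Customer 42 is Ada on the pro plan."}
```

Calling `run_agent("Which plan is customer 42 on?", scripted_llm)` takes two passes through the loop: one tool call, then an answer built from the observed result. Note that a bad tool call comes back as an error string the model can read, rather than crashing the run.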

The canonical version of this loop is called ReAct (Reasoning + Acting, from a 2022 Princeton/Google paper). It’s the right default for roughly 80% of production agents, mostly because every step is independently observable: when something goes wrong, you can see exactly which reason-act-observe cycle broke. For longer jobs there’s plan-and-execute: the model plans all the subtasks upfront, then runs them, re-planning only when needed, which saves expensive model calls compared with invoking the model on every single step.
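The cost argument for plan-and-execute is easier to see with a counter. This is a rough Python sketch under my own assumptions: `planner` and `replanner` stand in for model calls, their signatures are hypothetical, and each plan item is simply a `(tool_name, kwargs)` pair.

```python
from typing import Callable, Optional

def plan_and_execute(goal: str, planner: Callable[[str], list], tools: dict,
                     replanner: Optional[Callable[[str, list], list]] = None,
                     max_replans: int = 2) -> dict:
    """Plan every subtask in one model call, then run them without the model."""
    model_calls = 1          # the single upfront planning call
    replans = 0
    plan = planner(goal)
    results = []
    while plan:
        tool_name, kwargs = plan.pop(0)
        try:
            results.append(tools[tool_name](**kwargs))
        except Exception as exc:
            if replanner is None or replans >= max_replans:
                raise
            # Only a failure triggers another (expensive) model call.
            replans += 1
            model_calls += 1
            plan = replanner(f"{tool_name} failed: {exc}", results)
    return {"results": results, "model_calls": model_calls}
```

With a three-step plan that runs cleanly, `model_calls` stays at 1, whereas a step-by-step loop would consult the model before every one of those tool calls. The trade-off is that the plan can go stale, which is what the `replanner` hook is for.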

One more term you’ll hear: MCP (Model Context Protocol) — an open standard Anthropic released in November 2024 and since adopted by OpenAI, Google, and major IDEs. It’s the USB-C of agent tooling: instead of writing custom glue for every data source, an MCP-compatible agent plugs into any MCP-compatible tool. It’s why connecting an agent to your systems got dramatically less painful over the last year.
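For the curious, MCP messages are JSON-RPC 2.0, and the two tool-related methods are `tools/list` and `tools/call`. The Python sketch below only builds the request bodies; it leaves out the transport and the initialization handshake, and the `get_customer` tool name and its arguments are hypothetical.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched up

def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request; `params` is omitted when not given."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Invoke one of them by name with arguments matching its input schema.
call_tool = jsonrpc_request(
    "tools/call", {"name": "get_customer", "arguments": {"id": 42}}
)
```

The point of the standard is that these two messages look the same whether the server on the other end wraps a database, a ticketing system, or a file share.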

The agent loop

Goal: what you ask for
Plan: the LLM decides the next step
Act: it calls a tool or API
Observe: it reads the result
Repeat ↻ until done

An agent loops (plan, act, observe) until the goal is met.

Does your business actually need one? (The decision rule)

This is the question that saves you money, so here’s the rule I use:

Use a workflow whenever a task’s structure is stable enough to write down in code. Use a full agent only for open-ended problems where you genuinely can’t predict the number of steps ahead of time.

If you can draw the steps on a whiteboard — “always do A, then B, then C” — that’s a workflow. It’ll be cheaper, faster, and more reliable. Reach for an agent only when the path is genuinely unknowable: investigating a novel bug, researching across messy sources, handling support tickets where every one is different. Anthropic’s headline advice is the same one I give clients: start simple, and only add agentic complexity when simpler solutions fall short — because agents trade higher cost and latency for flexibility.
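If it helps, the rule above can be written down as a tiny Python function. This is my own simplification for illustration: the `Task` fields and the `recommend` name are hypothetical, and a real scoping call weighs cost, latency, and risk as well.

```python
from dataclasses import dataclass

@dataclass
class Task:
    steps_known_in_advance: bool  # can you draw it on a whiteboard?
    needs_tools: bool             # must it read or write real systems?
    multi_turn: bool              # is it a back-and-forth conversation?

def recommend(task: Task) -> str:
    """Return the lowest rung of the ladder that fits (a rule of thumb only)."""
    if not task.needs_tools:
        return "chatbot" if task.multi_turn else "LLM call"
    if task.steps_known_in_advance:
        return "workflow"
    return "agent"
```

For example, `recommend(Task(steps_known_in_advance=True, needs_tools=True, multi_turn=False))` returns `"workflow"`, and flipping the first field to `False` is the only thing that moves the answer to `"agent"`.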

Define the metric before you write a line of agent code. Around 60% of DIY AI initiatives never scale past the pilot, and the usual reason is that no ROI or KPI was set first. “Resolve X% of tickets,” “cut invoice processing from 4 hours to 12 minutes” — pick the number, then build toward it.

Do you actually need an agent?

An agent fits when

The task has many steps and branches
It needs to call real tools/APIs
Inputs are messy and unpredictable
A human would 'figure it out' as they go

You don't need one when

A single prompt would do
The steps never change (use a workflow)
You can't tolerate any wrong action
Speed and cost matter more than autonomy

Powerful and overkill in equal measure — match the tool to the task.

What’s worth building — and what’s hype

The use cases with measured returns are real, and they cluster:

Customer support: the clearest winner. Klarna’s assistant handles 66% of inquiries, cut cost-per-transaction ~40%, and was projected to add roughly $40M in profit in 2024, doing work equivalent to 700 full-time human agents. 90% of CX leaders report positive ROI from AI support tools. Time-to-ROI here is often about two weeks.
Coding and dev tooling: Morgan Stanley’s DevGen.AI reviewed 9M+ lines of legacy code and saved an estimated 280,000 developer-hours. Maintainers report ~70% time savings on code review.
Ops / incident response: always-on agents that investigate incidents have reported MTTR (mean time to resolution) down ~65%.
Back-office automation: lead qualification 24/7 over WhatsApp, support auto-resolving ~68% of tickets, invoice agents that read a PDF/XML and cut a 50-invoice batch from 4 hours to ~12 minutes. These are the workhorses, and most are workflows, not autonomous agents.
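To put the invoice figure in per-unit terms, here is the arithmetic as a few lines of Python. The inputs are the numbers quoted above (50 invoices, 4 hours, roughly 12 minutes); the variable names are just mine.

```python
BATCH_SIZE = 50
BEFORE_MIN = 4 * 60  # 4 hours, in minutes
AFTER_MIN = 12

per_invoice_before = BEFORE_MIN / BATCH_SIZE
per_invoice_after = AFTER_MIN / BATCH_SIZE
speedup = BEFORE_MIN / AFTER_MIN
time_saved_pct = 100 * (BEFORE_MIN - AFTER_MIN) / BEFORE_MIN

print(f"{per_invoice_before:.2f} min -> {per_invoice_after:.2f} min per invoice")
print(f"{speedup:.0f}x faster, {time_saved_pct:.0f}% less time")
```

That works out to 4.80 minutes per invoice before and 0.24 minutes after, a 20x speedup or 95% less time. It is the kind of number worth writing down as the target before building.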

Now the hype bucket, because demystifying means saying it out loud. Gartner predicts over 40% of agentic AI projects will be cancelled by the end of 2027 — escalating costs, unclear value, weak risk controls. Gartner also coined “agent washing”: vendors slapping “agent” on an old chatbot or RPA script. Their estimate is that only about 130 of the thousands of “agentic AI” vendors are genuinely agentic. Overrated for almost everyone right now: fully-autonomous “do-everything” general agents, and multi-agent swarms thrown at a task a single workflow handles fine.

If someone is selling you a “fully autonomous AI employee,” ask what tools it calls, what its loop looks like, and what happens when a tool fails. Real builders can answer in plain English. (For where these costs land, my breakdown of what a SaaS MVP actually costs uses the same scope-first logic — the cheapest feature is the one you agreed not to build.)

Why I build in Python and Go

A practical note, since people ask what these are written in. Python is the dominant agent ecosystem — the frameworks, the ReAct tooling, the MCP SDKs all live there first, so it’s where I prototype the loop and the tool definitions. Go is where I put the production glue: concurrent tool execution, low-latency services, and deploying the agent loop as a reliable backend that won’t fall over under real traffic. Prototype the reasoning in Python; ship the service in Go.

That split matters because the hard part of a production agent isn’t the demo — it’s the error handling, the retries, the observability, and keeping costs sane when it’s calling tools thousands of times a day. You can see the kind of AI and full-stack work I do for the full picture, or browse the blog for more.
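As a rough picture of what that unglamorous layer looks like, here is a small sketch of a tool-call wrapper with retries, exponential backoff, a call budget, and a log. I have written it in Python to stay consistent with the other examples here; `ToolRunner`, `BudgetExceeded`, and the default numbers are hypothetical, not a description of any particular production system.

```python
import time
from typing import Callable

class BudgetExceeded(RuntimeError):
    """Raised when the runner has used up its allowed number of tool calls."""

class ToolRunner:
    """Wraps tool calls with retries, a call budget, and a simple log."""

    def __init__(self, max_calls: int = 100, retries: int = 2,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.retries = retries
        self.base_delay = base_delay
        self.sleep = sleep      # injectable so tests do not actually wait
        self.calls = 0
        self.log = []           # (tool name, attempt, outcome) for observability

    def call(self, name: str, func: Callable, **kwargs):
        for attempt in range(self.retries + 1):
            if self.calls >= self.max_calls:
                raise BudgetExceeded(f"call budget of {self.max_calls} used up")
            self.calls += 1     # every attempt counts against the budget
            try:
                result = func(**kwargs)
                self.log.append((name, attempt, "ok"))
                return result
            except Exception as exc:
                self.log.append((name, attempt, f"error: {exc}"))
                if attempt == self.retries:
                    raise       # out of retries: surface the failure
                self.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
```

Counting every attempt against the budget is a deliberate choice: a flaky tool that gets retried thousands of times a day costs money just like a successful one.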

The bottom line

An AI agent is an LLM that runs in a loop — reason, act, observe, repeat — using tools and memory to actually do work, not just talk. It’s the top rung of a four-level ladder, and most of the time the rung below it (a clean workflow) is what your business actually needs: cheaper, faster, debuggable. Start there, define your metric first, and only climb to a full agent when the problem is genuinely open-ended.

If you’re weighing whether your use case needs a real agent or a tightly-scoped workflow — and what it’d cost to build — tell me about it. I’ll give you a straight answer, founder to founder, no agent-washing.

Frequently asked questions

What is an AI agent in simple terms?

An AI agent is a large language model wired to take actions, not just answer. You give it a goal; it reasons, calls a tool (a search, an API, your database, a code runner), reads the result, and decides the next step — looping until the task is done. A chatbot only talks; an agent reads, writes, and acts.

What’s the difference between an AI agent and a chatbot?

A chatbot handles conversations — it’s reactive and read-only, answering questions over multiple turns. An agent handles work: it uses tools to read, write, and act on real systems, and it runs in a loop where it decides its own next steps. Same underlying model, very different wiring and capability.

Is ChatGPT an AI agent?

Plain ChatGPT answering a question is a chatbot or simple LLM call, not an agent. But when it browses the web, runs code, or calls a tool and acts on the result in a loop, that feature is agentic. The model is the same — what makes it an agent is the tools plus the loop around them.

When does a business actually need an AI agent instead of a workflow?

Use a workflow whenever a task’s steps are stable enough to write in code — it’s cheaper, faster, and easier to debug. Reach for a full agent only for open-ended problems where you genuinely can’t predict the number of steps, like investigating a novel issue or research across messy sources. Most production ‘AI’ is workflows, not agents.

Do AI agent projects fail, and why?

Often. Gartner predicts over 40% of agentic AI projects will be cancelled by end of 2027 due to cost, unclear value, and weak risk controls, and about 60% of DIY AI initiatives never scale past the pilot. The usual root cause is building before defining a clear ROI or KPI. Define the metric first, then build toward it.

What is ‘agent washing’?

Agent washing is vendors rebranding an existing chatbot, RPA script, or assistant as an ‘AI agent’ without any real agentic capability. Gartner estimates only about 130 of the thousands of ‘agentic AI’ vendors are genuinely agentic. If a vendor can’t explain the tools their agent calls and what its loop does when a tool fails, be skeptical.

What Is an AI Agent? A Builder's Plain-English Guide