What Are AI Agents and How Do They Work?
You type a question into ChatGPT. It responds. You ask a follow-up. It responds again. This back-and-forth is useful, but it has a ceiling. The chatbot can only talk — it can't do anything. It can't open a file, check a database, send an email, or run code on your behalf. The moment you need action, not just words, the chatbot hits a wall.
AI agents break through that wall. An agent is a system built around a language model that can take actions, make decisions, and work toward goals with minimal hand-holding. Instead of waiting for your next message, an agent can plan a sequence of steps, use tools to carry them out, remember what it learned along the way, and adjust its approach when something doesn't work.
I build agents as part of my daily work — including the chatbot on this site, which uses retrieval and tool calls to answer questions about my projects. This article covers what agents are, how they actually work under the hood, and where they're headed.
1. Agents vs. Chatbots: The Real Difference
A chatbot generates text. An agent generates text and then acts on it. That single distinction changes everything about what's possible.
Consider a concrete example. You tell a chatbot: "Find the cheapest flight from New York to London next Friday and book it." The chatbot will give you advice about where to search, maybe suggest some airlines. Helpful, but you still have to go do it yourself.
An agent given the same request would open a flight search tool, query for available flights, compare prices, select the cheapest option, fill in your details, and complete the booking. It does the work, not just the thinking.
The differences boil down to three things:
- Autonomy. Chatbots are reactive — they respond to each message independently. Agents are proactive — they can take multiple steps toward a goal without needing a prompt at every stage.
- Tool use. Chatbots produce text. Agents can call APIs, execute code, read files, search the web, and interact with external systems.
- Persistence. Chatbots have short-term context within a conversation. Agents can maintain long-term memory across sessions, learning from previous interactions and building up knowledge over time.
That said, the line between "chatbot" and "agent" isn't always sharp. Modern chatbots increasingly have tool-calling capabilities. The distinction is more of a spectrum than a binary. But the mental model is useful: if it only talks, it's a chatbot. If it acts, it's an agent.
2. The Three Core Components of Every Agent
Strip away the hype and every AI agent has the same three building blocks: a planning system, tools, and memory. The language model sits at the center, coordinating all three.
Planning
Planning is how the agent breaks a goal into steps. When you say "research competitor pricing and put together a comparison spreadsheet," the agent needs to figure out what to do first, second, third — and how each step feeds into the next.
Most agents use the language model itself for planning. The model receives the goal, the available tools, and any relevant context, then outputs a plan — either all at once or one step at a time. The one-step-at-a-time approach (called ReAct, for Reason + Act) is more common because it lets the agent adjust its plan based on what it discovers along the way.
```
Goal: Find the current weather in Tokyo and convert to Fahrenheit

Thought: I need to check the weather in Tokyo. I'll use the weather API.
Action: weather_api(location="Tokyo")
Observation: 18°C, partly cloudy

Thought: The temperature is in Celsius. I need to convert 18°C to Fahrenheit.
Action: calculate(expression="18 * 9/5 + 32")
Observation: 64.4

Thought: I have all the information needed.
Response: It's currently 64.4°F (18°C) in Tokyo, partly cloudy.
```
Each cycle of Thought → Action → Observation lets the agent reason about what to do, do it, see the result, and decide what comes next. This loop continues until the agent decides the goal has been met.
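The Action step in a trace like this has to be machine-readable, because the system, not the model, actually executes the tool. A minimal sketch of that execution step, with a made-up tool registry and a parser for the `tool(arg="value")` format shown above:

```python
import re

# Minimal sketch of executing one ReAct Action line. The tool registry and
# output format are illustrative assumptions, not any specific framework.
TOOLS = {
    "weather_api": lambda location: "18°C, partly cloudy",  # stub for a real API
    "calculate": lambda expression: str(eval(expression)),  # demo only; never eval untrusted input
}

ACTION_RE = re.compile(r'Action:\s*(\w+)\((\w+)="([^"]*)"\)')

def run_action(model_output: str) -> str:
    """Extract a single call like weather_api(location="Tokyo") and execute it."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return "Observation: no action found"
    tool_name, arg_name, arg_value = match.groups()
    result = TOOLS[tool_name](**{arg_name: arg_value})
    return f"Observation: {result}"

print(run_action('Action: calculate(expression="18 * 9/5 + 32")'))
```

The returned Observation line is appended to the conversation, and the model is called again with the updated transcript, which is what closes the loop.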
Tool use
Tools are what give agents their power. A tool is any function the agent can call — a web search, a database query, a file reader, a code executor, an API endpoint. Without tools, an agent is just a chatbot with extra steps.
The way tools work in practice: the agent receives a list of available tools with descriptions of what each one does and what parameters it accepts. When the agent decides to use a tool, it outputs a structured call (usually JSON) specifying which tool and what arguments. The system executes that call and returns the result to the agent.
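Concretely, a tool definition plus a structured call might look like the sketch below. The schema shape loosely follows common function-calling APIs, but field names vary by provider, so treat every name here as illustrative:

```python
import json

# How a tool is described to the model, and how a structured call comes back.
# Field names are an assumption modeled on common function-calling formats.
weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city. Use when the user asks about weather conditions.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "City name, e.g. 'Tokyo'"}},
        "required": ["location"],
    },
}

# What the model emits when it decides to act: which tool, what arguments.
model_tool_call = '{"tool": "get_weather", "arguments": {"location": "Tokyo"}}'

def dispatch(call_json: str, implementations: dict) -> str:
    """Parse the model's structured call and run the matching implementation."""
    call = json.loads(call_json)
    return implementations[call["tool"]](**call["arguments"])

result = dispatch(model_tool_call, {"get_weather": lambda location: f"18°C in {location}"})
print(result)
```

The key design point is the separation: the model only produces JSON describing its intent; your code decides whether and how to execute it.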
Tool design matters enormously. A well-designed tool with a clear name and description gets used correctly. A poorly described tool causes the agent to misuse it or ignore it entirely. I wrote a full guide on this topic — how to design agent skills — because getting tool definitions right is half the battle of building a good agent.
Memory
Memory is what lets an agent learn and maintain state. There are two kinds:
Short-term memory is the conversation context — everything the agent has seen and done in the current session. This lives in the language model's context window. It's limited by the model's maximum token count, which means agents working on long tasks need strategies for managing what stays in context and what gets summarized or dropped.
Long-term memory persists across sessions. This could be a file on disk, a database, a vector store, or a structured knowledge graph. Long-term memory lets an agent remember your preferences, past decisions, and accumulated knowledge without re-learning everything each time it starts up.
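A minimal sketch of the simplest option, a JSON file on disk, shows the essential property of long-term memory: a fresh session reads back what an earlier one stored. The class and keys here are invented for illustration; production agents usually reach for a database or vector store instead.

```python
import json
import tempfile
from pathlib import Path

class FileMemory:
    """Toy long-term memory: a JSON file read at startup, written on change."""

    def __init__(self, path: Path):
        self.path = path
        self.data = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value) -> None:
        self.data[key] = value
        self.path.write_text(json.dumps(self.data))  # persist across sessions

    def recall(self, key: str, default=None):
        return self.data.get(key, default)

path = Path(tempfile.mkdtemp()) / "memory.json"
FileMemory(path).remember("preferred_units", "fahrenheit")
# A new session constructs a fresh object but re-reads the same file:
print(FileMemory(path).recall("preferred_units"))
```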
For example, you can configure agent memory and behavior using a CLAUDE.md file — a configuration file that tells the agent who it is, what it knows, and how it should behave. This is a form of long-term memory baked into the agent's setup.
For knowledge-heavy applications, agents use Retrieval Augmented Generation (RAG) — pulling relevant documents from a knowledge base at query time so the agent can answer questions grounded in actual source material rather than relying on what the model memorized during training.
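The retrieval step can be sketched in a few lines. Real systems score documents with embeddings and a vector store; the naive word-overlap scoring and the document contents below are stand-ins to show the shape of the pipeline:

```python
# Toy RAG retrieval: score stored documents against the query (by naive word
# overlap here; production systems use embeddings) and prepend the best match
# to the prompt. Document contents are invented for illustration.
DOCS = [
    "Refund policy: purchases can be refunded within 30 days.",
    "Shipping: standard delivery takes 3-5 business days.",
    "Support hours: weekdays 9am to 5pm eastern time.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved text rather than its training data."""
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what is the refund policy"))
```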
3. The Agent Loop: How Agents Actually Run
Understanding the agent loop is the key to understanding how agents work. Here's the full cycle:
- Receive input. A goal, question, or trigger event arrives.
- Retrieve context. The agent pulls relevant information from memory — past interactions, stored knowledge, configuration files.
- Reason. The language model analyzes the input plus context and decides what to do next.
- Act. The agent calls a tool, generates a response, or takes some other action.
- Observe. The result of the action comes back — tool output, API response, error message.
- Update memory. The agent stores anything worth remembering for later.
- Loop or stop. If the goal isn't met, go back to step 3. If it is, deliver the final result.
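The seven steps above compress into a short loop. The "model" here is a scripted stand-in that decides based on what it has observed so far; a real agent would call an LLM at the reasoning step, and every name below is invented:

```python
# The agent loop from the steps above, in miniature. fake_model stands in
# for the LLM call; tools is the registry of callable actions.
def fake_model(goal: str, history: list[str]):
    if not history:
        return ("act", "search", goal)  # reason: no data yet, so call a tool
    return ("stop", f"Done: found results for '{goal}'")  # reason: goal met

def run_agent(goal: str, tools: dict, max_steps: int = 10) -> str:
    history = []                               # short-term memory for this run
    for _ in range(max_steps):
        decision = fake_model(goal, history)   # reason over input plus context
        if decision[0] == "stop":
            return decision[1]                 # deliver the final result
        _, tool_name, arg = decision
        observation = tools[tool_name](arg)    # act, then observe the result
        history.append(observation)            # update memory, then loop
    return "Gave up: step limit reached"

print(run_agent("cheap flights NYC to London", {"search": lambda q: f"3 results for {q}"}))
```

Note the `max_steps` cap: even in a sketch this small, an unbounded loop is the first bug you hit.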
The number of loops varies wildly. A simple question might take one cycle. A research task might take twenty. A coding task with debugging could take fifty or more.
4. Types of AI Agents
Not all agents work the same way. Here are the main architectures you'll encounter:
Single-agent systems
One language model, one set of tools, one loop. This is the simplest architecture and works well for focused tasks — answering questions from a knowledge base, automating a specific workflow, acting as a personal assistant. Most agents in production today are single-agent systems.
Multi-agent systems
Multiple specialized agents working together. One agent might handle research, another writing, a third code review. They hand off information to one another, each contributing its specialty. This works well for complex tasks that benefit from division of labor — but adds significant coordination overhead.
Autonomous agents
Agents that run continuously without human intervention. They monitor for events, respond to triggers, and execute tasks on schedules. Think of a security agent that monitors logs and responds to anomalies, or a content agent that researches and publishes articles on a recurring schedule.
Human-in-the-loop agents
Agents that handle most of the work but pause for human approval at critical decision points. This is common in high-stakes applications — an agent might draft a customer response but wait for a human to approve it before sending, or prepare a code change but require review before merging.
5. Real-World Examples
Agents have moved well past the proof-of-concept stage. Here are categories where they're doing real work:
Developer tools. Coding agents like Claude Code, Cursor, and GitHub Copilot Workspace can read codebases, plan changes, write code, run tests, and fix bugs. They operate in the same loop described above — reason about the task, make a change, observe the test results, iterate. These are among the most mature agent applications.
Customer support. Agents that handle incoming tickets by searching knowledge bases, pulling up customer history, and either resolving issues directly or routing to the right human. The best ones handle 40-60% of tickets without escalation.
Research assistants. Agents that search across multiple sources, cross-reference information, synthesize findings, and produce structured reports. Particularly useful in fields like legal research, market analysis, and academic literature review.
Personal assistants. Agents that manage calendars, draft emails, monitor information feeds, and handle routine administrative tasks. These typically integrate with dozens of tools — email, calendar, task managers, messaging platforms.
Data analysis. Agents that take a question about your data, write SQL or Python to extract and analyze it, generate visualizations, and explain the findings in plain language. They turn "what were our top-performing products last quarter?" into an actual analysis rather than advice on how to run one.
6. What It Takes to Build One
If you want to build an agent, here's what the stack looks like in practice:
The language model. This is your agent's brain. Claude, GPT-4, Gemini, or an open-source model like Llama. The model needs to be good at tool calling — not all models handle structured tool output well. For most agent use cases, you want a model that follows instructions precisely and produces valid JSON for tool calls.
A tool framework. Something to define tools, handle tool calls, and manage the agent loop. Options range from frameworks like LangChain and CrewAI to lighter approaches using the model's native function calling with your own orchestration code.
Tool definitions. Each tool needs a name, description, parameter schema, and implementation. The description is what the model reads to decide when and how to use the tool — write these like you're explaining the tool to a new teammate. If you're building with Claude, my skill design guide covers how to structure tool definitions for reliable agent behavior.
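To make the "explain it to a new teammate" point concrete, here is the same hypothetical tool described two ways. Both are syntactically valid; only the second gives the model enough to use the tool correctly. All names and wording are invented, not taken from any framework:

```python
# The same tool, badly and well described. A model reads only these strings
# when deciding whether and how to call the tool.
bad_tool = {
    "name": "proc",
    "description": "Processes data.",  # which data? when? the model has to guess
    "parameters": {"type": "object", "properties": {"x": {"type": "string"}}},
}

good_tool = {
    "name": "lookup_order_status",
    "description": (
        "Look up the shipping status of a customer order by its order ID. "
        "Use this when the user asks where their order is. "
        "Returns a status string like 'shipped' or 'processing'."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Order ID, e.g. 'ORD-1042'"}
        },
        "required": ["order_id"],
    },
}
```

The good version answers three questions the model will otherwise guess at: what the tool does, when to reach for it, and what comes back.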
Memory infrastructure. At minimum, conversation history management. For more capable agents, add a vector store for long-term knowledge retrieval and a structured store for user preferences and state.
Error handling. This is where most agent projects struggle. Tools fail. APIs time out. The model misinterprets instructions. You need retry logic, fallback behaviors, and limits on how many loops the agent can take before it stops and asks for help.
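A sketch of the first of those guardrails, retrying flaky tool calls with exponential backoff, is below; the attempt count and delays are arbitrary, and the flaky tool is simulated:

```python
import time

def call_with_retries(tool, *args, attempts: int = 3, delay: float = 0.0):
    """Retry a flaky tool call, re-raising the last error if all attempts fail."""
    for attempt in range(attempts):
        try:
            return tool(*args)
        except Exception:
            if attempt == attempts - 1:
                raise                            # out of attempts: surface the error
            time.sleep(delay * (2 ** attempt))   # exponential backoff between tries

# Simulated tool that times out twice, then succeeds on the third call.
calls = {"n": 0}
def flaky_search(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("search backend timed out")
    return f"results for {query}"

print(call_with_retries(flaky_search, "agent frameworks", delay=0))
```

The iteration cap belongs at the loop level rather than here: a retry wrapper handles transient failures, while a `max_steps` limit on the agent loop handles the agent being genuinely stuck.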
7. Limitations and Failure Modes
Agents are powerful, but they fail in predictable ways. Knowing these failure modes helps you design around them:
Hallucinated actions. Just as language models hallucinate facts, agents can hallucinate tool calls — invoking tools with nonsensical arguments or claiming to have taken actions they didn't actually take. Validation on tool inputs and outputs catches most of these.
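That validation layer can be as simple as checking each call against the tool's declared schema before executing it. A sketch, with invented tool schemas in the JSON-Schema style used for tool definitions:

```python
# Validate a model-proposed tool call before executing it, catching
# hallucinated tool names and missing or wrong-typed arguments.
SCHEMAS = {
    "weather_api": {"required": {"location": str}},
    "calculate": {"required": {"expression": str}},
}

def validate_call(tool_name: str, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks sane."""
    if tool_name not in SCHEMAS:
        return [f"unknown tool: {tool_name}"]  # hallucinated tool name
    problems = []
    for arg, expected_type in SCHEMAS[tool_name]["required"].items():
        if arg not in arguments:
            problems.append(f"missing argument: {arg}")
        elif not isinstance(arguments[arg], expected_type):
            problems.append(f"{arg} should be {expected_type.__name__}")
    return problems

print(validate_call("weather_api", {"location": "Tokyo"}))  # valid call: no problems
print(validate_call("book_flight", {}))                     # hallucinated tool: rejected
```

When validation fails, the rejection message goes back to the model as an observation, which usually prompts it to correct the call on the next iteration.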
Infinite loops. An agent that can't solve a problem might keep retrying the same failed approach indefinitely. Always set a maximum number of iterations and have the agent report back when it's stuck rather than spinning forever.
Context window overflow. Long-running agents accumulate tool outputs that fill up the context window. When the window overflows, earlier context gets dropped, and the agent loses track of what it has already done. Summarization strategies and selective context management help, but this remains a real constraint.
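One common mitigation is compaction: when the history grows past a budget, collapse the oldest messages into a single summary entry and keep only the recent ones verbatim. The sketch below fakes the summary with crude truncation; a real agent would ask the model to write it, and the thresholds are arbitrary:

```python
# Keep the context in budget: replace old messages with one summary line
# once the history exceeds MAX_MESSAGES, keeping the newest verbatim.
MAX_MESSAGES = 6
KEEP_RECENT = 3

def compact(history: list[str]) -> list[str]:
    if len(history) <= MAX_MESSAGES:
        return history                       # still within budget: no change
    old, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
    summary = "Summary of earlier steps: " + "; ".join(m[:20] for m in old)
    return [summary] + recent                # one summary replaces many messages

history = [f"step {i}: tool output ..." for i in range(10)]
print(len(compact(history)))  # 10 messages compacted down to 4
```

The trade-off is lossy by design: anything not captured in the summary is gone, which is why selective retention (keeping the goal and key decisions verbatim) usually beats blanket summarization.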
Cascading errors. One bad tool call can send the agent down an entirely wrong path. By the time the error compounds through several more steps, the agent is solving the wrong problem and doesn't realize it. Checkpointing and intermediate validation reduce this risk.
Cost. Every loop iteration costs tokens. An agent that takes 30 steps to complete a task might use 10-50x the tokens of a single chatbot response. For high-volume applications, this adds up fast.
8. Where Agents Are Going
Agent capabilities are improving on multiple fronts simultaneously:
Better models mean better planning. As language models get better at reasoning and following instructions, agents get better at everything — more accurate tool use, fewer wasted steps, better error recovery. Model improvements cascade through the entire agent system.
Standardized tool protocols. The Model Context Protocol (MCP), introduced by Anthropic, is standardizing how agents connect to tools and data sources. Instead of building custom integrations for every tool, agents can connect to MCP servers that expose tools in a standard format. This is making it much easier to give agents access to new capabilities.
Computer use. Agents are learning to use graphical interfaces — clicking buttons, filling forms, navigating websites just like a human would. This unlocks tools that don't have APIs, which is most software.
Multi-agent collaboration. Systems where specialized agents coordinate on complex tasks are getting more practical. Instead of one agent trying to be good at everything, teams of agents can divide work based on their strengths.
The trajectory is clear: agents are getting more capable, more reliable, and easier to build. The gap between "what an agent can do" and "what you'd trust an agent to do unsupervised" is closing — slowly, but closing.
Getting Started
If you want to start building agents yourself, here's a practical path:
- Understand the tools. Read through how to write a CLAUDE.md file for agent configuration and the skill design guide for building reliable tool definitions.
- Learn about knowledge retrieval. Agents that can pull from external knowledge are far more useful than agents limited to their training data. My RAG system guide covers building retrieval pipelines from scratch.
- Build something small. Pick a repetitive task in your workflow and automate it with an agent. Keep the scope tight — one goal, three to five tools, clear success criteria.
- Measure and iterate. Track how often the agent succeeds, where it fails, and how many steps it takes. Optimize the tools and prompts based on actual failure patterns, not intuition.
The technology is ready. The models are capable enough, the tooling is mature enough, and there are enough production examples to learn from. The bottleneck now isn't the AI — it's figuring out which problems are worth solving with agents and designing systems that handle the inevitable edge cases gracefully.
Want to see an agent in action? Try the Andy AI Chat — it uses tool calling and retrieval to answer questions about my projects and articles. Or read the CLAUDE.md writing guide to start configuring your own agent's behavior and memory.
— Andy