How to Build an AI Agent (2026)
An AI agent is software that uses an LLM to plan, make decisions, and take actions to achieve a goal. Unlike a chatbot that just responds, an agent actually does things — browses the web, writes files, calls APIs, and iterates until the task is done. Here's how to build one.
What Makes an Agent
An AI agent has four components:
1. LLM (Brain) — decides what to do
2. Tools (Hands) — takes actions in the world
3. Memory (Context) — remembers what happened
4. Loop (Persistence) — keeps going until the goal is met
The Agent Loop
Goal received
↓
LLM decides next action
↓
Tool executes the action
↓
Result returned to LLM
↓
LLM evaluates: goal met?
↓
No → decide next action (loop back)
Yes → return final result
This loop is the fundamental difference between a chatbot (one response) and an agent (continuous action until done).
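In code, the loop above reduces to a few lines. Here is a schematic sketch in which `llm_decide` and `run_tool` are placeholders for your model call and tool executor (both assumptions, not a specific API):

```python
def agent_loop(goal, llm_decide, run_tool, max_steps=20):
    """Run the decide-act-observe loop until the LLM signals completion."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        action = llm_decide(history)             # LLM decides the next action
        if action["type"] == "done":             # goal met: return final result
            return action["result"]
        result = run_tool(action)                # a tool executes the action
        history.append(("observation", result))  # result fed back to the LLM
    raise RuntimeError("max steps exceeded without reaching the goal")
```

Everything that follows in this guide is a fleshed-out version of this skeleton.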
Level 1: Simple Agent (30 minutes)
The ReAct Pattern
The simplest agent pattern: Reason, Act, Observe.
Architecture:
- Give the LLM a system prompt with available tools
- LLM reasons about what to do
- LLM chooses a tool and provides arguments
- Your code executes the tool
- Result is fed back to the LLM
- Repeat until the LLM says "done"
Example: A Research Agent
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "web_search",
        "description": "Search the web for information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "save_note",
        "description": "Save a research finding",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["title", "content"]
        }
    }
]

def run_agent(goal):
    messages = [{"role": "user", "content": goal}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system="You are a research agent. Use tools to research topics and save findings.",
            tools=tools,
            messages=messages
        )
        # Check if the agent wants to use a tool
        if response.stop_reason == "tool_use":
            # Execute the tool(s), then feed the results back
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Agent is done
            return response.content

def execute_tools(content):
    results = []
    for block in content:
        if block.type == "tool_use":
            if block.name == "web_search":
                result = search_web(block.input["query"])
            elif block.name == "save_note":
                result = save_note(block.input["title"], block.input["content"])
            else:
                # Never crash the loop on an unrecognized tool name
                result = f"Error: unknown tool '{block.name}'"
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result
            })
    return results
```
That's a working agent. It receives a goal, decides which tools to use, executes them, evaluates results, and continues until done.
Level 2: Agent with Memory (1-2 hours)
Adding Persistent Memory
Level 1 agents forget everything between runs. Adding memory turns them into agents that learn and remember across sessions.
Types of memory:
- Short-term (conversation): The current message history. Already present in Level 1.
- Long-term (persistent): Saved to a file or database between runs. Agent remembers past interactions.
- Working memory (scratchpad): Notes the agent writes to itself during a task.
Implementation:
```python
import json

class AgentMemory:
    def __init__(self, path="memory.json"):
        self.path = path
        self.load()

    def load(self):
        try:
            with open(self.path) as f:
                self.data = json.load(f)
        except FileNotFoundError:
            self.data = {"facts": [], "preferences": [], "history": []}

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.data, f, indent=2)

    def add_fact(self, fact):
        self.data["facts"].append(fact)
        self.save()

    def get_context(self):
        return f"Known facts: {json.dumps(self.data['facts'][-20:])}"
```
Add memory.get_context() to your system prompt so the agent knows what it remembers.
Level 3: Multi-Tool Agent (2-4 hours)
Adding Real Tools
Real agents need real tools. Common tools to implement:
```python
tools_registry = {
    "web_search": search_brave_api,       # Search the web
    "read_url": fetch_and_parse_url,      # Read a webpage
    "write_file": write_to_disk,          # Save files
    "read_file": read_from_disk,          # Read files
    "run_code": execute_python,           # Run Python code
    "send_email": send_via_api,           # Send emails
    "query_database": run_sql_query,      # Query a database
    "call_api": make_http_request,        # Call any API
}
```
Security consideration: Every tool is a potential attack surface. An agent with run_code can execute arbitrary code. An agent with send_email can email anyone. Implement guardrails:
```python
def run_code(code):
    # Sandbox: run in a Docker container with no network
    # Timeout: kill after 30 seconds
    # Review: log every execution
    pass

def send_email(to, subject, body):
    # Allowlist: only send to approved domains
    # Rate limit: max 5 emails per hour
    # Log: record every email sent
    pass
```
Level 4: Using a Framework (Fastest)
LangChain
The most popular agent framework. Handles the agent loop, tool management, and memory.
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_anthropic import ChatAnthropic
from langchain.tools import Tool

llm = ChatAnthropic(model="claude-sonnet-4-20250514")

tools = [
    Tool(name="search", func=search_web, description="Search the web"),
    Tool(name="calculator", func=calculate, description="Do math"),
]

prompt = hub.pull("hwchase17/react")  # a standard ReAct prompt template
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

result = executor.invoke({"input": "Research the market size for AI tools"})
```
Pros: Fast to build, large ecosystem, many pre-built tools. Cons: Abstraction can be opaque, debugging is harder, dependency heavy.
CrewAI
Multi-agent framework — multiple specialized agents working together.
```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Research market data and trends",
    tools=[search_tool, web_reader],
    llm=claude
)

writer = Agent(
    role="Report Writer",
    goal="Write clear, actionable reports",
    tools=[file_writer],
    llm=claude
)

research_task = Task(
    description="Research the AI tools market size and growth",
    agent=researcher
)

report_task = Task(
    description="Write a market report from the research findings",
    agent=writer
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, report_task])
result = crew.kickoff()
```
Best for: Complex workflows where different "experts" handle different parts.
Vercel AI SDK
Build AI agents in TypeScript/Next.js.
```typescript
import { generateText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

const result = await generateText({
  model: anthropic('claude-sonnet-4-20250514'),
  tools: {
    search: tool({
      description: 'Search the web',
      parameters: z.object({ query: z.string() }),
      execute: async ({ query }) => searchWeb(query),
    }),
  },
  maxSteps: 10,
  prompt: 'Research the top AI coding tools and summarize findings',
});
```
Best for: TypeScript/Next.js developers building web-based agents.
Architecture Decisions
Which LLM?
| Model | Best For | Cost |
|---|---|---|
| Claude Sonnet | Best reasoning, tool use | $3/$15 per 1M tokens |
| GPT-4o | Strong all-around | $2.50/$10 per 1M tokens |
| Claude Haiku | Fast, cheap tasks | $0.25/$1.25 per 1M tokens |
| GPT-4o Mini | Budget tasks | $0.15/$0.60 per 1M tokens |
| Local (Llama 3.1) | Privacy, no API cost | Free (hardware cost) |
Recommendation: Claude Sonnet for complex agents. GPT-4o Mini or Haiku for simple, high-volume tasks.
Framework vs Custom?
| Approach | When to Use |
|---|---|
| Custom (no framework) | Simple agents, learning, full control |
| LangChain | Rapid prototyping, large tool ecosystem |
| CrewAI | Multi-agent workflows |
| Vercel AI SDK | TypeScript web applications |
| Semantic Kernel | .NET/C# applications |
Start custom. Build your first agent from scratch to understand the fundamentals. Use a framework for your second agent when you understand what the framework is abstracting.
Common Agent Patterns
Tool Selection Agent
Agent decides which tool to use based on the task. The most basic pattern.
Pipeline Agent
Agent executes steps in a fixed order: research → analyze → write → review. Each step uses different tools.
Supervisor Agent
One agent coordinates multiple sub-agents. Assigns tasks, collects results, synthesizes output. Used in CrewAI and multi-agent systems.
Reflection Agent
Agent generates output, then critiques its own output, then improves it. Produces higher quality results at the cost of more LLM calls.
Common Mistakes
- No error handling. Tools fail. APIs timeout. Files don't exist. Always handle errors gracefully and let the agent retry or adapt.
- Infinite loops. Set a maximum number of steps (10-20). If the agent hasn't completed the task, stop and report what happened.
- Too many tools. Each tool adds complexity and confusion for the LLM. Start with 3-5 tools. Add more only when needed.
- No logging. Log every LLM call, tool execution, and decision. When something goes wrong, you need the trace.
- Overly broad goals. "Improve our marketing" is too vague. "Research 5 competitors' pricing pages and create a comparison table" is specific enough for an agent.
FAQ
How much does it cost to run an agent?
A typical agent task (10-20 LLM calls with tool use) costs $0.10-0.50 with Claude Sonnet. Complex tasks with many iterations: $1-5. Budget $50-200/month for moderate agent usage.
Can agents run unsupervised?
For low-risk tasks (research, file organization, data processing): yes, with proper guardrails. For high-risk tasks (sending emails, modifying production data, spending money): always require human approval.
Which language should I build agents in?
Python (most resources, LangChain ecosystem) or TypeScript (web integration, Vercel AI SDK). Both are well-supported by all LLM providers.
How do I debug agents?
Log everything. Print the LLM's reasoning at each step. Record tool inputs and outputs. When the agent goes wrong, read the trace to find where its reasoning diverged.
Can I build agents with local models?
Yes. Llama 3.1 70B handles tool use well. Smaller models (8B) struggle with complex multi-step reasoning. Use Ollama or vLLM to serve local models with an OpenAI-compatible API.
Bottom Line
Building an AI agent is simpler than it sounds. The core pattern — LLM decides, tool executes, loop until done — can be implemented in 50 lines of code.
Start here: Build a simple research agent (Level 1) using Claude's tool use API. Give it a web search tool and a note-saving tool. Have it research a topic and save findings. That's your first agent.
Scale from there: Add memory (Level 2), more tools (Level 3), and frameworks (Level 4) as your needs grow. The fundamentals don't change — everything is built on the same decide-act-observe loop.