LLM Agents
A note on this module: The previous modules focused on building everything from scratch with NumPy. This final module takes a step back and looks at the applications layer — how LLMs are used as components in larger systems. The code here is more conceptual than runnable, since agents rely on external APIs and tools.
From LLM to Agent
An LLM by itself can only generate text. An agent can:
| LLM | Agent |
|---|---|
| Text in, text out | Perceive → Think → Act |
| Single turn | Multi-turn with memory |
| Generates answers | Uses tools to find answers |
| No planning | Breaks tasks into steps |
The key difference: agents interact with the real world through tools.
A Real Example
User: “What’s the weather in Tokyo right now?”
| | Response |
|---|---|
| LLM alone | “I don’t have access to real-time weather data, but Tokyo typically has…” |
| Agent | Calls weather API → “It’s currently 22°C and sunny in Tokyo.” |
The ReAct Pattern: Think, Then Act
The most influential agent pattern is ReAct (Reasoning + Acting). The LLM alternates between thinking about what to do and acting to gather information:
ReAct Agent: Thought → Action → Observation
Each cycle has three parts:
| Phase | What Happens | Example |
|---|---|---|
| Thought | LLM reasons about what it knows/needs | “I need to find the current population…” |
| Action | LLM calls a tool | web_search("population of Paris 2024") |
| Observation | Tool returns a result | “Paris population: 2.1 million (2024)” |
```python
def react_agent(question, tools, llm, max_steps=5):
    """tools: dict mapping tool name -> tool object with .execute(args)."""
    history = f"Question: {question}\n"
    for step in range(max_steps):
        # LLM generates a thought plus an action
        response = llm.generate(f"""
{history}
Think about what to do next, then take an action.
Available tools: {list(tools)}, finish(answer)
Thought: """)
        thought, action = parse_response(response)
        history += f"Thought: {thought}\nAction: {action}\n"
        if action.startswith("finish"):
            return extract_answer(action)
        # Execute the chosen tool and feed the result back
        tool_name, args = parse_action(action)
        result = tools[tool_name].execute(args)
        history += f"Observation: {result}\n"
    return "Max steps reached"
```

Tool Use: Giving LLMs Superpowers
Common Tools
| Tool | What It Does | Example |
|---|---|---|
| Web Search | Find current information | “Who won the 2024 election?” |
| Calculator | Precise math | “What’s 47.3% of $12,847?” |
| Code Execution | Run Python/JS | ”Parse this CSV and find the average” |
| API Calls | Access services | ”Check flight prices LAX→JFK” |
| File I/O | Read/write files | ”Save this summary to report.md” |
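A tool can be as simple as a named wrapper around a Python function, which is all the `react_agent` sketch above assumes. A minimal illustration (the `Tool` class and its fields are made up for this example, not taken from any agent framework):

```python
class Tool:
    """A named, described function the agent can call."""
    def __init__(self, name, description, fn):
        self.name = name
        self.description = description
        self.fn = fn

    def execute(self, args):
        # Return errors as text so the LLM can see and react to them
        try:
            return self.fn(**args)
        except Exception as e:
            return f"Error: {e}"

# A calculator tool: evaluates arithmetic with builtins disabled
calculator = Tool(
    name="calculator",
    description="Evaluate an arithmetic expression",
    fn=lambda expression: eval(expression, {"__builtins__": {}}),
)

print(calculator.execute({"expression": "47.3 / 100 * 12847"}))  # ≈ 6076.63
```

The description field matters in practice: it is what the LLM reads when deciding which tool to call.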
Function Calling
Modern LLMs support structured function calling — the model outputs a JSON object specifying which tool to use and with what arguments. The system executes the function and feeds the result back to the LLM.
| Field | Purpose | Example |
|---|---|---|
| tool | Which tool to call | "web_search" |
| arguments | Parameters for the tool | {"query": "population of Paris 2024"} |
| thought | Why this tool is needed | "I need current population data" |
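The system-side half of function calling is a small dispatch step: parse the model's JSON, look up the named tool, and run it with the given arguments. A sketch, with the model output hard-coded and the tool stubbed:

```python
import json

# Hypothetical model output in the format described above
model_output = json.dumps({
    "thought": "I need current population data",
    "tool": "web_search",
    "arguments": {"query": "population of Paris 2024"},
})

def dispatch(raw, tools):
    """Parse a function call and route it to the matching tool."""
    call = json.loads(raw)
    fn = tools.get(call["tool"])
    if fn is None:
        # Returned as text so the LLM can correct itself next turn
        return f"Unknown tool: {call['tool']}"
    return fn(**call["arguments"])

# Stub in place of a real search backend
tools = {"web_search": lambda query: f"[stub results for: {query}]"}
print(dispatch(model_output, tools))
```

The result string would then be appended to the conversation as an observation for the next LLM call.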
Memory Systems
Humans don’t forget conversations. Neither should agents.
| Memory Type | Duration | Implementation |
|---|---|---|
| Working memory | Current conversation | Chat history in context window |
| Short-term | Recent sessions | Summarized key facts |
| Long-term | Persistent | Vector database of past interactions |
The Context Window Problem
LLMs have limited context windows (4K-128K tokens). An agent that runs for many steps can exceed this limit.
Solutions:
- Summarization: Compress old conversation turns
- Retrieval: Use embeddings to find relevant past context
- Structured storage: Save facts to a database, retrieve as needed
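The summarization strategy can be sketched in a few lines: keep the most recent turns verbatim and collapse everything older into a summary. Here the `summarize` step, which would normally be an LLM call, is a stub:

```python
def compress_history(turns, keep_last=4, summarize=None):
    """Keep recent turns verbatim; replace older ones with a summary."""
    if len(turns) <= keep_last:
        return turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    # In a real agent, summarize(old) would be another LLM call
    summary = summarize(old) if summarize else f"[summary of {len(old)} earlier turns]"
    return [summary] + recent

turns = [f"turn {i}" for i in range(10)]
print(compress_history(turns))
# → ['[summary of 6 earlier turns]', 'turn 6', 'turn 7', 'turn 8', 'turn 9']
```

The trade-off is lossy: details dropped by the summary are gone unless they were also written to longer-term storage.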
Planning: Breaking Down Complex Tasks
Sophisticated agents plan before acting. For a request like “Write a blog post about transformer architectures,” an agent might:
| Step | Action | Tools Used |
|---|---|---|
| 1 | Research current architectures | Web search |
| 2 | Create outline with key sections | LLM reasoning |
| 3 | Write each section | LLM generation |
| 4 | Add code examples | Code execution |
| 5 | Review and edit for clarity | LLM self-review |
| 6 | Format as markdown | File I/O |
Each step might involve multiple ReAct cycles. The agent tracks progress and adjusts the plan if a step fails.
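One simple way to track such a plan is a list of step records with a status field, executed in order and halted on failure. A toy version of the blog-post plan above (the field names and the re-planning behavior are illustrative):

```python
plan = [
    {"step": "research architectures", "tool": "web_search", "status": "pending"},
    {"step": "create outline", "tool": "llm", "status": "pending"},
    {"step": "write sections", "tool": "llm", "status": "pending"},
]

def run_plan(plan, execute):
    """Run steps in order; mark each done or failed."""
    for item in plan:
        try:
            item["result"] = execute(item)
            item["status"] = "done"
        except Exception as e:
            item["status"] = "failed"
            item["error"] = str(e)
            break  # a real agent would re-plan here instead of stopping
    return plan

completed = run_plan(plan, lambda item: f"[{item['tool']} output for: {item['step']}]")
print([s["status"] for s in completed])  # → ['done', 'done', 'done']
```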
Agent Architectures
| Architecture | Pattern | Best For |
|---|---|---|
| Single Agent | User → Agent → Tools → Response | Simple tasks, focused interactions |
| Multi-Agent | Coordinator dispatches to specialized agents (Researcher, Coder, Reviewer) | Complex tasks requiring different expertise |
| Hierarchical | Manager decomposes into subtasks → Worker agents | Large projects that can be decomposed |
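The multi-agent pattern reduces to a routing decision: the coordinator picks a specialist, then delegates. A toy sketch with keyword routing and stub agents (real systems would use an LLM for the routing step; the agent names here mirror the table above):

```python
# Stub specialists standing in for full agents
AGENTS = {
    "researcher": lambda task: f"[research notes on: {task}]",
    "coder": lambda task: f"[code for: {task}]",
    "reviewer": lambda task: f"[review of: {task}]",
}

def coordinate(task):
    """Route a task to a specialist agent by simple keyword rules."""
    if "implement" in task or "bug" in task:
        role = "coder"
    elif "review" in task:
        role = "reviewer"
    else:
        role = "researcher"
    return role, AGENTS[role](task)

print(coordinate("implement a parser"))  # routes to the coder agent
```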
Common Challenges
| Challenge | Problem | Solution |
|---|---|---|
| Hallucinated actions | LLM invents tools that don’t exist | Validate against available tool list |
| Infinite loops | Agent searches forever without progress | Step limits + loop detection |
| Error recovery | Tools can fail or return errors | Feed error back to LLM to adjust strategy |
```python
# Hallucinated actions: validate against the available tool list
if action.tool not in available_tools:
    return f"Invalid tool. Available: {list(available_tools)}"

# Infinite loops: step limit plus simple repeated-action detection
if step_count > max_steps:
    return "Stopping: exceeded step limit"
if action in recent_actions[-3:]:
    return "Stopping: detected loop"

# Error recovery: feed the error back so the LLM can adjust
try:
    result = tool.execute(args)
except Exception as e:
    result = f"Error: {e}. Try a different approach."
```

Current Frontier (2024-2025)
Agents are evolving rapidly:
- Computer use: Agents that control mouse/keyboard to use any software
- Browser agents: Navigate websites, fill forms, complete tasks
- Coding agents: Write, test, debug, and deploy code autonomously
- Research agents: Read papers, synthesize findings, generate hypotheses
Key Challenges
- Reliability: Agents still make mistakes; critical tasks need human oversight
- Cost: Each tool call = more API tokens = more cost
- Safety: Agents with real-world actions need guardrails
- Evaluation: How do you benchmark an agent that takes different paths each time?
The Vision
A world where you describe a task in natural language and an agent:
- Plans the approach
- Uses appropriate tools
- Handles errors gracefully
- Delivers the result
- Explains what it did and why
We’re getting close, but reliability and safety remain active research areas.
Summary
| Concept | Key Point |
|---|---|
| Agent = LLM + Tools | LLMs generate text; agents take actions |
| ReAct | Alternate Thought → Action → Observation |
| Tool Use | Search, compute, code, APIs |
| Memory | Working (context), short-term (summaries), long-term (databases) |
| Planning | Break complex tasks into manageable steps |
| Challenges | Hallucination, loops, error handling |
🎉 Congratulations!
You’ve completed the entire Transformer 101 course — from RNNs to agents. You now understand:
- How RNNs process sequences with memory
- Why vanishing gradients motivated new architectures
- How attention lets tokens communicate
- The complete Transformer architecture
- How BERT and GPT specialize the transformer for different tasks
- How agents extend LLMs with tools and reasoning
The next step? Build something. Pick a project and implement what you’ve learned. The best way to solidify understanding is to create.