LLM Agents

A note on this module: The previous modules focused on building everything from scratch with NumPy. This final module takes a step back and looks at the applications layer — how LLMs are used as components in larger systems. The code here is more conceptual than runnable, since agents rely on external APIs and tools.

From LLM to Agent

An LLM by itself can only generate text. An agent can:

LLM               | Agent
------------------|----------------------------
Text in, text out | Perceive → Think → Act
Single turn       | Multi-turn with memory
Generates answers | Uses tools to find answers
No planning       | Breaks tasks into steps

The key difference: agents interact with the real world through tools.

A Real Example

User: “What’s the weather in Tokyo right now?”

          | Response
----------|---------------------------------------------------------------------------
LLM alone | “I don’t have access to real-time weather data, but Tokyo typically has…”
Agent     | Calls weather API → “It’s currently 22°C and sunny in Tokyo.”

The ReAct Pattern: Think, Then Act

The most influential agent pattern is ReAct (Reasoning + Acting). The LLM alternates between thinking about what to do and acting to gather information:

ReAct Agent: Thought → Action → Observation

Question: What is the population of the capital of France?

Tracing this example, the agent first reasons that the capital of France is Paris, then looks up its population. Each cycle has three parts:

Phase       | What Happens                           | Example
------------|----------------------------------------|------------------------------------------
Thought     | LLM reasons about what it knows/needs  | “I need to find the current population…”
Action      | LLM calls a tool                       | web_search("population of Paris 2024")
Observation | Tool returns a result                  | “Paris population: 2.1 million (2024)”
🤔 Quick Check
What is the key advantage of the ReAct pattern over simply asking the LLM to answer directly?
def react_agent(question, tools, max_steps=5):
    # `tools` maps tool name -> tool object; `llm`, `parse_response`,
    # `parse_action`, and `extract_answer` are assumed helpers.
    history = f"Question: {question}\n"

    for step in range(max_steps):
        # LLM generates thought + action
        response = llm.generate(f"""
        {history}

        Think about what to do next, then take an action.
        Available tools: {list(tools)}, finish(answer)

        Thought: """)

        thought, action = parse_response(response)
        history += f"Thought: {thought}\nAction: {action}\n"

        if action.startswith("finish"):
            return extract_answer(action)

        tool_name, args = parse_action(action)
        result = tools[tool_name].execute(args)
        history += f"Observation: {result}\n"

    return "Max steps reached"

Tool Use: Giving LLMs Superpowers

Common Tools

Tool           | What It Does             | Example
---------------|--------------------------|----------------------------------------
Web Search     | Find current information | “Who won the 2024 election?”
Calculator     | Precise math             | “What’s 47.3% of $12,847?”
Code Execution | Run Python/JS            | “Parse this CSV and find the average”
API Calls      | Access services          | “Check flight prices LAX→JFK”
File I/O       | Read/write files         | “Save this summary to report.md”
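These tools share a common shape: a name the LLM can reference in its actions, a description for the prompt, and a function that does the work. A minimal sketch of that interface (the `Tool` class and the `calculator` lambda are illustrative assumptions, not a real library API; `eval` is used only to keep the sketch short and is unsafe for untrusted input):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # how the LLM refers to the tool
    description: str            # shown to the LLM in the prompt
    func: Callable[[str], str]  # the actual implementation

    def execute(self, args: str) -> str:
        return self.func(args)

# Illustrative only: a real calculator tool would use a safe expression parser.
calculator = Tool("calculator", "Evaluate an arithmetic expression",
                  lambda expr: str(eval(expr, {"__builtins__": {}})))

# Name -> tool lookup table, the shape the ReAct loop above indexes into
tools = {t.name: t for t in [calculator]}
```

A dict keyed by name makes validating the LLM's chosen tool a simple membership check.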

Function Calling

Modern LLMs support structured function calling — the model outputs a JSON object specifying which tool to use and with what arguments. The system executes the function and feeds the result back to the LLM.

Field     | Purpose                  | Example
----------|--------------------------|----------------------------------------
tool      | Which tool to call       | "web_search"
arguments | Parameters for the tool  | {"query": "population of Paris 2024"}
thought   | Why this tool is needed  | "I need current population data"
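As a sketch, here is how a system might parse and dispatch one of these calls. The `dispatch` function and the stubbed `web_search` are assumptions for illustration; production APIs (e.g. OpenAI's tool calling) define their own message schemas:

```python
import json

def dispatch(call_json: str, tools: dict) -> str:
    """Execute a model-emitted function call using the fields above."""
    call = json.loads(call_json)
    name = call["tool"]
    if name not in tools:
        # Guards against hallucinated tools (see Common Challenges)
        return f"Unknown tool {name!r}. Available: {sorted(tools)}"
    return tools[name](**call["arguments"])

# Stubbed tool standing in for a real search API
tools = {"web_search": lambda query: f"Results for {query!r}"}

call = '{"tool": "web_search", "arguments": {"query": "population of Paris 2024"}}'
print(dispatch(call, tools))  # prints: Results for 'population of Paris 2024'
```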

Memory Systems

Humans carry memory from one conversation to the next. Agents should too.

Memory Type    | Duration             | Implementation
---------------|----------------------|---------------------------------------
Working memory | Current conversation | Chat history in context window
Short-term     | Recent sessions      | Summarized key facts
Long-term      | Persistent           | Vector database of past interactions
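The working and long-term tiers can be sketched in a few lines. This `AgentMemory` class is a conceptual stand-in: real long-term memory would embed facts and do vector search, while the keyword matching here just keeps the sketch dependency-free (short-term summarization is covered under the context-window discussion below):

```python
class AgentMemory:
    """Conceptual sketch of agent memory tiers."""

    def __init__(self):
        self.working = []    # working memory: raw turns in the context window
        self.long_term = []  # long-term memory: persisted facts

    def add_turn(self, role, text):
        self.working.append((role, text))

    def remember(self, fact):
        # Real systems would store an embedding alongside the fact
        self.long_term.append(fact)

    def recall(self, query):
        # Stand-in for vector search: plain keyword overlap
        words = query.lower().split()
        return [f for f in self.long_term if any(w in f.lower() for w in words)]
```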

The Context Window Problem

LLMs have limited context windows (4K-128K tokens). An agent that runs for many steps can exceed this limit.

Solutions:

  • Summarization: Compress old conversation turns
  • Retrieval: Use embeddings to find relevant past context
  • Structured storage: Save facts to a database, retrieve as needed
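The first solution, summarization, can be sketched as a function that keeps recent turns verbatim and collapses the rest. The `summarize` parameter would normally be another LLM call; the default stub here only records how much was dropped:

```python
def compress_history(turns, max_turns=6, summarize=None):
    """Keep the newest `max_turns` turns verbatim; collapse the rest."""
    if len(turns) <= max_turns:
        return turns
    old, recent = turns[:-max_turns], turns[-max_turns:]
    # A real agent would replace this stub with an LLM summarization call
    summary = summarize(old) if summarize else f"[Summary of {len(old)} earlier turns]"
    return [summary] + recent
```

Run periodically, this bounds the context size no matter how many steps the agent takes.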

Planning: Breaking Down Complex Tasks

Sophisticated agents plan before acting. For a request like “Write a blog post about transformer architectures,” an agent might:

Step | Action                            | Tools Used
-----|-----------------------------------|----------------
1    | Research current architectures    | Web search
2    | Create outline with key sections  | LLM reasoning
3    | Write each section                | LLM generation
4    | Add code examples                 | Code execution
5    | Review and edit for clarity       | LLM self-review
6    | Format as markdown                | File I/O

Each step might involve multiple ReAct cycles. The agent tracks progress and adjusts the plan if a step fails.
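That progress-tracking loop can be sketched as a plan runner. Here `execute` stands in for one or more ReAct cycles per step, and the function name and retry policy are illustrative assumptions:

```python
def run_plan(steps, execute, max_retries=1):
    """Run each plan step in order; retry on failure, then record the error and move on."""
    results = []
    for step in steps:
        for attempt in range(max_retries + 1):
            try:
                results.append((step, execute(step)))
                break  # step succeeded, go to the next one
            except Exception as e:
                if attempt == max_retries:
                    # Out of retries: record the failure so the agent can adjust
                    results.append((step, f"Failed: {e}"))
    return results
```

A more sophisticated agent would feed failures back to the LLM and let it revise the remaining plan rather than simply moving on.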


Agent Architectures

Architecture | Pattern                                                                     | Best For
-------------|-----------------------------------------------------------------------------|--------------------------------------------
Single Agent | User → Agent → Tools → Response                                             | Simple tasks, focused interactions
Multi-Agent  | Coordinator dispatches to specialized agents (Researcher, Coder, Reviewer)  | Complex tasks requiring different expertise
Hierarchical | Manager decomposes into subtasks → Worker agents                            | Large projects that can be decomposed
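A multi-agent coordinator can be sketched as a routing function plus a table of specialists. Everything here is a stand-in: the specialists would each be an LLM with its own prompt and tools, and a real coordinator would ask an LLM to classify the task rather than use the keyword heuristic shown:

```python
def coordinator(task, agents, route):
    """Dispatch a task to the specialist chosen by `route`."""
    name = route(task)
    if name not in agents:
        return f"No agent for {name!r}"
    return agents[name](task)

# Stubbed specialists; real ones would be full agents with their own tools
agents = {
    "researcher": lambda t: f"[research notes on: {t}]",
    "coder":      lambda t: f"[code for: {t}]",
}

# Stand-in router; a real coordinator would use an LLM classification call
route = lambda t: "coder" if "implement" in t.lower() else "researcher"
```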

Common Challenges

Challenge            | Problem                                  | Solution
---------------------|------------------------------------------|------------------------------------------
Hallucinated actions | LLM invents tools that don’t exist       | Validate against available tool list
Infinite loops       | Agent searches forever without progress  | Step limits + loop detection
Error recovery       | Tools can fail or return errors          | Feed error back to LLM to adjust strategy
# Hallucinated actions: validate before executing
if action.tool not in available_tools:
    return f"Invalid tool. Available: {list(available_tools)}"

# Infinite loops: cap steps and detect repeated actions
if step_count > max_steps:
    return "Stopping: exceeded step limit"
if action in recent_actions[-3:]:
    return "Stopping: detected loop"

# Error recovery: surface the failure to the LLM instead of crashing
try:
    result = tool.execute(args)
except Exception as e:
    result = f"Error: {e}. Try a different approach."

Current Frontier (2024-2025)

Agents are evolving rapidly:

  • Computer use: Agents that control mouse/keyboard to use any software
  • Browser agents: Navigate websites, fill forms, complete tasks
  • Coding agents: Write, test, debug, and deploy code autonomously
  • Research agents: Read papers, synthesize findings, generate hypotheses

Key Challenges

  • Reliability: Agents still make mistakes; critical tasks need human oversight
  • Cost: Each tool call = more API tokens = more cost
  • Safety: Agents with real-world actions need guardrails
  • Evaluation: How do you benchmark an agent that takes different paths each time?

The Vision

A world where you describe a task in natural language and an agent:

  1. Plans the approach
  2. Uses appropriate tools
  3. Handles errors gracefully
  4. Delivers the result
  5. Explains what it did and why

We’re getting close, but reliability and safety remain active research areas.


Summary

Concept             | Key Point
--------------------|-------------------------------------------------------------------
Agent = LLM + Tools | LLMs generate text; agents take actions
ReAct               | Alternate Thought → Action → Observation
Tool Use            | Search, compute, code, APIs
Memory              | Working (context), short-term (summaries), long-term (databases)
Planning            | Break complex tasks into manageable steps
Challenges          | Hallucination, loops, error handling

🎉 Congratulations!

You’ve completed the entire Transformer 101 course — from RNNs to agents. You now understand:

  • How RNNs process sequences with memory
  • Why vanishing gradients motivated new architectures
  • How attention lets tokens communicate
  • The complete Transformer architecture
  • How BERT and GPT specialize the transformer for different tasks
  • How agents extend LLMs with tools and reasoning

The next step? Build something. Pick a project and implement what you’ve learned. The best way to solidify understanding is to create.