LLM Agents

A note on this module: The previous modules focused on building everything from scratch with NumPy. This final module takes a step back and looks at the applications layer — how LLMs are used as components in larger systems. The code here is more conceptual than runnable, since agents rely on external APIs and tools.

From LLM to Agent

An LLM by itself can only generate text. An agent can:

LLM               | Agent
------------------|----------------------------
Text in, text out | Perceive → Think → Act
Single turn       | Multi-turn with memory
Generates answers | Uses tools to find answers
No planning       | Breaks tasks into steps

The key difference: agents interact with the real world through tools.

A Real Example

User: “What’s the weather in Tokyo right now?”

          | Response
----------|---------------------------------------------------------------------------
LLM alone | “I don’t have access to real-time weather data, but Tokyo typically has…”
Agent     | Calls weather API → “It’s currently 22°C and sunny in Tokyo.”

The ReAct Pattern: Think, Then Act

The most influential agent pattern is ReAct (Reasoning + Acting). The LLM alternates between thinking about what to do and acting to gather information:

ReAct Agent: Thought → Action → Observation

Question: What is the population of the capital of France?

Tracing this example, the agent first reasons that the capital of France is Paris, then looks up its population. Each cycle has three parts:

Phase       | What Happens                           | Example
------------|----------------------------------------|------------------------------------------
Thought     | LLM reasons about what it knows/needs  | “I need to find the current population…”
Action      | LLM calls a tool                       | web_search("population of Paris 2024")
Observation | Tool returns a result                  | “Paris population: 2.1 million (2024)”
🤔 Quick Check
What is the key advantage of the ReAct pattern over simply asking the LLM to answer directly?
def react_agent(question, tools, max_steps=5):
    # `tools` maps tool name -> tool object; `llm`, `parse_response`,
    # `parse_action`, and `extract_answer` are assumed helpers.
    history = f"Question: {question}\n"

    for step in range(max_steps):
        # LLM generates thought + action
        response = llm.generate(f"""
        {history}

        Think about what to do next, then take an action.
        Available tools: {list(tools)}, finish(answer)

        Thought: """)

        thought, action = parse_response(response)
        history += f"Thought: {thought}\nAction: {action}\n"

        if action.startswith("finish"):
            return extract_answer(action)

        tool_name, args = parse_action(action)
        result = tools[tool_name].execute(args)
        history += f"Observation: {result}\n"

    return "Max steps reached"

Tool Use: Giving LLMs Superpowers

Common Tools

Tool           | What It Does             | Example
---------------|--------------------------|----------------------------------------
Web Search     | Find current information | “Who won the 2024 election?”
Calculator     | Precise math             | “What’s 47.3% of $12,847?”
Code Execution | Run Python/JS            | “Parse this CSV and find the average”
API Calls      | Access services          | “Check flight prices LAX→JFK”
File I/O       | Read/write files         | “Save this summary to report.md”
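These tools share a common shape: a name the LLM can reference in its actions, a description for the prompt, and a function that does the work. A minimal sketch of that interface (the `Tool` class and the `calculator` lambda are illustrative assumptions, not a real library API; `eval` is used only to keep the sketch short and is unsafe for untrusted input):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # how the LLM refers to the tool
    description: str            # shown to the LLM in the prompt
    func: Callable[[str], str]  # the actual implementation

    def execute(self, args: str) -> str:
        return self.func(args)

# Illustrative only: a real calculator tool would use a safe expression parser.
calculator = Tool("calculator", "Evaluate an arithmetic expression",
                  lambda expr: str(eval(expr, {"__builtins__": {}})))

# Name -> tool lookup table, the shape the ReAct loop above indexes into
tools = {t.name: t for t in [calculator]}
```

A dict keyed by name makes validating the LLM's chosen tool a simple membership check.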

Function Calling

Modern LLMs support structured function calling — the model outputs a JSON object specifying which tool to use and with what arguments. The system executes the function and feeds the result back to the LLM.

Field     | Purpose                  | Example
----------|--------------------------|----------------------------------------
tool      | Which tool to call       | "web_search"
arguments | Parameters for the tool  | {"query": "population of Paris 2024"}
thought   | Why this tool is needed  | "I need current population data"
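As a sketch, here is how a system might parse and dispatch one of these calls. The `dispatch` function and the stubbed `web_search` are assumptions for illustration; production APIs (e.g. OpenAI's tool calling) define their own message schemas:

```python
import json

def dispatch(call_json: str, tools: dict) -> str:
    """Execute a model-emitted function call using the fields above."""
    call = json.loads(call_json)
    name = call["tool"]
    if name not in tools:
        # Guards against hallucinated tools (see Common Challenges)
        return f"Unknown tool {name!r}. Available: {sorted(tools)}"
    return tools[name](**call["arguments"])

# Stubbed tool standing in for a real search API
tools = {"web_search": lambda query: f"Results for {query!r}"}

call = '{"tool": "web_search", "arguments": {"query": "population of Paris 2024"}}'
print(dispatch(call, tools))  # prints: Results for 'population of Paris 2024'
```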

Memory Systems

Humans carry memory from one conversation to the next. Agents should too.

Memory Type    | Duration             | Implementation
---------------|----------------------|---------------------------------------
Working memory | Current conversation | Chat history in context window
Short-term     | Recent sessions      | Summarized key facts
Long-term      | Persistent           | Vector database of past interactions
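The working and long-term tiers can be sketched in a few lines. This `AgentMemory` class is a conceptual stand-in: real long-term memory would embed facts and do vector search, while the keyword matching here just keeps the sketch dependency-free (short-term summarization is covered under the context-window discussion below):

```python
class AgentMemory:
    """Conceptual sketch of agent memory tiers."""

    def __init__(self):
        self.working = []    # working memory: raw turns in the context window
        self.long_term = []  # long-term memory: persisted facts

    def add_turn(self, role, text):
        self.working.append((role, text))

    def remember(self, fact):
        # Real systems would store an embedding alongside the fact
        self.long_term.append(fact)

    def recall(self, query):
        # Stand-in for vector search: plain keyword overlap
        words = query.lower().split()
        return [f for f in self.long_term if any(w in f.lower() for w in words)]
```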

The Context Window Problem

LLMs have limited context windows (4K-128K tokens). An agent that runs for many steps can exceed this limit.

Solutions:

  • Summarization: Compress old conversation turns
  • Retrieval: Use embeddings to find relevant past context
  • Structured storage: Save facts to a database, retrieve as needed
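The first solution, summarization, can be sketched as a function that keeps recent turns verbatim and collapses the rest. The `summarize` parameter would normally be another LLM call; the default stub here only records how much was dropped:

```python
def compress_history(turns, max_turns=6, summarize=None):
    """Keep the newest `max_turns` turns verbatim; collapse the rest."""
    if len(turns) <= max_turns:
        return turns
    old, recent = turns[:-max_turns], turns[-max_turns:]
    # A real agent would replace this stub with an LLM summarization call
    summary = summarize(old) if summarize else f"[Summary of {len(old)} earlier turns]"
    return [summary] + recent
```

Run periodically, this bounds the context size no matter how many steps the agent takes.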

Planning: Breaking Down Complex Tasks

Sophisticated agents plan before acting. For a request like “Write a blog post about transformer architectures,” an agent might:

Step | Action                            | Tools Used
-----|-----------------------------------|----------------
1    | Research current architectures    | Web search
2    | Create outline with key sections  | LLM reasoning
3    | Write each section                | LLM generation
4    | Add code examples                 | Code execution
5    | Review and edit for clarity       | LLM self-review
6    | Format as markdown                | File I/O

Each step might involve multiple ReAct cycles. The agent tracks progress and adjusts the plan if a step fails.
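That progress-tracking loop can be sketched as a plan runner. Here `execute` stands in for one or more ReAct cycles per step, and the function name and retry policy are illustrative assumptions:

```python
def run_plan(steps, execute, max_retries=1):
    """Run each plan step in order; retry on failure, then record the error and move on."""
    results = []
    for step in steps:
        for attempt in range(max_retries + 1):
            try:
                results.append((step, execute(step)))
                break  # step succeeded, go to the next one
            except Exception as e:
                if attempt == max_retries:
                    # Out of retries: record the failure so the agent can adjust
                    results.append((step, f"Failed: {e}"))
    return results
```

A more sophisticated agent would feed failures back to the LLM and let it revise the remaining plan rather than simply moving on.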


Agent Architectures

Architecture | Pattern                                                                     | Best For
-------------|-----------------------------------------------------------------------------|--------------------------------------------
Single Agent | User → Agent → Tools → Response                                             | Simple tasks, focused interactions
Multi-Agent  | Coordinator dispatches to specialized agents (Researcher, Coder, Reviewer)  | Complex tasks requiring different expertise
Hierarchical | Manager decomposes into subtasks → Worker agents                            | Large projects that can be decomposed
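A multi-agent coordinator can be sketched as a routing function plus a table of specialists. Everything here is a stand-in: the specialists would each be an LLM with its own prompt and tools, and a real coordinator would ask an LLM to classify the task rather than use the keyword heuristic shown:

```python
def coordinator(task, agents, route):
    """Dispatch a task to the specialist chosen by `route`."""
    name = route(task)
    if name not in agents:
        return f"No agent for {name!r}"
    return agents[name](task)

# Stubbed specialists; real ones would be full agents with their own tools
agents = {
    "researcher": lambda t: f"[research notes on: {t}]",
    "coder":      lambda t: f"[code for: {t}]",
}

# Stand-in router; a real coordinator would use an LLM classification call
route = lambda t: "coder" if "implement" in t.lower() else "researcher"
```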

Common Challenges

Challenge            | Problem                                  | Solution
---------------------|------------------------------------------|------------------------------------------
Hallucinated actions | LLM invents tools that don’t exist       | Validate against available tool list
Infinite loops       | Agent searches forever without progress  | Step limits + loop detection
Error recovery       | Tools can fail or return errors          | Feed error back to LLM to adjust strategy
# Hallucinated actions: validate before executing
if action.tool not in available_tools:
    return f"Invalid tool. Available: {list(available_tools)}"

# Infinite loops: cap steps and detect repeated actions
if step_count > max_steps:
    return "Stopping: exceeded step limit"
if action in recent_actions[-3:]:
    return "Stopping: detected loop"

# Error recovery: surface the failure to the LLM instead of crashing
try:
    result = tool.execute(args)
except Exception as e:
    result = f"Error: {e}. Try a different approach."

Current Frontier (2024-2025)

Agents are evolving rapidly:

  • Computer use: Agents that control mouse/keyboard to use any software
  • Browser agents: Navigate websites, fill forms, complete tasks
  • Coding agents: Write, test, debug, and deploy code autonomously
  • Research agents: Read papers, synthesize findings, generate hypotheses

Key Challenges

  • Reliability: Agents still make mistakes; critical tasks need human oversight
  • Cost: Each tool call = more API tokens = more cost
  • Safety: Agents with real-world actions need guardrails
  • Evaluation: How do you benchmark an agent that takes different paths each time?

The Vision

A world where you describe a task in natural language and an agent:

  1. Plans the approach
  2. Uses appropriate tools
  3. Handles errors gracefully
  4. Delivers the result
  5. Explains what it did and why

We’re getting close, but reliability and safety remain active research areas.


Summary

Concept             | Key Point
--------------------|-------------------------------------------------------------------
Agent = LLM + Tools | LLMs generate text; agents take actions
ReAct               | Alternate Thought → Action → Observation
Tool Use            | Search, compute, code, APIs
Memory              | Working (context), short-term (summaries), long-term (databases)
Planning            | Break complex tasks into manageable steps
Challenges          | Hallucination, loops, error handling

🎉 Congratulations!

You’ve completed the entire Transformer 101 course — from RNNs to agents. You now understand:

  • How RNNs process sequences with memory
  • Why vanishing gradients motivated new architectures
  • How attention lets tokens communicate
  • The complete Transformer architecture
  • How BERT and GPT specialize the transformer for different tasks
  • How agents extend LLMs with tools and reasoning

The next step? Build something. Pick a project and implement what you’ve learned. The best way to solidify understanding is to create.