Transformer 101 - Do I have your full attention?

An interactive guide to understanding the ML architecture that powers modern AI.

Build intuition from RNNs to Transformers with visualizations and Python implementations.


Unlike most transformer guides, every concept here comes with interactive visualizations you can explore and Python implementations you can run yourself. No PyTorch, no TensorFlow — just NumPy and understanding.

Start Learning →

What You’ll Learn

This course builds your intuition from the ground up:

| Module | Topic | Key Concepts |
|--------|-------|--------------|
| 1 | RNNs | Sequential processing, backprop through time, vanishing gradients |
| 2-3 | Foundations | Embeddings, tokenization |
| 4 | Seq2Seq + Attention | Encoder-decoder, attention mechanism |
| 5-8 | Transformer Components | Self-attention, multi-head, positional encoding, layer norm |
| 9 | Full Transformer | Putting it all together |
| 10-12 | Applications | BERT, GPT, Agents |
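
To give a taste of the course's code style, here is a minimal NumPy sketch of scaled dot-product attention, the core operation covered in Modules 5-8. The function names and shapes are illustrative, not the course's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V, weights         # weighted average of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (4, 8): one output vector per query position
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

Everything in the course is built from pieces like this: plain NumPy, no framework magic.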

Prerequisites

You should be comfortable with:

  • Python — loops, functions, classes
  • Basic calculus — derivatives, chain rule
  • Linear algebra — vectors, matrices, dot products

Don’t worry if you’re rusty; we’ll review the math as we go.

How to Use This Site

Each module includes:

  1. Explanations — Clear prose with math when needed
  2. Visualizations — Interactive diagrams to build intuition
  3. Code — Python implementations you can run

Recommended approach:

  1. Read the explanation first
  2. Play with the visualization
  3. Study the code
  4. Try modifying the code to test your understanding

Philosophy

“What I cannot create, I do not understand.” — Richard Feynman

We implement everything from scratch. No PyTorch. No TensorFlow. Just Python and understanding.

This isn’t the fastest way to use transformers, but it’s the best way to understand them.

Credits

Inspired by:

Built with Astro, Svelte, and KaTeX. Development assisted by Claude, Cursor, and OpenCode.