Transformer 101 - Do I have your full attention?

An interactive guide to understanding the ML architecture that powers modern AI.

Build intuition from RNNs to Transformers with visualizations and Python implementations.


Unlike most transformer guides, every concept here comes with interactive visualizations you can explore and Python implementations you can run yourself. No PyTorch, no TensorFlow — just NumPy and understanding.

Start Learning →

What You’ll Learn

This course builds your intuition from the ground up:

| Module | Topic | Key Concepts |
|--------|-------|--------------|
| 1 | RNNs | Sequential processing, backprop through time, vanishing gradients |
| 2-3 | Foundations | Embeddings, tokenization |
| 4 | Seq2Seq + Attention | Encoder-decoder, attention mechanism |
| 5-8 | Transformer Components | Self-attention, multi-head, positional encoding, layer norm |
| 9 | Full Transformer | Putting it all together |
| 10-12 | Applications | BERT, GPT, Agents |
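
To give a taste of the course's code style, here is a minimal NumPy sketch of scaled dot-product attention, the core operation covered in Modules 5-8. The function names and shapes are illustrative, not the course's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V, weights         # weighted average of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)       # (4, 8): one output vector per query position
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

Everything in the course is built from pieces like this: plain NumPy, no framework magic.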

Prerequisites

You should be comfortable with:

  • Python — loops, functions, classes
  • Basic calculus — derivatives, chain rule
  • Linear algebra — vectors, matrices, dot products

Don’t worry if you’re rusty; we’ll review the math as we go.

How to Use This Site

Each module includes:

  1. Explanations — Clear prose with math when needed
  2. Visualizations — Interactive diagrams to build intuition
  3. Code — Python implementations you can run

Recommended approach:

  1. Read the explanation first
  2. Play with the visualization
  3. Study the code
  4. Try modifying the code to test your understanding

Philosophy

“What I cannot create, I do not understand.” — Richard Feynman

We implement everything from scratch. No PyTorch. No TensorFlow. Just Python and understanding.

This isn’t the fastest way to use transformers, but it’s the best way to understand them.

Credits

Inspired by:

Built with Astro, Svelte, and KaTeX. Development assisted by Claude, Cursor, and OpenCode.