
Unlike most transformer guides, every concept here comes with interactive visualizations you can explore and Python implementations you can run yourself. No PyTorch, no TensorFlow — just NumPy and understanding.
What You’ll Learn
This course builds your intuition from the ground up:
| Module | Topic | Key Concepts |
|---|---|---|
| 1 | RNNs | Sequential processing, backprop through time, vanishing gradients |
| 2-3 | Foundations | Embeddings, tokenization |
| 4 | Seq2Seq + Attention | Encoder-decoder, attention mechanism |
| 5-8 | Transformer Components | Self-attention, multi-head, positional encoding, layer norm |
| 9 | Full Transformer | Putting it all together |
| 10-12 | Applications | BERT, GPT, Agents |
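As a taste of where the modules lead, here is one possible NumPy-only sketch of single-head scaled dot-product self-attention, the core mechanism of Modules 5–9. The function and variable names are illustrative, not the course's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative sketch)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) similarity matrix
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of the value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))             # 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one attended vector per token
```

Every moving part here (the softmax, the projections, the scaling) gets its own module before they are combined.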
Prerequisites
You should be comfortable with:
- Python — loops, functions, classes
- Basic calculus — derivatives, chain rule
- Linear algebra — vectors, matrices, dot products
Don’t worry if you’re rusty—we’ll review the math as we go.
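If the linear-algebra items feel distant, a quick NumPy warm-up covers the operations the course leans on most:

```python
import numpy as np

# Dot product of two vectors: 1*4 + 2*5 + 3*6 = 32
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
print(v @ w)  # 32.0

# Matrix-vector product: each output entry is a dot product of a row with v
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
print(M @ v)  # [7. 5.]
```

If both results make sense to you, you have the linear algebra you need to start.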
How to Use This Site
Each module includes:
- Explanations — Clear prose with math when needed
- Visualizations — Interactive diagrams to build intuition
- Code — Python implementations you can run
Recommended approach:
- Read the explanation first
- Play with the visualization
- Study the code
- Try modifying the code to test your understanding
Philosophy
“What I cannot create, I do not understand.” — Richard Feynman
We implement everything from scratch. No PyTorch. No TensorFlow. Just Python and understanding.
This isn’t the fastest way to use transformers, but it’s the best way to understand them.
Credits
Inspired by:
- The Illustrated Transformer by Jay Alammar
- Andrej Karpathy’s Neural Networks: Zero to Hero
- 3Blue1Brown’s Neural Networks series
Built with Astro, Svelte, and KaTeX. Development assisted by Claude, Cursor, and OpenCode.