Introduction to RNNs
The Problem: Why Sequences Are Hard
You’ve used this feature hundreds of times: you type on your phone, and it predicts what you’ll say next.
Type “I’m going to the” and your phone suggests: gym, store, beach…
But how does it know? The key insight: context matters. The prediction for “I’m going to the ___” depends entirely on what came before. If you had typed “I’m running to the ___”, the suggestions would shift toward finish line, bathroom, car.
This is the fundamental challenge with sequential data — each piece of information depends on what came before it.
The Big Idea: A Network with Memory
Imagine reading a mystery novel. You don’t forget chapter 1 when you reach chapter 5 — you carry information forward. Clues from early chapters help you understand later events.
Recurrent Neural Networks (RNNs) work the same way. They’re neural networks with a “memory” that gets updated as they process each piece of a sequence.
Here’s the simple loop: as each word is processed, the memory is updated, and the predictions shift based on the accumulated context.
Why Context Changes Everything
Traditional neural networks treat each input independently — they have no memory. That’s like trying to understand a sentence by looking at each word in isolation. RNNs solve this by passing information forward through the sequence.
How It Works: The Core Loop
Let’s break down what happens at each step. Don’t worry about the math yet — just the intuition:
At each timestep:

```
new_memory = combine(current_input, old_memory)
prediction = transform(new_memory)
```
That’s it! The network learns how to combine and transform through training.
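The combine-and-transform loop can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the sizes, random weights, and the name `rnn_step` are all assumptions made for the example.

```python
import numpy as np

# Toy dimensions, chosen for illustration (assumptions, not from the text)
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """new_memory = combine(current_input, old_memory)"""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

x = rng.normal(size=input_size)   # one input (e.g. a word embedding)
h0 = np.zeros(hidden_size)        # empty memory at the start
h1 = rnn_step(x, h0)
print(h1.shape)  # (3,)
```

The `transform(new_memory)` half would be one more matrix multiply (plus a softmax) to turn the hidden state into a prediction; the Forward Pass section unpacks that.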
In the Forward Pass, we’ll unpack the real equations — including the weight matrices, the tanh activation, and softmax — and trace through the math step by step.
Key insight: At each step, the hidden state depends on two things:
- The current input (the word just read)
- The previous hidden state (everything seen before)
This is the recurrent connection — information flows forward through time.
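The recurrent connection is easiest to see as a loop over the sequence: the same step is applied to each word, and the hidden state carries everything seen so far. A minimal sketch, assuming one-hot word vectors and untrained random weights (the word indices for “I’m going to the” are made up for illustration):

```python
import numpy as np

# Hypothetical setup: 3 hidden units, a 5-word vocabulary
hidden_size, vocab_size = 3, 5
rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))

h = np.zeros(hidden_size)       # hidden state starts empty
sequence = [0, 3, 1, 4]         # stand-in word indices for "I'm going to the"
for word in sequence:
    x = np.zeros(vocab_size)
    x[word] = 1.0               # one-hot encode the current word
    # New hidden state depends on the current input AND the previous state:
    h = np.tanh(W_xh @ x + W_hh @ h)
print(h)  # context from all four words, compressed into 3 numbers
```

Note that `W_xh` and `W_hh` are reused at every step; the network learns one combine rule and applies it across time.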
Ready to see the real equations and watch the computation unfold? Head to the Forward Pass →
The Vanishing Gradient Problem
RNNs struggle with very long sequences. Why? During backpropagation, gradients get multiplied many times. If those multipliers are less than 1, the gradient “vanishes” — becoming too small to learn from. This makes it hard for basic RNNs to learn long-range dependencies.
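The shrinking effect is just repeated multiplication. Here is a tiny numeric sketch; the factor 0.9 is an arbitrary stand-in for a per-step backprop multiplier, not a value from any real network:

```python
# Repeated multiplication by a factor < 1 drives a gradient toward zero.
grad = 1.0
factor = 0.9            # hypothetical per-timestep multiplier
for step in range(50):  # backpropagating through 50 timesteps
    grad *= factor
print(grad)             # ~0.005 -- almost nothing left to learn from
```

With multipliers slightly above 1 the opposite happens (exploding gradients), which is why gated architectures that keep this product near 1 help.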
Solutions:
- LSTM (Long Short-Term Memory) — adds “gates” to control information flow
- GRU (Gated Recurrent Unit) — a simpler gated architecture
- Attention mechanisms — let the network “look back” at any previous step
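To make the “gates” idea concrete, here is a stripped-down, GRU-style update gate (the reset gate and biases are omitted for brevity; weights are random, so this is a shape-level sketch, not a working GRU):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simplified GRU-style cell: one update gate, reset gate omitted (assumption)
hidden_size, input_size = 3, 4
rng = np.random.default_rng(2)
W_z = rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))

def gated_step(x, h_prev):
    xh = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ xh)          # gate: how much to update each unit (0..1)
    h_cand = np.tanh(W_h @ xh)     # candidate new memory
    # Where z is near 0, old memory passes through almost unchanged,
    # so gradients along that path are not repeatedly shrunk.
    return (1 - z) * h_prev + z * h_cand
```

The interpolation `(1 - z) * h_prev + z * h_cand` is the key trick: the network learns when to overwrite memory and when to preserve it.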