Introduction to RNNs

The Problem: Why Sequences Are Hard

You’ve used this feature hundreds of times: you type on your phone, and it predicts what you’ll say next.

Type “I’m going to the” and your phone suggests: gym, store, beach

But how does it know? The key insight: context matters. The prediction for “I’m going to the ___” depends entirely on what came before. If you had typed “I’m running to the ___”, the suggestions would shift toward finish line, bathroom, car.

This is the fundamental challenge with sequential data — each piece of information depends on what came before it.

🤔 Quick Check
Which of these tasks benefits MOST from remembering previous information?

The Big Idea: A Network with Memory

Imagine reading a mystery novel. You don’t forget chapter 1 when you reach chapter 5 — you carry information forward. Clues from early chapters help you understand later events.

Recurrent Neural Networks (RNNs) work the same way. They’re neural networks with a “memory” that gets updated as they process each piece of a sequence.

Here’s the simple loop:

🧠 RNN Processing Sequence (interactive demo)
The demo steps through the sequence “I love my ___” one word at a time, showing the model’s top predictions for the next word (e.g. dog, family, job, cat) and how their probabilities shift at each step.

Watch how the memory grows as each word is processed. The predictions shift based on accumulated context!

Why Context Changes Everything

💡 Why Context Matters

Input: "bank"

  • 🚫 Without context: Financial institution? River edge?
  • With context: 'river bank' vs 'bank account' — context tells you which!

Input: "not"

  • 🚫 Without context: A negative word?
  • With context: 'I'm not hungry' (negation) vs 'tie a knot' (a different word!)

Traditional neural networks treat each input independently — they have no memory. That’s like trying to understand a sentence by looking at each word in isolation. RNNs solve this by passing information forward through the sequence.

✍️ Fill in the Blanks
RNNs maintain a ______ state that captures information from previous ______.

How It Works: The Core Loop

Let’s break down what happens at each step. Don’t worry about the math yet — just the intuition:

At each timestep:

new_memory = combine(current_input, old_memory)
prediction = transform(new_memory)

That’s it! The network learns how to combine and transform through training.
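As a rough sketch of that loop (using NumPy, with made-up tiny dimensions and random weights standing in for the learned `combine` and `transform` functions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4-dim word vectors, 3-dim memory, 5 possible next words
W_combine = rng.normal(size=(3, 4 + 3))  # learns how to "combine" input + memory
W_transform = rng.normal(size=(5, 3))    # learns how to "transform" memory into a prediction

def step(current_input, old_memory):
    # new_memory = combine(current_input, old_memory)
    new_memory = np.tanh(W_combine @ np.concatenate([current_input, old_memory]))
    # prediction = transform(new_memory): scores over the 5 next words,
    # squashed into probabilities with a softmax
    scores = W_transform @ new_memory
    prediction = np.exp(scores) / np.exp(scores).sum()
    return new_memory, prediction

memory = np.zeros(3)  # empty memory before the sequence starts
for word_vector in rng.normal(size=(3, 4)):  # three fake "word" vectors
    memory, prediction = step(word_vector, memory)

print(prediction)  # probabilities over the 5 candidate next words
```

Everything here (dimensions, random weights, placeholder word vectors) is illustrative; only the shape of the computation matters: each step mixes the new input with the old memory, and the prediction is read off the updated memory.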

In the Forward Pass, we’ll unpack the real equations — including the weight matrices, the tanh activation, and softmax — and trace through the math step by step.

Key insight: At each step, the hidden state h_t depends on two things:

  1. The current input x_t (the word just read)
  2. The previous hidden state h_{t-1} (everything seen before)

This is the recurrent connection — information flows forward through time.

Ready to see the real equations and watch the computation unfold? Head to the Forward Pass →

Deep Understanding Check

🤔 Quick Check
Imagine you're building an autocomplete system. In your own words, explain why the prediction after 'I love' would be different from the prediction after 'I hate', even though both are asking for the next word.

The Vanishing Gradient Problem

RNNs struggle with very long sequences. Why? During backpropagation, gradients get multiplied together many times — once per timestep. If those multipliers are consistently less than 1, the gradient “vanishes”, becoming too small to learn from. This makes it hard for basic RNNs to learn long-range dependencies.
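You can see the effect with plain numbers. Suppose (purely for illustration) that each backprop step scales the gradient by a constant 0.9:

```python
# Illustrative only: treat each timestep's backprop multiplier as a constant 0.9
gradient = 1.0
for step in range(50):
    gradient *= 0.9

print(gradient)  # about 0.005: the signal from 50 steps back has nearly vanished
```

In a real RNN the per-step multiplier varies (it involves the weights and the activation's derivative), but the compounding effect is the same: fifty steps of shrinking leaves almost nothing to learn from.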

Solutions:

  • LSTM (Long Short-Term Memory) — adds “gates” to control information flow
  • GRU (Gated Recurrent Unit) — a simpler gated architecture
  • Attention mechanisms — let the network “look back” at any previous step