Introduction to RNNs

The Problem: Why Sequences Are Hard

You’ve used this feature hundreds of times: you type on your phone, and it predicts what you’ll say next.

Type “I’m going to the” and your phone suggests: gym, store, beach

But how does it know? The key insight: context matters. The prediction for “I’m going to the ___” depends entirely on what came before. If you had typed “I’m running to the ___”, the suggestions would shift toward finish line, bathroom, car.

This is the fundamental challenge with sequential data — each piece of information depends on what came before it.

🤔 Quick Check
Which of these tasks benefits MOST from remembering previous information?

The Big Idea: A Network with Memory

Imagine reading a mystery novel. You don’t forget chapter 1 when you reach chapter 5 — you carry information forward. Clues from early chapters help you understand later events.

Recurrent Neural Networks (RNNs) work the same way. They’re neural networks with a “memory” that gets updated as they process each piece of a sequence.

Here’s the simple loop:

🧠 RNN Processing Sequence (interactive demo)
The demo steps through the sequence “I love my ___” one word at a time, showing the model’s top predictions for the next word (e.g. dog, family, job, cat) and how their probabilities shift at each step.

Watch how the memory grows as each word is processed. The predictions shift based on accumulated context!

Why Context Changes Everything

💡 Why Context Matters

Input: "bank"

  • 🚫 Without context: Financial institution? River edge?
  • With context: 'river bank' vs 'bank account' — context tells you which!

Input: "not"

  • 🚫 Without context: A negative word?
  • With context: 'I'm not hungry' (negation) vs 'tie a knot' (a different word!)

Traditional neural networks treat each input independently — they have no memory. That’s like trying to understand a sentence by looking at each word in isolation. RNNs solve this by passing information forward through the sequence.

✍️ Fill in the Blanks
RNNs maintain a ______ state that captures information from previous ______.

How It Works: The Core Loop

Let’s break down what happens at each step. Don’t worry about the math yet — just the intuition:

At each timestep:

new_memory = combine(current_input, old_memory)
prediction = transform(new_memory)

That’s it! The network learns how to combine and transform through training.
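As a rough sketch of that loop (using NumPy, with made-up tiny dimensions and random weights standing in for the learned `combine` and `transform` functions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4-dim word vectors, 3-dim memory, 5 possible next words
W_combine = rng.normal(size=(3, 4 + 3))  # learns how to "combine" input + memory
W_transform = rng.normal(size=(5, 3))    # learns how to "transform" memory into a prediction

def step(current_input, old_memory):
    # new_memory = combine(current_input, old_memory)
    new_memory = np.tanh(W_combine @ np.concatenate([current_input, old_memory]))
    # prediction = transform(new_memory): scores over the 5 next words,
    # squashed into probabilities with a softmax
    scores = W_transform @ new_memory
    prediction = np.exp(scores) / np.exp(scores).sum()
    return new_memory, prediction

memory = np.zeros(3)  # empty memory before the sequence starts
for word_vector in rng.normal(size=(3, 4)):  # three fake "word" vectors
    memory, prediction = step(word_vector, memory)

print(prediction)  # probabilities over the 5 candidate next words
```

Everything here (dimensions, random weights, placeholder word vectors) is illustrative; only the shape of the computation matters: each step mixes the new input with the old memory, and the prediction is read off the updated memory.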

In the Forward Pass, we’ll unpack the real equations — including the weight matrices, the tanh activation, and softmax — and trace through the math step by step.

Key insight: At each step, the hidden state h_t depends on two things:

  1. The current input x_t (the word just read)
  2. The previous hidden state h_{t-1} (everything seen before)

This is the recurrent connection — information flows forward through time.

Ready to see the real equations and watch the computation unfold? Head to the Forward Pass →

Deep Understanding Check

🤔 Quick Check
Imagine you're building an autocomplete system. In your own words, explain why the prediction after 'I love' would be different from the prediction after 'I hate', even though both are asking for the next word.

The Vanishing Gradient Problem

RNNs struggle with very long sequences. Why? During backpropagation, gradients get multiplied together many times — once per timestep. If those multipliers are consistently less than 1, the gradient “vanishes”, becoming too small to learn from. This makes it hard for basic RNNs to learn long-range dependencies.
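You can see the effect with plain numbers. Suppose (purely for illustration) that each backprop step scales the gradient by a constant 0.9:

```python
# Illustrative only: treat each timestep's backprop multiplier as a constant 0.9
gradient = 1.0
for step in range(50):
    gradient *= 0.9

print(gradient)  # about 0.005: the signal from 50 steps back has nearly vanished
```

In a real RNN the per-step multiplier varies (it involves the weights and the activation's derivative), but the compounding effect is the same: fifty steps of shrinking leaves almost nothing to learn from.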

Solutions:

  • LSTM (Long Short-Term Memory) — adds “gates” to control information flow
  • GRU (Gated Recurrent Unit) — a simpler gated architecture
  • Attention mechanisms — let the network “look back” at any previous step