Introduction to RNNs - Recurrent Neural Networks

🎯 Learning Objectives

After this lesson, you will:

✅ Understand what an RNN is and why it is needed

✅ Understand how the hidden state (memory) works

✅ Know the vanishing gradient problem

✅ Build a simple RNN with Keras

Review of the Previous Module

CNN (Convolutional Neural Network):

Processes images (2D data)
Finds spatial patterns

RNN (this lesson):

Processes sequences (sequential data)
Remembers information from the past

Analogy: a CNN is like looking at a picture, while an RNN is like reading a story: you have to read from beginning to end and remember what you have read!

📖 RNN Glossary

Term	Vietnamese	Explanation
RNN	Mạng nơ-ron hồi quy	A network that can remember information from previous steps
Sequence	Chuỗi/Dãy	Data with a meaningful order, typically over time
Time Step	Bước thời gian	One element of a sequence
Hidden State	Trạng thái ẩn	The RNN's memory at each step
Recurrence	Tính hồi quy	The connection that feeds the previous hidden state back in at the next step
BPTT	Backprop Through Time	Backpropagation applied to sequences, across the unrolled time steps
Vanishing Gradient	Gradient biến mất	A problem when training an RNN over many time steps

Checkpoint

Have you read through the glossary?

🔄 What Is an RNN?

The Problem with Sequential Data

Feedforward networks (ANN, CNN) process each input independently. But in many kinds of data, order matters:

"Tôi yêu Việt Nam" ("I love Vietnam") ≠ "Việt Nam yêu tôi" ("Vietnam loves me")
Today's stock price depends on yesterday's
Video = a sequence of frames

Definition of an RNN

A Recurrent Neural Network (RNN) is a neural network that can:

Process sequential data
Remember information from previous steps
Share weights across time steps
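
To make the "order matters" point concrete, the sketch below compares an order-blind summary (a sum of word vectors) with a recurrent update on the two example sentences. The tiny vocabulary, the random vectors, and the helper names `bag_of_words` and `recurrent_summary` are illustrative assumptions; the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-word vocabulary with random 4-dimensional word vectors
vocab = {"tôi": 0, "yêu": 1, "việt nam": 2}
embeddings = rng.normal(size=(len(vocab), 4))

# Random (untrained) recurrent weights, for illustration only
W_xh = rng.normal(size=(4, 4)) * 0.5
W_hh = rng.normal(size=(4, 4)) * 0.5

def bag_of_words(tokens):
    """Order-blind summary: sum of the word vectors."""
    return sum(embeddings[vocab[w]] for w in tokens)

def recurrent_summary(tokens):
    """Order-aware summary: the hidden state is updated word by word."""
    h = np.zeros(4)
    for w in tokens:
        h = np.tanh(W_xh @ embeddings[vocab[w]] + W_hh @ h)
    return h

s1 = ["tôi", "yêu", "việt nam"]   # "Tôi yêu Việt Nam"
s2 = ["việt nam", "yêu", "tôi"]   # "Việt Nam yêu tôi"

same_bag = np.allclose(bag_of_words(s1), bag_of_words(s2))
same_rnn = np.allclose(recurrent_summary(s1), recurrent_summary(s2))
print("Bag of words identical:", same_bag)
print("Recurrent state identical:", same_rnn)
```

The sum cannot tell the two sentences apart, while the recurrent state depends on the order in which the words arrive.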

Comparison with Feedforward Networks

[Figure: Feedforward Network vs RNN]

Key Idea

Checkpoint

Do you understand what an RNN is?

📐 The Math Behind RNNs

The Basic RNN Formula

At each time step $t$:

$h_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h)$

$y_t = W_{hy} \cdot h_t + b_y$

Where:

$x_t$: input at time step $t$
$h_t$: hidden state at $t$
$h_{t-1}$: hidden state from the previous step
$y_t$: output at $t$
$W_{xh}$: weights from input → hidden
$W_{hh}$: weights from hidden → hidden (recurrent)
$W_{hy}$: weights from hidden → output
$b_h$, $b_y$: biases of the hidden state and the output

Worked Example

python.py

import numpy as np

# Sizes
input_size = 3
hidden_size = 4
output_size = 2

# Initialize weights
np.random.seed(42)
Wxh = np.random.randn(hidden_size, input_size) * 0.1
Whh = np.random.randn(hidden_size, hidden_size) * 0.1
Why = np.random.randn(output_size, hidden_size) * 0.1
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

def rnn_step(x, h_prev):
    """
    Single RNN step
    x: (input_size, 1)
    h_prev: (hidden_size, 1)
    """
    # New hidden state
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
    # Output
    y = Why @ h + by
    return h, y

# Input sequence (3 time steps), shape (3, input_size, 1)
X = np.array([
    [[1], [0], [1]],   # x1
    [[0], [1], [0]],   # x2
    [[1], [1], [0]]    # x3
])

# Process sequence
h = np.zeros((hidden_size, 1))  # Initial hidden state
outputs = []

for t in range(len(X)):
    h, y = rnn_step(X[t], h)
    outputs.append(y)
    print(f"t={t}: h shape={h.shape}, y shape={y.shape}")

Expected Output

t=0: h shape=(4, 1), y shape=(2, 1)
t=1: h shape=(4, 1), y shape=(2, 1)
t=2: h shape=(4, 1), y shape=(2, 1)

Unrolled RNN

Weight Sharing: an RNN uses the same set of weights at every time step. This is what allows it to process sequences of any length.
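
The following sketch illustrates the weight-sharing claim: the parameter count is fixed by the input and hidden sizes alone, and the same matrices are reused for sequences of different lengths. The helper `run_rnn`, the sizes, and the random inputs are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
input_size, hidden_size = 3, 4

# One set of parameters, created once
Wxh = rng.normal(size=(hidden_size, input_size)) * 0.1
Whh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
bh = np.zeros(hidden_size)

# The count depends on the layer sizes, not on the sequence length
n_params = Wxh.size + Whh.size + bh.size

def run_rnn(sequence):
    """Apply the same weights at every time step; return the final hidden state."""
    h = np.zeros(hidden_size)
    for x_t in sequence:
        h = np.tanh(Wxh @ x_t + Whh @ h + bh)
    return h

short_seq = rng.normal(size=(3, input_size))    # 3 time steps
long_seq = rng.normal(size=(50, input_size))    # 50 time steps

h_short = run_rnn(short_seq)
h_long = run_rnn(long_seq)
print(f"parameters: {n_params}")                                   # parameters: 32
print(f"final state shapes: {h_short.shape}, {h_long.shape}")      # (4,), (4,)
```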

Checkpoint

Do you understand the RNN formula?

📊 Types of RNN Architecture

Classification by Input-Output

[Figure: Types of RNN Architecture]

Example Applications

Architecture	Input	Output	Application
Many-to-One	Sentence	Label	Sentiment Analysis
One-to-Many	Image	Sentence	Image Captioning
Many-to-Many	English sentence	Vietnamese sentence	Translation
Many-to-Many	Sequence	Sequence	Video tagging
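
As a rough sketch of how two of these input-output patterns differ, the loop below runs one untrained NumPy RNN and reads its outputs in two ways: at every step (many-to-many, one output per input) and at the last step only (many-to-one). The weights and the helper `rnn_outputs` are illustrative assumptions; translation and captioning need extra machinery that is not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
T, input_size, hidden_size, output_size = 5, 3, 4, 2

Wxh = rng.normal(size=(hidden_size, input_size)) * 0.1
Whh = rng.normal(size=(hidden_size, hidden_size)) * 0.1
Why = rng.normal(size=(output_size, hidden_size)) * 0.1

def rnn_outputs(X):
    """Run the RNN over X of shape (T, input_size); return y_t for every t."""
    h = np.zeros(hidden_size)
    ys = []
    for x_t in X:
        h = np.tanh(Wxh @ x_t + Whh @ h)
        ys.append(Why @ h)
    return np.stack(ys)             # shape (T, output_size)

X = rng.normal(size=(T, input_size))
all_outputs = rnn_outputs(X)

many_to_many = all_outputs          # one output per time step
many_to_one = all_outputs[-1]       # keep only the last output

print(many_to_many.shape)  # (5, 2)
print(many_to_one.shape)   # (2,)
```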

Checkpoint

Do you know the types of RNN architecture?

💻 RNNs in Keras

SimpleRNN Layer

python.py

from tensorflow import keras
from tensorflow.keras import layers

# Two stacked SimpleRNN layers followed by a classifier
model = keras.Sequential([
    # Input shape: (timesteps, features) = 10 timesteps, 32 features
    keras.Input(shape=(10, 32)),
    layers.SimpleRNN(
        units=64,               # Hidden size
        return_sequences=True   # Return all time steps
    ),
    layers.SimpleRNN(
        units=32,
        return_sequences=False  # Only return last output
    ),
    layers.Dense(10, activation='softmax')
])

model.summary()
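
To cross-check the `model.summary()` output by hand: a SimpleRNN layer holds an input kernel of shape (input_dim, units), a recurrent kernel of shape (units, units), and a bias of length units. The small calculation below applies this to the model above; the helper names are made up for illustration.

```python
def simple_rnn_params(input_dim, units):
    """Trainable parameters of one SimpleRNN layer (with bias)."""
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """Trainable parameters of one Dense layer (with bias)."""
    return units * (input_dim + 1)

rnn1 = simple_rnn_params(input_dim=32, units=64)   # first SimpleRNN
rnn2 = simple_rnn_params(input_dim=64, units=32)   # second SimpleRNN
dense = dense_params(input_dim=32, units=10)       # softmax classifier
total = rnn1 + rnn2 + dense

print(rnn1, rnn2, dense, total)
# 6208 3104 330 9642
```

None of these counts involves the number of timesteps (10), which is weight sharing at work.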

Key Parameters

Parameter	Meaning	Values
`units`	Number of hidden units	32, 64, 128, ...
`return_sequences`	Return the output at every time step?	True/False (default False)
`return_state`	Also return the final state?	True/False (default False)
`activation`	Activation function	'tanh' (default)

return_sequences

python.py

# return_sequences=True: output at EVERY time step
rnn_all = layers.SimpleRNN(64, return_sequences=True)
# Input: (batch, 10, 32) → Output: (batch, 10, 64)

# return_sequences=False: output at the LAST time step only
rnn_last = layers.SimpleRNN(64, return_sequences=False)
# Input: (batch, 10, 32) → Output: (batch, 64)

# Example
x = keras.Input(shape=(10, 32))
y_all = rnn_all(x)
y_last = rnn_last(x)

print(f"return_sequences=True: {y_all.shape}")
print(f"return_sequences=False: {y_last.shape}")

Expected Output

return_sequences=True: (None, 10, 64)
return_sequences=False: (None, 64)

When to use return_sequences=True?

When stacking RNN layers (every RNN layer that feeds another RNN layer needs it)
When an output is needed at every time step (sequence-to-sequence)

When to use return_sequences=False?

When only the final output is needed (classification, regression)

Checkpoint

Do you know how to use SimpleRNN in Keras?

📝 Example: Sentiment Analysis

Sentiment Classification (Many-to-One)

python.py

from tensorflow import keras
from tensorflow.keras import layers

# Hyperparameters
VOCAB_SIZE = 10000
MAX_LEN = 200
EMBEDDING_DIM = 128
HIDDEN_DIM = 64

def create_sentiment_model():
    """RNN for sentiment classification"""
    model = keras.Sequential([
        # Each sample is a sequence of MAX_LEN word indices
        # (used instead of Embedding's `input_length` argument,
        # which recent Keras releases deprecate)
        keras.Input(shape=(MAX_LEN,)),

        # Embedding layer: words → vectors
        layers.Embedding(
            input_dim=VOCAB_SIZE,
            output_dim=EMBEDDING_DIM
        ),

        # RNN layer
        layers.SimpleRNN(
            units=HIDDEN_DIM,
            return_sequences=False  # Only last output
        ),

        # Classification head
        layers.Dropout(0.5),
        layers.Dense(1, activation='sigmoid')  # Binary classification
    ])

    return model

model = create_sentiment_model()
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)
model.summary()

Training on the IMDB Dataset

python.py

from tensorflow import keras
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Uses VOCAB_SIZE, MAX_LEN and the compiled `model` from the previous block

# Load data (reviews are already encoded as lists of word indices)
(x_train, y_train), (x_test, y_test) = imdb.load_data(
    num_words=VOCAB_SIZE
)

# Pad sequences to same length
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)

print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

# Train; stop early if val_loss does not improve for 2 epochs
history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=128,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=2)
    ]
)
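
By default, `pad_sequences` pads with zeros at the front and, for sequences longer than `maxlen`, drops tokens from the front (`padding='pre'`, `truncating='pre'`). The function below mimics that default behaviour on toy data for illustration; `pad_pre` is a hypothetical name, not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad / left-truncate integer sequences to a fixed length."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]                  # keep the LAST maxlen tokens
        if len(trimmed) > 0:
            out[i, -len(trimmed):] = trimmed     # right-align, padding on the left
    return out

reviews = [[5, 8, 2], [7, 1, 4, 9, 3, 6]]        # toy word-index sequences
padded = pad_pre(reviews, maxlen=4)
print(padded.tolist())
# [[0, 5, 8, 2], [4, 9, 3, 6]]
```

Because the padding goes on the left, the real tokens sit at the end of each sequence, closest to the final hidden state that the classifier reads.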

Checkpoint

Do you understand the Sentiment Analysis example?

⚠️ The Vanishing/Exploding Gradient Problem

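Vanishing Gradient Problem

When an RNN is trained over long sequences (many time steps), the gradient can:

Vanish: gradient → 0, so the network cannot learn long-range dependencies
Explode: gradient → ∞, so training becomes unstable

Why does it happen?

Backpropagation through time multiplies the gradient by a similar factor at every time step, so over many steps it shrinks or grows exponentially.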
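
A short derivation from the lesson's own recurrence makes the cause explicit. It is a sketch: the final step $T$ and the Jacobian symbol $J_k$ are notation introduced here, and the bound uses the matrix 2-norm.

Since $h_k = \tanh(W_{xh} x_k + W_{hh} h_{k-1} + b_h)$ and $\tanh'(a) = 1 - \tanh^2(a)$, one step back in time gives

$$J_k = \frac{\partial h_k}{\partial h_{k-1}} = \operatorname{diag}\left(1 - h_k^2\right) W_{hh}$$

Going from step $T$ back to step $t$ chains $T - t$ of these factors:

$$\frac{\partial h_T}{\partial h_t} = J_T \, J_{T-1} \cdots J_{t+1}$$

Because $0 < 1 - h_k^2 \le 1$, each factor satisfies $\lVert J_k \rVert \le \lVert W_{hh} \rVert$, so

$$\left\lVert \frac{\partial h_T}{\partial h_t} \right\rVert \le \lVert W_{hh} \rVert^{\,T - t}$$

If $\lVert W_{hh} \rVert < 1$, the bound decays exponentially in $T - t$ and the gradient vanishes. Explosion is only possible when $\lVert W_{hh} \rVert > 1$, and even then saturated $\tanh$ units can pull the product back down.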

Illustration in Code

python.py

import numpy as np
import matplotlib.pyplot as plt

def simulate_gradient_flow(time_steps, gradient_factor):
    """
    Simulate gradient flow in RNN
    gradient_factor < 1: vanishing
    gradient_factor > 1: exploding
    """
    gradients = [1.0]  # Initial gradient
    for t in range(time_steps):
        gradients.append(gradients[-1] * gradient_factor)
    return gradients

# Simulate
steps = 50
vanishing = simulate_gradient_flow(steps, 0.9)
stable = simulate_gradient_flow(steps, 1.0)
exploding = simulate_gradient_flow(steps, 1.1)

# Plot
plt.figure(figsize=(10, 4))
plt.semilogy(vanishing, label='Vanishing (0.9)', linestyle='--')
plt.semilogy(stable, label='Stable (1.0)')
plt.semilogy(exploding, label='Exploding (1.1)', linestyle='-.')
plt.xlabel('Time Steps')
plt.ylabel('Gradient Magnitude (log scale)')
plt.title('Gradient Flow in RNN')
plt.legend()
plt.grid(True)
plt.show()
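
The simulation above uses a single scalar factor. As a slightly more realistic sketch, the code below multiplies the actual step Jacobians diag(1 − h_t²)·W_hh of a small tanh RNN and tracks the norm of their product. The weights (0.5 times a random orthogonal matrix, so the spectral norm is 0.5), the random inputs, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, steps = 4, 3, 50

# Recurrent matrix with spectral norm 0.5: a scaled orthogonal matrix
Q, _ = np.linalg.qr(rng.normal(size=(hidden_size, hidden_size)))
Whh = 0.5 * Q
Wxh = rng.normal(size=(hidden_size, input_size)) * 0.1

h = np.zeros(hidden_size)
jacobian_product = np.eye(hidden_size)   # d h_t / d h_0, starts as the identity
norms = []

for t in range(steps):
    x_t = rng.normal(size=input_size)
    h = np.tanh(Wxh @ x_t + Whh @ h)
    step_jacobian = np.diag(1.0 - h**2) @ Whh        # d h_t / d h_{t-1}
    jacobian_product = step_jacobian @ jacobian_product
    norms.append(np.linalg.norm(jacobian_product, 2))

print(f"norm after 1 step: {norms[0]:.3e}")
print(f"norm after {steps} steps: {norms[-1]:.3e}")
```

Each step can shrink the norm by a factor of at most 0.5 here, so after 50 steps the influence of the first hidden state on the last is numerically negligible.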

Solutions

Problem	Solution
Vanishing	LSTM, GRU (next lesson)
Exploding	Gradient Clipping
Both	Careful initialization, Normalization

Gradient Clipping

python.py

from tensorflow import keras

# Gradient clipping in Keras
optimizer = keras.optimizers.Adam(
    learning_rate=0.001,
    clipnorm=1.0,      # Clip by norm
    # or
    # clipvalue=0.5,   # Clip by value
)

# `model` is the sentiment model defined earlier
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy']
)
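
What the two options do to a gradient can be sketched in a few lines of NumPy. The helper functions below are illustrative stand-ins, not the Keras implementation, and they operate on a single gradient array.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (direction preserved)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

def clip_by_value(grad, limit):
    """Clamp every component of grad to the range [-limit, limit]."""
    return np.clip(grad, -limit, limit)

g = np.array([3.0, -4.0])            # L2 norm = 5
by_norm = clip_by_norm(g, 1.0)       # approximately [0.6, -0.8], norm 1
by_value = clip_by_value(g, 0.5)     # [0.5, -0.5]
print(by_norm, np.linalg.norm(by_norm))
print(by_value)
```

Clipping by norm keeps the direction of the gradient and only shortens it, whereas clipping by value can change the direction because each component is clamped independently.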

Checkpoint

Do you understand the vanishing gradient problem?

🎯 Summary

Key Points

RNN = neural network with memory (hidden state)
Recurrence: the hidden state of the previous step → input to the next step
Weight Sharing: the same weights at every time step
Formula: $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$
Problem: vanishing/exploding gradients on long sequences

Comparison with CNN

Aspect	CNN	RNN
Data	Grid (images)	Sequential (text, time series)
Memory	No	Yes (hidden state)
Weight sharing	Spatial	Temporal
Parallelization	Easy	Hard (sequential)

SimpleRNN in Keras

Python

# Expects input of shape (batch, timesteps, features)
layers.SimpleRNN(
    units=64,               # Hidden size
    return_sequences=True   # True: all time steps; False: last step only
)

Next Lesson

We will learn:

LSTM: mitigates the vanishing gradient problem
GRU: a simpler variant of LSTM
Bidirectional RNN: processes the sequence in both directions

🎉 Great! You now understand the basics of RNNs! The next lesson covers more powerful variants.
