🎯 Lesson Objectives
After this lesson, you will:
✅ Understand what a Stacked RNN is and why it's needed
✅ Know how to build a multi-layer RNN with Keras
✅ Understand Bidirectional RNN
✅ Know how to choose an appropriate number of layers
Recap of the previous lesson
We covered the simple (single-layer) RNN. Today we learn how to stack multiple layers to learn more complex features!
📚 What is a Stacked RNN?
Definition
A Stacked RNN is several RNN layers stacked on top of each other. Each layer learns features at a higher level of abstraction.
Single vs Stacked Comparison
(Diagram: single-layer RNN vs stacked RNN)
Why Stack Layers?
| Reason | Explanation |
|---|---|
| Feature Hierarchy | Layer 1 learns local patterns, layer 2 learns global patterns (see the sketch below) |
| Capacity | More parameters → can learn more complex relationships |
| Representation | Each layer produces a different representation |
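To make the feature hierarchy concrete, here is a minimal sketch (with arbitrary toy shapes, not tied to the lesson's dataset) showing how each stacked layer transforms the sequence: intermediate layers still emit one vector per timestep, while the last layer condenses the whole sequence into a single vector.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy batch: 4 sequences, 10 timesteps, 8 features (illustrative shapes only)
x = tf.random.normal((4, 10, 8))

layer1 = layers.SimpleRNN(16, return_sequences=True)  # lower layer: per-timestep (local) features
layer2 = layers.SimpleRNN(8, return_sequences=False)  # upper layer: one summary (global) vector

h1 = layer1(x)
h2 = layer2(h1)

print(h1.shape)  # (4, 10, 16) - one feature vector per timestep
print(h2.shape)  # (4, 8)      - one feature vector per sequence
```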
Checkpoint
Do you understand what a Stacked RNN is?
💻 Implementing a Stacked RNN
Keras Code
```python
from tensorflow import keras
from tensorflow.keras import layers

def create_stacked_rnn(vocab_size, embedding_dim, rnn_units, num_classes):
    """
    Stacked RNN with 3 layers
    """
    model = keras.Sequential([
        # Embedding
        layers.Embedding(vocab_size, embedding_dim),

        # RNN Layer 1 - return_sequences=True for the next layer
        layers.SimpleRNN(rnn_units, return_sequences=True),
        layers.Dropout(0.3),

        # RNN Layer 2
        layers.SimpleRNN(rnn_units // 2, return_sequences=True),
        layers.Dropout(0.3),

        # RNN Layer 3 - return_sequences=False for classification
        layers.SimpleRNN(rnn_units // 4, return_sequences=False),
        layers.Dropout(0.3),

        # Output
        layers.Dense(num_classes, activation='softmax')
    ])

    return model

# Create model
model = create_stacked_rnn(
    vocab_size=10000,
    embedding_dim=128,
    rnn_units=128,
    num_classes=5
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, None, 128)         1280000
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, None, 128)         32896
_________________________________________________________________
dropout (Dropout)            (None, None, 128)         0
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, None, 64)          12352
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 64)          0
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 32)                3104
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0
_________________________________________________________________
dense (Dense)                (None, 5)                 165
=================================================================
Total params: 1,328,517
```

Important: return_sequences
return_sequences=True is required for ALL layers EXCEPT the last one!
```python
# ✅ Correct
layers.SimpleRNN(64, return_sequences=True)   # Layer 1
layers.SimpleRNN(32, return_sequences=True)   # Layer 2
layers.SimpleRNN(16, return_sequences=False)  # Layer 3 (last)

# ❌ Wrong
layers.SimpleRNN(64, return_sequences=False)  # Layer 1 - ERROR!
layers.SimpleRNN(32, return_sequences=True)   # Layer 2
```

Checkpoint
Do you know how to code a Stacked RNN?
🔄 Residual Connections for RNN
The Idea
As in ResNet, we add skip connections to:
- Improve gradient flow
- Train deeper RNNs
- Avoid degradation
Code with the Functional API
```python
from tensorflow.keras import layers, Model, Input

def create_residual_rnn(input_shape, rnn_units=64, num_classes=5):
    """
    RNN with Residual Connections
    """
    inputs = Input(shape=input_shape)

    # Initial projection
    x = layers.Dense(rnn_units)(inputs)

    # Residual Block 1
    rnn_out = layers.SimpleRNN(rnn_units, return_sequences=True)(x)
    rnn_out = layers.LayerNormalization()(rnn_out)
    x = layers.Add()([x, rnn_out])  # Skip connection

    # Residual Block 2
    rnn_out = layers.SimpleRNN(rnn_units, return_sequences=True)(x)
    rnn_out = layers.LayerNormalization()(rnn_out)
    x = layers.Add()([x, rnn_out])  # Skip connection

    # Residual Block 3
    rnn_out = layers.SimpleRNN(rnn_units, return_sequences=False)(x)

    # Output
    x = layers.Dropout(0.5)(rnn_out)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs)

# Create
model = create_residual_rnn(
    input_shape=(100, 128),  # (timesteps, features)
    rnn_units=64,
    num_classes=5
)

model.summary()
```

Visualization
(Diagram: residual RNN with skip connections)
Checkpoint
Do you understand the Residual RNN?
📊 Layer Normalization
Why Normalization?
| Problem | Solution |
|---|---|
| Internal covariate shift | Normalize activations (see the sketch below) |
| Unstable training | Stabilize gradient flow |
| Vanishing/Exploding | Control activation magnitudes |
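For intuition about what LayerNormalization actually computes, here is a minimal sketch that reproduces it by hand: each timestep's feature vector is standardized across the feature axis. The `1e-3` epsilon is an assumption matching the Keras default, and scale/bias are still at their initial values (1 and 0).

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((2, 5, 8))  # (batch, timesteps, features)

# Manual layer norm over the feature axis (gamma=1, beta=0 before training)
mean = tf.reduce_mean(x, axis=-1, keepdims=True)
var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
manual = (x - mean) / tf.sqrt(var + 1e-3)  # epsilon assumed to match the Keras default

ln = layers.LayerNormalization()
diff = tf.reduce_max(tf.abs(manual - ln(x)))
print(diff.numpy())  # expected: very close to 0
```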
Layer Norm vs Batch Norm
```python
import tensorflow as tf
from tensorflow.keras import layers

# Batch Normalization: normalizes across the batch
# - Depends on the batch size
# - Hard to use with sequences (variable lengths)
bn = layers.BatchNormalization()

# Layer Normalization: normalizes across the features
# - Independent of the batch size
# - Works well for RNNs
ln = layers.LayerNormalization()

# Example
x = tf.random.normal((32, 10, 64))  # (batch, timesteps, features)

bn_out = bn(x)
ln_out = ln(x)

print(f"Input shape: {x.shape}")
print(f"BatchNorm output: {bn_out.shape}")
print(f"LayerNorm output: {ln_out.shape}")
```

Using it in an RNN
```python
from tensorflow.keras import layers, Model, Input

def rnn_block_with_norm(x, units, return_sequences=True):
    """RNN block with Layer Normalization"""
    # RNN
    rnn_out = layers.SimpleRNN(units, return_sequences=return_sequences)(x)

    # Layer Normalization
    normalized = layers.LayerNormalization()(rnn_out)

    # Activation
    activated = layers.Activation('relu')(normalized)

    return activated

# Usage in a model
inputs = Input(shape=(100, 128))
x = rnn_block_with_norm(inputs, 64, return_sequences=True)
x = rnn_block_with_norm(x, 32, return_sequences=True)
x = rnn_block_with_norm(x, 16, return_sequences=False)
outputs = layers.Dense(5, activation='softmax')(x)

model = Model(inputs, outputs)
```

Checkpoint
Do you understand Layer Normalization?
⚙️ Training Deep RNN
Best Practices
| Technique | Purpose | Keras |
|---|---|---|
| Gradient Clipping | Prevent exploding | clipnorm=1.0 |
| Layer Normalization | Stabilize training | LayerNormalization() |
| Residual Connections | Better gradient flow | Add() |
| Dropout | Regularization | Dropout(0.3) |
| Learning Rate Schedule | Better convergence | ReduceLROnPlateau |
Complete Training Pipeline
```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ReduceLROnPlateau,
    ModelCheckpoint
)

def create_deep_rnn(vocab_size, embedding_dim, max_len, num_classes):
    """Production-ready Deep RNN"""

    inputs = keras.Input(shape=(max_len,))

    # Embedding
    x = layers.Embedding(vocab_size, embedding_dim)(inputs)

    # Stacked RNN with normalization and dropout
    for units in [128, 64, 32]:
        x = layers.SimpleRNN(units, return_sequences=True)(x)
        x = layers.LayerNormalization()(x)
        x = layers.Dropout(0.3)(x)

    # Final RNN layer
    x = layers.SimpleRNN(16)(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.5)(x)

    # Output
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return keras.Model(inputs, outputs)

# Create model
model = create_deep_rnn(
    vocab_size=10000,
    embedding_dim=128,
    max_len=200,
    num_classes=5
)

# Optimizer with gradient clipping
optimizer = Adam(
    learning_rate=0.001,
    clipnorm=1.0  # Clip gradients
)

# Compile
model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Callbacks
callbacks = [
    EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-7
    ),
    ModelCheckpoint(
        'best_model.keras',
        monitor='val_accuracy',
        save_best_only=True
    )
]

# Train
# history = model.fit(
#     X_train, y_train,
#     epochs=50,
#     batch_size=64,
#     validation_split=0.2,
#     callbacks=callbacks
# )
```

Checkpoint
Do you know how to train a Deep RNN?
📈 Monitoring and Debugging
Metrics to Track
```python
import matplotlib.pyplot as plt

def plot_training(history):
    """Plot training metrics"""
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Loss
    axes[0, 0].plot(history.history['loss'], label='Train')
    axes[0, 0].plot(history.history['val_loss'], label='Val')
    axes[0, 0].set_title('Loss')
    axes[0, 0].legend()

    # Accuracy
    axes[0, 1].plot(history.history['accuracy'], label='Train')
    axes[0, 1].plot(history.history['val_accuracy'], label='Val')
    axes[0, 1].set_title('Accuracy')
    axes[0, 1].legend()

    # Learning rate (if using ReduceLROnPlateau)
    if 'lr' in history.history:
        axes[1, 0].plot(history.history['lr'])
        axes[1, 0].set_title('Learning Rate')
        axes[1, 0].set_yscale('log')

    # Gap (overfitting indicator)
    train_acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    gap = [t - v for t, v in zip(train_acc, val_acc)]
    axes[1, 1].plot(gap)
    axes[1, 1].axhline(y=0, color='r', linestyle='--')
    axes[1, 1].set_title('Train-Val Gap (Overfitting indicator)')

    plt.tight_layout()
    plt.show()

# Usage
# plot_training(history)
```

Debug tips
Common problems and how to fix them:
- Loss = NaN
  - Exploding gradients → lower the learning rate, add gradient clipping (see the sketch after this list)
- Val loss rises early
  - Overfitting → add dropout, regularization
- Training is slow
  - Reduce the sequence length or batch size
- Accuracy does not improve
  - Model too simple → add layers/units
  - Learning rate too small → increase it
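For the "Loss = NaN" case above, two cheap safeguards are gradient clipping and stopping the run as soon as the loss turns NaN. A minimal sketch (the tiny stand-in model and hyperparameter values are illustrative only):

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import TerminateOnNaN
from tensorflow.keras.optimizers import Adam

# Tiny stand-in model; in practice reuse the deep RNN from the pipeline above
model = keras.Sequential([
    layers.Embedding(10000, 32),
    layers.SimpleRNN(16),
    layers.Dense(5, activation='softmax')
])

# clipnorm keeps one bad batch from blowing up the weights
model.compile(
    optimizer=Adam(learning_rate=1e-3, clipnorm=1.0),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# TerminateOnNaN stops training immediately if the loss becomes NaN
callbacks = [TerminateOnNaN()]

# history = model.fit(X_train, y_train, epochs=50, callbacks=callbacks)
```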
Checkpoint
Do you know how to monitor training?
🎯 RNN Module Summary
What You've Learned
| Lesson | Key Content |
|---|---|
| 10 | RNN basics, Hidden State, BPTT |
| 11 | Text preprocessing, Applications |
| 12 | Stacked RNN, Residual, Normalization |
RNN Components
```python
from tensorflow.keras import layers

# Basic RNN
layers.SimpleRNN(units, return_sequences=True)  # or return_sequences=False

# Bidirectional
layers.Bidirectional(layers.SimpleRNN(units))

# Stacked (multiple layers)
layers.SimpleRNN(64, return_sequences=True)   # All except last
layers.SimpleRNN(32, return_sequences=False)  # Last layer

# With normalization
layers.LayerNormalization()
```

Limitations of SimpleRNN
| Problem | Impact |
|---|---|
| Vanishing Gradient | Cannot learn long-term dependencies (illustrated in the sketch below) |
| Sequential Processing | Slow; cannot be parallelized |
| Short Memory | Forgets information from the distant past |
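To see the vanishing-gradient problem directly, here is a minimal sketch (on random data with illustrative shapes) that measures how strongly the SimpleRNN output depends on each input timestep. The gradient norms at early timesteps are typically far smaller than at late ones, which is exactly why long-term dependencies are hard to learn.

```python
import tensorflow as tf
from tensorflow.keras import layers

rnn = layers.SimpleRNN(32)
x = tf.random.normal((1, 100, 16))  # 1 sequence, 100 timesteps, 16 features

with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(rnn(x))

grads = tape.gradient(loss, x)         # same shape as x
per_step = tf.norm(grads[0], axis=-1)  # gradient norm at each timestep

print("first 5 timesteps:", per_step[:5].numpy())
print("last 5 timesteps: ", per_step[-5:].numpy())
```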
Next Module
LSTM (Long Short-Term Memory):
- Memory Cells: store long-term information
- Gates: control the information flow
- Solves the vanishing gradient problem
GRU (Gated Recurrent Unit):
- A simplified LSTM
- Fewer parameters
- Comparable performance (see the preview snippet below)
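As a quick preview (just a sketch, the full lesson comes in the next module): in Keras, `layers.LSTM` and `layers.GRU` are drop-in replacements for `SimpleRNN`, including the `return_sequences` behaviour.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Same stacked pattern as before, with SimpleRNN swapped for LSTM/GRU
model = keras.Sequential([
    layers.Embedding(10000, 128),
    layers.LSTM(64, return_sequences=True),   # memory cell + gates
    layers.GRU(32, return_sequences=False),   # simplified gated unit
    layers.Dense(5, activation='softmax')
])

model.summary()
```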
🎉 You've completed the RNN Module! You are ready to learn about LSTM, the solution for long-term dependencies.
