🎯 Lesson Objectives
After this lesson, you will:
✅ Understand what a Stacked RNN is and why it's needed
✅ Know how to build a multi-layer RNN with Keras
✅ Understand Bidirectional RNN
✅ Know how to choose an appropriate number of layers
Recap of the previous lesson
We covered the simple (single-layer) RNN. Today we learn how to stack multiple layers to learn more complex features!
📚 What is a Stacked RNN?
Definition
A Stacked RNN is several RNN layers stacked on top of each other. Each layer learns features at a higher level of abstraction.
Single vs Stacked Comparison
(Diagram: single-layer RNN vs stacked RNN)
Why Stack Layers?
| Reason | Explanation |
|---|---|
| Feature Hierarchy | Layer 1 learns local patterns, layer 2 learns global patterns (see the sketch below) |
| Capacity | More parameters → can learn more complex relationships |
| Representation | Each layer produces a different representation |
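To make the feature hierarchy concrete, here is a minimal sketch (with arbitrary toy shapes, not tied to the lesson's dataset) showing how each stacked layer transforms the sequence: intermediate layers still emit one vector per timestep, while the last layer condenses the whole sequence into a single vector.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy batch: 4 sequences, 10 timesteps, 8 features (illustrative shapes only)
x = tf.random.normal((4, 10, 8))

layer1 = layers.SimpleRNN(16, return_sequences=True)  # lower layer: per-timestep (local) features
layer2 = layers.SimpleRNN(8, return_sequences=False)  # upper layer: one summary (global) vector

h1 = layer1(x)
h2 = layer2(h1)

print(h1.shape)  # (4, 10, 16) - one feature vector per timestep
print(h2.shape)  # (4, 8)      - one feature vector per sequence
```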
Checkpoint
Do you understand what a Stacked RNN is?
💻 Implementing a Stacked RNN
Keras Code
```python
from tensorflow import keras
from tensorflow.keras import layers

def create_stacked_rnn(vocab_size, embedding_dim, rnn_units, num_classes):
    """
    Stacked RNN with 3 layers
    """
    model = keras.Sequential([
        # Embedding
        layers.Embedding(vocab_size, embedding_dim),

        # RNN Layer 1 - return_sequences=True for the next layer
        layers.SimpleRNN(rnn_units, return_sequences=True),
        layers.Dropout(0.3),

        # RNN Layer 2
        layers.SimpleRNN(rnn_units // 2, return_sequences=True),
        layers.Dropout(0.3),

        # RNN Layer 3 - return_sequences=False for classification
        layers.SimpleRNN(rnn_units // 4, return_sequences=False),
        layers.Dropout(0.3),

        # Output
        layers.Dense(num_classes, activation='softmax')
    ])

    return model

# Create model
model = create_stacked_rnn(
    vocab_size=10000,
    embedding_dim=128,
    rnn_units=128,
    num_classes=5
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding (Embedding)        (None, None, 128)         1280000
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, None, 128)         32896
_________________________________________________________________
dropout (Dropout)            (None, None, 128)         0
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, None, 64)          12352
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 64)          0
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 32)                3104
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0
_________________________________________________________________
dense (Dense)                (None, 5)                 165
=================================================================
Total params: 1,328,517
```

Important: return_sequences
return_sequences=True is required for ALL layers EXCEPT the last one!
```python
# ✅ Correct
layers.SimpleRNN(64, return_sequences=True)   # Layer 1
layers.SimpleRNN(32, return_sequences=True)   # Layer 2
layers.SimpleRNN(16, return_sequences=False)  # Layer 3 (last)

# ❌ Wrong
layers.SimpleRNN(64, return_sequences=False)  # Layer 1 - ERROR!
layers.SimpleRNN(32, return_sequences=True)   # Layer 2
```

Checkpoint
Do you know how to code a Stacked RNN?
🔄 Residual Connections for RNN
The Idea
As in ResNet, we add skip connections to:
- Improve gradient flow
- Train deeper RNNs
- Avoid degradation
Code with the Functional API
```python
from tensorflow.keras import layers, Model, Input

def create_residual_rnn(input_shape, rnn_units=64, num_classes=5):
    """
    RNN with Residual Connections
    """
    inputs = Input(shape=input_shape)

    # Initial projection
    x = layers.Dense(rnn_units)(inputs)

    # Residual Block 1
    rnn_out = layers.SimpleRNN(rnn_units, return_sequences=True)(x)
    rnn_out = layers.LayerNormalization()(rnn_out)
    x = layers.Add()([x, rnn_out])  # Skip connection

    # Residual Block 2
    rnn_out = layers.SimpleRNN(rnn_units, return_sequences=True)(x)
    rnn_out = layers.LayerNormalization()(rnn_out)
    x = layers.Add()([x, rnn_out])  # Skip connection

    # Residual Block 3
    rnn_out = layers.SimpleRNN(rnn_units, return_sequences=False)(x)

    # Output
    x = layers.Dropout(0.5)(rnn_out)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs)

# Create
model = create_residual_rnn(
    input_shape=(100, 128),  # (timesteps, features)
    rnn_units=64,
    num_classes=5
)

model.summary()
```

Visualization
(Diagram: residual RNN with skip connections)
Checkpoint
Do you understand the Residual RNN?
📊 Layer Normalization
Why Normalization?
| Problem | Solution |
|---|---|
| Internal covariate shift | Normalize activations (see the sketch below) |
| Unstable training | Stabilize gradient flow |
| Vanishing/Exploding | Control activation magnitudes |
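For intuition about what LayerNormalization actually computes, here is a minimal sketch that reproduces it by hand: each timestep's feature vector is standardized across the feature axis. The `1e-3` epsilon is an assumption matching the Keras default, and scale/bias are still at their initial values (1 and 0).

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((2, 5, 8))  # (batch, timesteps, features)

# Manual layer norm over the feature axis (gamma=1, beta=0 before training)
mean = tf.reduce_mean(x, axis=-1, keepdims=True)
var = tf.math.reduce_variance(x, axis=-1, keepdims=True)
manual = (x - mean) / tf.sqrt(var + 1e-3)  # epsilon assumed to match the Keras default

ln = layers.LayerNormalization()
diff = tf.reduce_max(tf.abs(manual - ln(x)))
print(diff.numpy())  # expected: very close to 0
```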
Layer Norm vs Batch Norm
```python
import tensorflow as tf
from tensorflow.keras import layers

# Batch Normalization: normalizes across the batch
# - Depends on the batch size
# - Hard to use with sequences (variable lengths)
bn = layers.BatchNormalization()

# Layer Normalization: normalizes across the features
# - Independent of the batch size
# - Works well for RNNs
ln = layers.LayerNormalization()

# Example
x = tf.random.normal((32, 10, 64))  # (batch, timesteps, features)

bn_out = bn(x)
ln_out = ln(x)

print(f"Input shape: {x.shape}")
print(f"BatchNorm output: {bn_out.shape}")
print(f"LayerNorm output: {ln_out.shape}")
```

Using it in an RNN
```python
from tensorflow.keras import layers, Model, Input

def rnn_block_with_norm(x, units, return_sequences=True):
    """RNN block with Layer Normalization"""
    # RNN
    rnn_out = layers.SimpleRNN(units, return_sequences=return_sequences)(x)

    # Layer Normalization
    normalized = layers.LayerNormalization()(rnn_out)

    # Activation
    activated = layers.Activation('relu')(normalized)

    return activated

# Usage in a model
inputs = Input(shape=(100, 128))
x = rnn_block_with_norm(inputs, 64, return_sequences=True)
x = rnn_block_with_norm(x, 32, return_sequences=True)
x = rnn_block_with_norm(x, 16, return_sequences=False)
outputs = layers.Dense(5, activation='softmax')(x)

model = Model(inputs, outputs)
```

Checkpoint
Do you understand Layer Normalization?
⚙️ Training Deep RNN
Best Practices
| Technique | Purpose | Keras |
|---|---|---|
| Gradient Clipping | Prevent exploding | clipnorm=1.0 |
| Layer Normalization | Stabilize training | LayerNormalization() |
| Residual Connections | Better gradient flow | Add() |
| Dropout | Regularization | Dropout(0.3) |
| Learning Rate Schedule | Better convergence | ReduceLROnPlateau |
Complete Training Pipeline
```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ReduceLROnPlateau,
    ModelCheckpoint
)

def create_deep_rnn(vocab_size, embedding_dim, max_len, num_classes):
    """Production-ready Deep RNN"""

    inputs = keras.Input(shape=(max_len,))

    # Embedding
    x = layers.Embedding(vocab_size, embedding_dim)(inputs)

    # Stacked RNN with normalization and dropout
    for units in [128, 64, 32]:
        x = layers.SimpleRNN(units, return_sequences=True)(x)
        x = layers.LayerNormalization()(x)
        x = layers.Dropout(0.3)(x)

    # Final RNN layer
    x = layers.SimpleRNN(16)(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.5)(x)

    # Output
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return keras.Model(inputs, outputs)

# Create model
model = create_deep_rnn(
    vocab_size=10000,
    embedding_dim=128,
    max_len=200,
    num_classes=5
)

# Optimizer with gradient clipping
optimizer = Adam(
    learning_rate=0.001,
    clipnorm=1.0  # Clip gradients
)

# Compile
model.compile(
    optimizer=optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Callbacks
callbacks = [
    EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
    ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=3,
        min_lr=1e-7
    ),
    ModelCheckpoint(
        'best_model.keras',
        monitor='val_accuracy',
        save_best_only=True
    )
]

# Train
# history = model.fit(
#     X_train, y_train,
#     epochs=50,
#     batch_size=64,
#     validation_split=0.2,
#     callbacks=callbacks
# )
```

Checkpoint
Do you know how to train a Deep RNN?
📈 Monitoring and Debugging
Metrics to Track
```python
import matplotlib.pyplot as plt

def plot_training(history):
    """Plot training metrics"""
    fig, axes = plt.subplots(2, 2, figsize=(12, 10))

    # Loss
    axes[0, 0].plot(history.history['loss'], label='Train')
    axes[0, 0].plot(history.history['val_loss'], label='Val')
    axes[0, 0].set_title('Loss')
    axes[0, 0].legend()

    # Accuracy
    axes[0, 1].plot(history.history['accuracy'], label='Train')
    axes[0, 1].plot(history.history['val_accuracy'], label='Val')
    axes[0, 1].set_title('Accuracy')
    axes[0, 1].legend()

    # Learning rate (if using ReduceLROnPlateau)
    if 'lr' in history.history:
        axes[1, 0].plot(history.history['lr'])
        axes[1, 0].set_title('Learning Rate')
        axes[1, 0].set_yscale('log')

    # Gap (overfitting indicator)
    train_acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    gap = [t - v for t, v in zip(train_acc, val_acc)]
    axes[1, 1].plot(gap)
    axes[1, 1].axhline(y=0, color='r', linestyle='--')
    axes[1, 1].set_title('Train-Val Gap (Overfitting indicator)')

    plt.tight_layout()
    plt.show()

# Usage
# plot_training(history)
```

Debug tips
Common problems and how to fix them:
- Loss = NaN
  - Exploding gradients → lower the learning rate, add gradient clipping (see the sketch after this list)
- Val loss rises early
  - Overfitting → add dropout, regularization
- Training is slow
  - Reduce the sequence length or batch size
- Accuracy does not improve
  - Model too simple → add layers/units
  - Learning rate too small → increase it
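For the "Loss = NaN" case above, two cheap safeguards are gradient clipping and stopping the run as soon as the loss turns NaN. A minimal sketch (the tiny stand-in model and hyperparameter values are illustrative only):

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import TerminateOnNaN
from tensorflow.keras.optimizers import Adam

# Tiny stand-in model; in practice reuse the deep RNN from the pipeline above
model = keras.Sequential([
    layers.Embedding(10000, 32),
    layers.SimpleRNN(16),
    layers.Dense(5, activation='softmax')
])

# clipnorm keeps one bad batch from blowing up the weights
model.compile(
    optimizer=Adam(learning_rate=1e-3, clipnorm=1.0),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# TerminateOnNaN stops training immediately if the loss becomes NaN
callbacks = [TerminateOnNaN()]

# history = model.fit(X_train, y_train, epochs=50, callbacks=callbacks)
```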
Checkpoint
Do you know how to monitor training?
🎯 RNN Module Summary
What You've Learned
| Lesson | Key Content |
|---|---|
| 10 | RNN basics, Hidden State, BPTT |
| 11 | Text preprocessing, Applications |
| 12 | Stacked RNN, Residual, Normalization |
RNN Components
```python
from tensorflow.keras import layers

# Basic RNN
layers.SimpleRNN(units, return_sequences=True)  # or return_sequences=False

# Bidirectional
layers.Bidirectional(layers.SimpleRNN(units))

# Stacked (multiple layers)
layers.SimpleRNN(64, return_sequences=True)   # All except last
layers.SimpleRNN(32, return_sequences=False)  # Last layer

# With normalization
layers.LayerNormalization()
```

Limitations of SimpleRNN
| Problem | Impact |
|---|---|
| Vanishing Gradient | Cannot learn long-term dependencies (illustrated in the sketch below) |
| Sequential Processing | Slow; cannot be parallelized |
| Short Memory | Forgets information from the distant past |
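To see the vanishing-gradient problem directly, here is a minimal sketch (on random data with illustrative shapes) that measures how strongly the SimpleRNN output depends on each input timestep. The gradient norms at early timesteps are typically far smaller than at late ones, which is exactly why long-term dependencies are hard to learn.

```python
import tensorflow as tf
from tensorflow.keras import layers

rnn = layers.SimpleRNN(32)
x = tf.random.normal((1, 100, 16))  # 1 sequence, 100 timesteps, 16 features

with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(rnn(x))

grads = tape.gradient(loss, x)         # same shape as x
per_step = tf.norm(grads[0], axis=-1)  # gradient norm at each timestep

print("first 5 timesteps:", per_step[:5].numpy())
print("last 5 timesteps: ", per_step[-5:].numpy())
```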
Next Module
LSTM (Long Short-Term Memory):
- Memory Cells: store long-term information
- Gates: control the information flow
- Solves the vanishing gradient problem
GRU (Gated Recurrent Unit):
- A simplified LSTM
- Fewer parameters
- Comparable performance (see the preview snippet below)
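As a quick preview (just a sketch, the full lesson comes in the next module): in Keras, `layers.LSTM` and `layers.GRU` are drop-in replacements for `SimpleRNN`, including the `return_sequences` behaviour.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Same stacked pattern as before, with SimpleRNN swapped for LSTM/GRU
model = keras.Sequential([
    layers.Embedding(10000, 128),
    layers.LSTM(64, return_sequences=True),   # memory cell + gates
    layers.GRU(32, return_sequences=False),   # simplified gated unit
    layers.Dense(5, activation='softmax')
])

model.summary()
```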
🎉 You've completed the RNN Module! You are ready to learn about LSTM, the solution for long-term dependencies.
