🔄 Recurrent Neural Networks (RNN)
RNNs are designed for sequential data such as text, time series, and audio. This lesson covers everything from the basic RNN to LSTM and GRU.
Why do we need RNNs?
Sequential Data Is Special
Sequential Data
- Text: "I love this movie" - word order matters
- Time series: stock prices - current values depend on past values
- Audio: Speech - temporal patterns
- Video: Sequences of frames
Feedforward Networks Are Not Enough
- No "memory" of previous inputs
- Fixed input size
- Cannot capture temporal dependencies
RNN Architecture
Basic RNN Cell
Diagram
graph LR
X1[x₁] --> H1[h₁]
H1 --> H2[h₂]
X2[x₂] --> H2
H2 --> H3[h₃]
X3[x₃] --> H3
H3 --> Y[Output]
Implementation
Python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size

        # RNN weights
        self.W_xh = nn.Linear(input_size, hidden_size)
        self.W_hh = nn.Linear(hidden_size, hidden_size)
        self.W_hy = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden=None):
        batch_size, seq_len, _ = x.shape

        if hidden is None:
            # Initialize the hidden state on the same device as the input
            hidden = torch.zeros(batch_size, self.hidden_size, device=x.device)

        outputs = []
        for t in range(seq_len):
            hidden = torch.tanh(
                self.W_xh(x[:, t, :]) + self.W_hh(hidden)
            )
            outputs.append(hidden)

        # Stack all hidden states
        outputs = torch.stack(outputs, dim=1)

        # Final output from the last hidden state
        output = self.W_hy(hidden)
        return output, outputs

# Using the PyTorch built-in module
rnn = nn.RNN(
    input_size=10,
    hidden_size=32,
    num_layers=2,
    batch_first=True,
    dropout=0.2
)
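Continuing from the class above, a quick sanity check of the shapes (a minimal sketch; the batch size, sequence length, and dimensions are arbitrary):
Python
# Dummy batch: 4 sequences, 6 timesteps, 10 features per timestep
x = torch.randn(4, 6, 10)

model = SimpleRNN(input_size=10, hidden_size=32, output_size=2)
output, hidden_states = model(x)

print(output.shape)         # torch.Size([4, 2])    - prediction from the last hidden state
print(hidden_states.shape)  # torch.Size([4, 6, 32]) - one hidden state per timestep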
Vanishing/Exploding Gradients
The Problem
Gradient Problems
- Vanishing: gradients → 0 over many timesteps
- Exploding: gradients → ∞
- The longer the sequence, the worse the problem
Visualization
Text
Sequence: [x₁, x₂, x₃, ..., x₁₀₀]
Gradient flow: y → h₁₀₀ → h₉₉ → ... → h₁
Each step: gradient × W_hh
Result: gradient = W_hh^100 → very small or very large
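To make the effect concrete, a tiny numeric sketch (this treats the recurrent weight as a single scalar, which is a simplification of the true Jacobian product):
Python
# Repeatedly multiplying by the recurrent weight over 100 timesteps
w_small, w_large = 0.9, 1.1

print(w_small ** 100)  # ~2.7e-05  -> gradient vanishes
print(w_large ** 100)  # ~13780.6  -> gradient explodes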
LSTM (Long Short-Term Memory)
LSTM addresses the vanishing gradient problem with gates:
LSTM Cell Architecture
Diagram
graph TB
subgraph "LSTM Cell"
C_prev[C_{t-1}] --> FG[Forget Gate]
C_prev --> IG[Input Gate]
FG --> C_new[C_t]
IG --> C_new
H_prev[h_{t-1}] --> FG
H_prev --> IG
H_prev --> OG[Output Gate]
X[x_t] --> FG
X --> IG
X --> OG
C_new --> OG
OG --> H_new[h_t]
    end
LSTM Equations
Forget Gate: decides how much of the previous cell state to forget
f_t = sigmoid(W_f @ [h_{t-1}, x_t])
Input Gate: decides how much new information to write
i_t = sigmoid(W_i @ [h_{t-1}, x_t])
C_tilde_t = tanh(W_C @ [h_{t-1}, x_t])
Cell State Update:
C_t = f_t * C_{t-1} + i_t * C_tilde_t
Output Gate: decides what to expose as the hidden state
o_t = sigmoid(W_o @ [h_{t-1}, x_t])
h_t = o_t * tanh(C_t)
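As a minimal sketch of one timestep following the equations above, the helper below packs all four gate weights into a single matrix (lstm_cell_step is a hypothetical helper for illustration, not the exact nn.LSTM internals):
Python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep. W: (4*hidden, input+hidden), b: (4*hidden,)."""
    hidden = h_prev.shape[-1]
    z = torch.cat([h_prev, x_t], dim=-1) @ W.T + b   # all four gates in one matmul
    f, i, g, o = z.split(hidden, dim=-1)

    f = torch.sigmoid(f)          # forget gate
    i = torch.sigmoid(i)          # input gate
    g = torch.tanh(g)             # candidate cell state (C_tilde_t)
    o = torch.sigmoid(o)          # output gate

    c_t = f * c_prev + i * g      # cell state update
    h_t = o * torch.tanh(c_t)     # new hidden state
    return h_t, c_t

# Example with batch=2, input=10, hidden=16
x_t = torch.randn(2, 10)
h_prev, c_prev = torch.zeros(2, 16), torch.zeros(2, 16)
W, b = torch.randn(4 * 16, 10 + 16), torch.zeros(4 * 16)
h_t, c_t = lstm_cell_step(x_t, h_prev, c_prev, W, b)
print(h_t.shape, c_t.shape)  # torch.Size([2, 16]) torch.Size([2, 16])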
PyTorch LSTM
Python
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, dropout):
        super(LSTMModel, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=n_layers,
            bidirectional=True,
            dropout=dropout,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: (batch, seq_len)
        embedded = self.dropout(self.embedding(text))
        # embedded: (batch, seq_len, embedding_dim)

        output, (hidden, cell) = self.lstm(embedded)
        # hidden: (n_layers * 2, batch, hidden_dim)

        # Concatenate the final forward and backward hidden states
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)
        # hidden: (batch, hidden_dim * 2)

        return self.fc(self.dropout(hidden))

# Create model
model = LSTMModel(
    vocab_size=10000,
    embedding_dim=100,
    hidden_dim=256,
    output_dim=1,
    n_layers=2,
    dropout=0.5
)
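Continuing from the block above, a quick shape check on a dummy batch of token ids (a minimal sketch; the batch size and sequence length are arbitrary):
Python
# 8 sequences of 50 token ids each, drawn from the 10,000-word vocabulary
dummy_tokens = torch.randint(0, 10000, (8, 50))

logits = model(dummy_tokens)
print(logits.shape)  # torch.Size([8, 1]) - one logit per sequence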
GRU (Gated Recurrent Unit)
GRU is simpler than LSTM, using only two gates:
Python
# GRU equations
# Reset gate
r_t = sigmoid(W_r @ [h_{t-1}, x_t])
# Update gate
z_t = sigmoid(W_z @ [h_{t-1}, x_t])
# Candidate hidden state
h_tilde = tanh(W @ [r_t * h_{t-1}, x_t])
# Final hidden state
h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde
Python
gru = nn.GRU(
    input_size=100,
    hidden_size=256,
    num_layers=2,
    bidirectional=True,
    batch_first=True
)
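Continuing from the module above, a quick shape check (a minimal sketch; the batch size and sequence length are arbitrary):
Python
x = torch.randn(4, 20, 100)          # (batch, seq_len, input_size)
output, hidden = gru(x)

print(output.shape)  # torch.Size([4, 20, 512]) - forward + backward features per timestep
print(hidden.shape)  # torch.Size([4, 4, 256])  - (num_layers * 2, batch, hidden_size)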
Text Classification Example
Sentiment Analysis with LSTM
Python
import torch
import torch.nn as nn
from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Tokenizer
tokenizer = get_tokenizer('basic_english')

# Build vocabulary
def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

train_iter = IMDB(split='train')
vocab = build_vocab_from_iterator(
    yield_tokens(train_iter),
    specials=['<unk>', '<pad>']
)
vocab.set_default_index(vocab['<unk>'])

# Text pipeline
def text_pipeline(x):
    return vocab(tokenizer(x))

# Model
class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, output_dim=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=1)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        embedded = self.embedding(x)
        _, (hidden, _) = self.lstm(embedded)
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.sigmoid(self.fc(hidden))

# Training
model = SentimentLSTM(len(vocab))
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())

# Train loop
for epoch in range(5):
    # Re-create the iterator each epoch (it was exhausted while building the vocab)
    train_iter = IMDB(split='train')
    for label, text in train_iter:
        tokens = text_pipeline(text)
        tokens = torch.tensor([tokens])
        # Note: depending on the torchtext version, labels may be 'pos'/'neg' strings or integers
        target = torch.tensor([[1.0 if label == 'pos' else 0.0]])

        optimizer.zero_grad()
        pred = model(tokens)
        loss = criterion(pred, target)
        loss.backward()
        optimizer.step()
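After training, inference on a single review could look like the sketch below (predict_sentiment is a hypothetical helper, and the 0.5 decision threshold is an assumption):
Python
def predict_sentiment(model, review):
    model.eval()
    with torch.no_grad():
        tokens = torch.tensor([text_pipeline(review)])
        prob = model(tokens).item()
    return ('pos' if prob > 0.5 else 'neg'), prob

print(predict_sentiment(model, "This movie was absolutely wonderful"))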
Sequence-to-Sequence
For tasks such as translation and summarization:
Python
class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)
        outputs, (hidden, cell) = self.lstm(embedded)
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden, cell):
        embedded = self.embedding(x)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        prediction = self.fc(output)
        return prediction, hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg):
        hidden, cell = self.encoder(src)
        outputs, _, _ = self.decoder(trg, hidden, cell)
        return outputs
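A quick shape sketch with dummy data (the vocabulary sizes and dimensions are illustrative assumptions; note the decoder receives the full target sequence, i.e. complete teacher forcing):
Python
encoder = Encoder(vocab_size=5000, embed_dim=128, hidden_dim=256)
decoder = Decoder(vocab_size=6000, embed_dim=128, hidden_dim=256)
seq2seq = Seq2Seq(encoder, decoder)

src = torch.randint(0, 5000, (4, 12))   # (batch, src_len)
trg = torch.randint(0, 6000, (4, 15))   # (batch, trg_len)

logits = seq2seq(src, trg)
print(logits.shape)  # torch.Size([4, 15, 6000]) - one distribution over the target vocab per position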
Practice Exercises
Hands-on Exercise
Build a sentiment classifier with LSTM:
- Dataset: IMDB reviews
- Build an LSTM model with:
  - Embedding layer
  - Bidirectional LSTM
  - Dropout
- Train and evaluate
- Compare against a simple RNN
Target: > 85% accuracy on IMDB
Up Next
In the next lesson, we will learn about Transformers - the architecture that revolutionized NLP.
