Theory
35 minutes
Lesson 5/5

Recurrent Neural Networks & LSTM

Learn about RNNs and LSTMs for sequential data

🔄 Recurrent Neural Networks (RNN)

RNNs are designed for sequential data such as text, time series, and audio. This lesson covers everything from the basic RNN up to LSTM and GRU.

Why do we need RNNs?

What makes sequential data special

Sequential Data
  • Text: "I love this movie" - word order matters
  • Time series: stock prices - depend on past values
  • Audio: speech - temporal patterns
  • Video: sequences of frames

Feedforward networks are not enough

  • No "memory" of previous inputs
  • Fixed input size
  • Cannot capture temporal dependencies

RNN Architecture

Basic RNN Cell

Diagram
graph LR
    X1[x₁] --> H1[h₁]
    H1 --> H2[h₂]
    X2[x₂] --> H2
    H2 --> H3[h₃]
    X3[x₃] --> H3
    H3 --> Y[Output]

h_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h)

y_t = W_{hy} \cdot h_t + b_y

Implementation

Python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size

        # RNN weights
        self.W_xh = nn.Linear(input_size, hidden_size)
        self.W_hh = nn.Linear(hidden_size, hidden_size)
        self.W_hy = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden=None):
        batch_size, seq_len, _ = x.shape

        if hidden is None:
            hidden = torch.zeros(batch_size, self.hidden_size, device=x.device)

        outputs = []
        for t in range(seq_len):
            hidden = torch.tanh(
                self.W_xh(x[:, t, :]) + self.W_hh(hidden)
            )
            outputs.append(hidden)

        # Stack all hidden states
        outputs = torch.stack(outputs, dim=1)

        # Final output from the last hidden state
        output = self.W_hy(hidden)
        return output, outputs

# Using the PyTorch built-in RNN
rnn = nn.RNN(
    input_size=10,
    hidden_size=32,
    num_layers=2,
    batch_first=True,
    dropout=0.2
)
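
A quick shape check of the built-in `nn.RNN` defined above (a minimal sketch; the batch size and sequence length are made up):

Python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=32, num_layers=2,
             batch_first=True, dropout=0.2)

x = torch.randn(4, 15, 10)   # (batch=4, seq_len=15, input_size=10)
output, h_n = rnn(x)         # initial hidden state defaults to zeros

print(output.shape)          # torch.Size([4, 15, 32]) - hidden state at every timestep
print(h_n.shape)             # torch.Size([2, 4, 32])  - final hidden state of each layer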

Vanishing/Exploding Gradients

The problem

Gradient Problems
  • Vanishing: gradients shrink toward 0 across many timesteps
  • Exploding: gradients blow up toward ∞
  • The longer the sequence, the worse the problem

Visualization

Text
Sequence: [x₁, x₂, x₃, ..., x₁₀₀]
Gradient flow: y → h₁₀₀ → h₉₉ → ... → h₁
Each step multiplies the gradient by W_hh
Result: gradient ∝ W_hh^100 → very small or very large
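
On the exploding side, a common mitigation is gradient clipping. A minimal sketch, assuming `model`, `loss`, and `optimizer` come from an ordinary training loop:

Python
# Clip the global gradient norm before updating the weights
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()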

LSTM (Long Short-Term Memory)

LSTM addresses the vanishing gradient problem with gates:

LSTM Cell Architecture

Diagram
graph TB
    subgraph "LSTM Cell"
        C_prev[C_{t-1}] --> FG[Forget Gate]
        C_prev --> IG[Input Gate]
        FG --> C_new[C_t]
        IG --> C_new
        H_prev[h_{t-1}] --> FG
        H_prev --> IG
        H_prev --> OG[Output Gate]
        X[x_t] --> FG
        X --> IG
        X --> OG
        C_new --> OG
        OG --> H_new[h_t]
    end

LSTM Equations

Forget gate: decides how much of the previous cell state to forget

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

Input gate: decides how much new information to add

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)

Cell state update:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

Output gate: decides what to output

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

h_t = o_t \odot \tanh(C_t)
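
To connect the equations to code, here is a minimal from-scratch sketch of a single LSTM step that mirrors the formulas above (illustrative only; the `LSTMCellFromScratch` class is hypothetical, while PyTorch itself provides `nn.LSTM` and `nn.LSTMCell`):

Python
import torch
import torch.nn as nn

class LSTMCellFromScratch(nn.Module):
    """One LSTM timestep, written to mirror the equations above."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_f = nn.Linear(input_size + hidden_size, hidden_size)  # forget gate
        self.W_i = nn.Linear(input_size + hidden_size, hidden_size)  # input gate
        self.W_C = nn.Linear(input_size + hidden_size, hidden_size)  # candidate cell state
        self.W_o = nn.Linear(input_size + hidden_size, hidden_size)  # output gate

    def forward(self, x_t, h_prev, c_prev):
        concat = torch.cat([h_prev, x_t], dim=1)   # [h_{t-1}, x_t]
        f_t = torch.sigmoid(self.W_f(concat))      # forget gate f_t
        i_t = torch.sigmoid(self.W_i(concat))      # input gate i_t
        c_tilde = torch.tanh(self.W_C(concat))     # candidate cell state
        c_t = f_t * c_prev + i_t * c_tilde         # cell state update
        o_t = torch.sigmoid(self.W_o(concat))      # output gate o_t
        h_t = o_t * torch.tanh(c_t)                # new hidden state
        return h_t, c_t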

PyTorch LSTM

Python
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, dropout):
        super(LSTMModel, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=n_layers,
            bidirectional=True,
            dropout=dropout,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: (batch, seq_len)
        embedded = self.dropout(self.embedding(text))
        # embedded: (batch, seq_len, embedding_dim)

        output, (hidden, cell) = self.lstm(embedded)
        # hidden: (n_layers * 2, batch, hidden_dim)

        # Concatenate the final forward and backward hidden states
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)
        # hidden: (batch, hidden_dim * 2)

        return self.fc(self.dropout(hidden))

# Create model
model = LSTMModel(
    vocab_size=10000,
    embedding_dim=100,
    hidden_dim=256,
    output_dim=1,
    n_layers=2,
    dropout=0.5
)
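
A quick sanity check with random token ids (a minimal sketch; the batch size and sequence length are made up):

Python
import torch

dummy_batch = torch.randint(0, 10000, (8, 50))  # (batch=8, seq_len=50) of token ids
logits = model(dummy_batch)
print(logits.shape)                             # torch.Size([8, 1])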

GRU (Gated Recurrent Unit)

GRU is simpler than LSTM, with only 2 gates:

Python
# GRU equations (written as pseudocode)
# Reset gate
r_t = sigmoid(W_r @ [h_{t-1}, x_t])
# Update gate
z_t = sigmoid(W_z @ [h_{t-1}, x_t])
# Candidate hidden state
h_tilde = tanh(W @ [r_t * h_{t-1}, x_t])
# Final hidden state
h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde
Python
gru = nn.GRU(
    input_size=100,
    hidden_size=256,
    num_layers=2,
    bidirectional=True,
    batch_first=True
)
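
With `bidirectional=True`, the output feature dimension doubles. A quick shape check (a minimal sketch; the batch size and sequence length are made up):

Python
import torch

x = torch.randn(4, 20, 100)  # (batch, seq_len, input_size)
output, h_n = gru(x)

print(output.shape)          # torch.Size([4, 20, 512]) - hidden_size * 2 directions
print(h_n.shape)             # torch.Size([4, 4, 256])  - (num_layers * 2 directions, batch, hidden_size)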

Text Classification Example

Sentiment Analysis with LSTM

Python
import torch
import torch.nn as nn
from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Tokenizer
tokenizer = get_tokenizer('basic_english')

# Build vocabulary
def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

train_iter = IMDB(split='train')
vocab = build_vocab_from_iterator(
    yield_tokens(train_iter),
    specials=['<unk>', '<pad>']
)
vocab.set_default_index(vocab['<unk>'])

# Text pipeline
def text_pipeline(x):
    return vocab(tokenizer(x))

# Model
class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, output_dim=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=1)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        embedded = self.embedding(x)
        _, (hidden, _) = self.lstm(embedded)
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.sigmoid(self.fc(hidden))

# Training
model = SentimentLSTM(len(vocab))
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())

# Train loop
for epoch in range(5):
    # Re-create the iterator each epoch: torchtext iterators are exhausted after one pass
    train_iter = IMDB(split='train')
    for label, text in train_iter:
        tokens = text_pipeline(text)
        tokens = torch.tensor([tokens])
        # Label encoding varies across torchtext versions ('pos'/'neg' or 1/2)
        target = torch.tensor([[1.0 if label in (2, 'pos') else 0.0]])

        optimizer.zero_grad()
        pred = model(tokens)
        loss = criterion(pred, target)
        loss.backward()
        optimizer.step()
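
After training, accuracy can be measured on the held-out split. A minimal sketch, assuming `IMDB(split='test')` and the same label handling as the training loop above:

Python
# Evaluate on the test split
test_iter = IMDB(split='test')
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for label, text in test_iter:
        tokens = torch.tensor([text_pipeline(text)])
        pred = model(tokens).item()
        pred_label = 1.0 if pred > 0.5 else 0.0
        target = 1.0 if label in (2, 'pos') else 0.0  # label encoding varies across torchtext versions
        correct += int(pred_label == target)
        total += 1
print(f"Test accuracy: {correct / total:.3f}")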

Sequence-to-Sequence

For tasks like translation and summarization:

Python
class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)
        outputs, (hidden, cell) = self.lstm(embedded)
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden, cell):
        embedded = self.embedding(x)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        prediction = self.fc(output)
        return prediction, hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg):
        hidden, cell = self.encoder(src)
        outputs, _, _ = self.decoder(trg, hidden, cell)
        return outputs
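
A quick sanity check with made-up vocabulary sizes and dummy token ids (a minimal sketch; real training typically decodes step by step with teacher forcing instead of feeding the whole target sequence at once):

Python
import torch

encoder = Encoder(vocab_size=5000, embed_dim=128, hidden_dim=256)
decoder = Decoder(vocab_size=6000, embed_dim=128, hidden_dim=256)
seq2seq = Seq2Seq(encoder, decoder)

src = torch.randint(0, 5000, (2, 12))  # source sentences: (batch, src_len)
trg = torch.randint(0, 6000, (2, 15))  # target sentences: (batch, trg_len)

outputs = seq2seq(src, trg)
print(outputs.shape)                   # torch.Size([2, 15, 6000]) - scores over the target vocab per step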

Practice Exercise

Hands-on Exercise

Build a Sentiment Classifier with LSTM:

  1. Dataset: IMDB reviews
  2. Build an LSTM model with:
    • Embedding layer
    • Bidirectional LSTM
    • Dropout
  3. Train and evaluate
  4. Compare with a simple RNN

Target: > 85% accuracy on IMDB

Next up

In the next lesson, we will learn about Transformers - the architecture that revolutionized NLP.

