Theory
35 minutes
Lesson 5/5

Recurrent Neural Networks & LSTM

Learn about RNNs and LSTMs for sequential data

🔄 Recurrent Neural Networks (RNN)

RNNs are designed for sequential data such as text, time series, and audio. This lesson covers everything from the basic RNN up to LSTM and GRU.

Why do we need RNNs?

What makes sequential data special

Sequential Data
  • Text: "I love this movie" - word order matters
  • Time series: stock prices - depend on past values
  • Audio: speech - temporal patterns
  • Video: sequences of frames

Feedforward networks are not enough

  • No "memory" of previous inputs
  • Fixed input size
  • Cannot capture temporal dependencies

RNN Architecture

Basic RNN Cell

Diagram
graph LR
    X1[x₁] --> H1[h₁]
    H1 --> H2[h₂]
    X2[x₂] --> H2
    H2 --> H3[h₃]
    X3[x₃] --> H3
    H3 --> Y[Output]

h_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h)

y_t = W_{hy} \cdot h_t + b_y

Implementation

Python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size

        # RNN weights
        self.W_xh = nn.Linear(input_size, hidden_size)
        self.W_hh = nn.Linear(hidden_size, hidden_size)
        self.W_hy = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden=None):
        batch_size, seq_len, _ = x.shape

        if hidden is None:
            hidden = torch.zeros(batch_size, self.hidden_size, device=x.device)

        outputs = []
        for t in range(seq_len):
            hidden = torch.tanh(
                self.W_xh(x[:, t, :]) + self.W_hh(hidden)
            )
            outputs.append(hidden)

        # Stack all hidden states
        outputs = torch.stack(outputs, dim=1)

        # Final output from the last hidden state
        output = self.W_hy(hidden)
        return output, outputs

# Using the PyTorch built-in RNN
rnn = nn.RNN(
    input_size=10,
    hidden_size=32,
    num_layers=2,
    batch_first=True,
    dropout=0.2
)
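
A quick shape check of the built-in `nn.RNN` defined above (a minimal sketch; the batch size and sequence length are made up):

Python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=32, num_layers=2,
             batch_first=True, dropout=0.2)

x = torch.randn(4, 15, 10)   # (batch=4, seq_len=15, input_size=10)
output, h_n = rnn(x)         # initial hidden state defaults to zeros

print(output.shape)          # torch.Size([4, 15, 32]) - hidden state at every timestep
print(h_n.shape)             # torch.Size([2, 4, 32])  - final hidden state of each layer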

Vanishing/Exploding Gradients

The problem

Gradient Problems
  • Vanishing: gradients shrink toward 0 across many timesteps
  • Exploding: gradients blow up toward ∞
  • The longer the sequence, the worse the problem

Visualization

Text
Sequence: [x₁, x₂, x₃, ..., x₁₀₀]
Gradient flow: y → h₁₀₀ → h₉₉ → ... → h₁
Each step multiplies the gradient by W_hh
Result: gradient ∝ W_hh^100 → very small or very large
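
On the exploding side, a common mitigation is gradient clipping. A minimal sketch, assuming `model`, `loss`, and `optimizer` come from an ordinary training loop:

Python
# Clip the global gradient norm before updating the weights
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()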

LSTM (Long Short-Term Memory)

LSTM addresses the vanishing gradient problem with gates:

LSTM Cell Architecture

Diagram
graph TB
    subgraph "LSTM Cell"
        C_prev[C_{t-1}] --> FG[Forget Gate]
        C_prev --> IG[Input Gate]
        FG --> C_new[C_t]
        IG --> C_new
        H_prev[h_{t-1}] --> FG
        H_prev --> IG
        H_prev --> OG[Output Gate]
        X[x_t] --> FG
        X --> IG
        X --> OG
        C_new --> OG
        OG --> H_new[h_t]
    end

LSTM Equations

Forget gate: decides how much of the previous cell state to forget

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

Input gate: decides how much new information to add

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)

Cell state update:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

Output gate: decides what to output

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

h_t = o_t \odot \tanh(C_t)
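
To connect the equations to code, here is a minimal from-scratch sketch of a single LSTM step that mirrors the formulas above (illustrative only; the `LSTMCellFromScratch` class is hypothetical, while PyTorch itself provides `nn.LSTM` and `nn.LSTMCell`):

Python
import torch
import torch.nn as nn

class LSTMCellFromScratch(nn.Module):
    """One LSTM timestep, written to mirror the equations above."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W_f = nn.Linear(input_size + hidden_size, hidden_size)  # forget gate
        self.W_i = nn.Linear(input_size + hidden_size, hidden_size)  # input gate
        self.W_C = nn.Linear(input_size + hidden_size, hidden_size)  # candidate cell state
        self.W_o = nn.Linear(input_size + hidden_size, hidden_size)  # output gate

    def forward(self, x_t, h_prev, c_prev):
        concat = torch.cat([h_prev, x_t], dim=1)   # [h_{t-1}, x_t]
        f_t = torch.sigmoid(self.W_f(concat))      # forget gate f_t
        i_t = torch.sigmoid(self.W_i(concat))      # input gate i_t
        c_tilde = torch.tanh(self.W_C(concat))     # candidate cell state
        c_t = f_t * c_prev + i_t * c_tilde         # cell state update
        o_t = torch.sigmoid(self.W_o(concat))      # output gate o_t
        h_t = o_t * torch.tanh(c_t)                # new hidden state
        return h_t, c_t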

PyTorch LSTM

Python
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, dropout):
        super(LSTMModel, self).__init__()

        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=n_layers,
            bidirectional=True,
            dropout=dropout,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: (batch, seq_len)
        embedded = self.dropout(self.embedding(text))
        # embedded: (batch, seq_len, embedding_dim)

        output, (hidden, cell) = self.lstm(embedded)
        # hidden: (n_layers * 2, batch, hidden_dim)

        # Concatenate the final forward and backward hidden states
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)
        # hidden: (batch, hidden_dim * 2)

        return self.fc(self.dropout(hidden))

# Create model
model = LSTMModel(
    vocab_size=10000,
    embedding_dim=100,
    hidden_dim=256,
    output_dim=1,
    n_layers=2,
    dropout=0.5
)
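
A quick sanity check with random token ids (a minimal sketch; the batch size and sequence length are made up):

Python
import torch

dummy_batch = torch.randint(0, 10000, (8, 50))  # (batch=8, seq_len=50) of token ids
logits = model(dummy_batch)
print(logits.shape)                             # torch.Size([8, 1])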

GRU (Gated Recurrent Unit)

GRU is simpler than LSTM, with only 2 gates:

Python
# GRU equations (written as pseudocode)
# Reset gate
r_t = sigmoid(W_r @ [h_{t-1}, x_t])
# Update gate
z_t = sigmoid(W_z @ [h_{t-1}, x_t])
# Candidate hidden state
h_tilde = tanh(W @ [r_t * h_{t-1}, x_t])
# Final hidden state
h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde
Python
gru = nn.GRU(
    input_size=100,
    hidden_size=256,
    num_layers=2,
    bidirectional=True,
    batch_first=True
)
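
With `bidirectional=True`, the output feature dimension doubles. A quick shape check (a minimal sketch; the batch size and sequence length are made up):

Python
import torch

x = torch.randn(4, 20, 100)  # (batch, seq_len, input_size)
output, h_n = gru(x)

print(output.shape)          # torch.Size([4, 20, 512]) - hidden_size * 2 directions
print(h_n.shape)             # torch.Size([4, 4, 256])  - (num_layers * 2 directions, batch, hidden_size)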

Text Classification Example

Sentiment Analysis with LSTM

Python
import torch
import torch.nn as nn
from torchtext.datasets import IMDB
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Tokenizer
tokenizer = get_tokenizer('basic_english')

# Build vocabulary
def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

train_iter = IMDB(split='train')
vocab = build_vocab_from_iterator(
    yield_tokens(train_iter),
    specials=['<unk>', '<pad>']
)
vocab.set_default_index(vocab['<unk>'])

# Text pipeline
def text_pipeline(x):
    return vocab(tokenizer(x))

# Model
class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, output_dim=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=1)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        embedded = self.embedding(x)
        _, (hidden, _) = self.lstm(embedded)
        hidden = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.sigmoid(self.fc(hidden))

# Training
model = SentimentLSTM(len(vocab))
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters())

# Train loop
for epoch in range(5):
    # Re-create the iterator each epoch: torchtext iterators are exhausted after one pass
    train_iter = IMDB(split='train')
    for label, text in train_iter:
        tokens = text_pipeline(text)
        tokens = torch.tensor([tokens])
        # Label encoding varies across torchtext versions ('pos'/'neg' or 1/2)
        target = torch.tensor([[1.0 if label in (2, 'pos') else 0.0]])

        optimizer.zero_grad()
        pred = model(tokens)
        loss = criterion(pred, target)
        loss.backward()
        optimizer.step()
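
After training, accuracy can be measured on the held-out split. A minimal sketch, assuming `IMDB(split='test')` and the same label handling as the training loop above:

Python
# Evaluate on the test split
test_iter = IMDB(split='test')
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for label, text in test_iter:
        tokens = torch.tensor([text_pipeline(text)])
        pred = model(tokens).item()
        pred_label = 1.0 if pred > 0.5 else 0.0
        target = 1.0 if label in (2, 'pos') else 0.0  # label encoding varies across torchtext versions
        correct += int(pred_label == target)
        total += 1
print(f"Test accuracy: {correct / total:.3f}")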

Sequence-to-Sequence

For tasks like translation and summarization:

Python
class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)
        outputs, (hidden, cell) = self.lstm(embedded)
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden, cell):
        embedded = self.embedding(x)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        prediction = self.fc(output)
        return prediction, hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg):
        hidden, cell = self.encoder(src)
        outputs, _, _ = self.decoder(trg, hidden, cell)
        return outputs
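
A quick sanity check with made-up vocabulary sizes and dummy token ids (a minimal sketch; real training typically decodes step by step with teacher forcing instead of feeding the whole target sequence at once):

Python
import torch

encoder = Encoder(vocab_size=5000, embed_dim=128, hidden_dim=256)
decoder = Decoder(vocab_size=6000, embed_dim=128, hidden_dim=256)
seq2seq = Seq2Seq(encoder, decoder)

src = torch.randint(0, 5000, (2, 12))  # source sentences: (batch, src_len)
trg = torch.randint(0, 6000, (2, 15))  # target sentences: (batch, trg_len)

outputs = seq2seq(src, trg)
print(outputs.shape)                   # torch.Size([2, 15, 6000]) - scores over the target vocab per step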

Practice Exercise

Hands-on Exercise

Build a Sentiment Classifier with LSTM:

  1. Dataset: IMDB reviews
  2. Build an LSTM model with:
    • Embedding layer
    • Bidirectional LSTM
    • Dropout
  3. Train and evaluate
  4. Compare with a simple RNN

Target: > 85% accuracy on IMDB

Next up

In the next lesson, we will learn about Transformers - the architecture that revolutionized NLP.

