🎯 Lesson Objectives
In this lesson, you will learn:
- ✅ Text Generation với LSTM
- ✅ Sequence-to-Sequence (Seq2Seq) Architecture
- ✅ Encoder-Decoder cho Machine Translation
- ✅ Named Entity Recognition (NER)
- ✅ Advanced LSTM patterns
LSTM was the foundation of many important NLP applications before the Transformer arrived. Understanding LSTM will help you grasp the concepts behind more modern architectures.
✍️ Text Generation
The Idea
Language Model = predicting the next word given the preceding words.
"I love" → "you" (probability 0.3) → "it" (probability 0.2) → "machine" (probability 0.1)
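The next-word idea can be illustrated with a tiny count-based model before moving to the LSTM version. This toy corpus and the probabilities it produces are made up for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: which word tends to follow "love"?
corpus = "i love you . i love it . i love you . you love machine learning".split()

# Count next-word frequencies for each preceding word
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# Estimate P(next | "love") from the counts
total = sum(follows["love"].values())
probs = {w: c / total for w, c in follows["love"].items()}
print(probs)  # "you" is the most likely continuation in this corpus
```

An LSTM language model does the same job, but conditions on the whole preceding context instead of a single word.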
Character-level vs Word-level
| Level | Vocabulary | Pros | Cons |
|---|---|---|---|
| Character | ~100 chars | Small, handles typos | Long sequences |
| Word | 10K-100K words | Meaningful units | OOV problem |
| Subword | 30K-50K | Balance | BPE complexity |
Character-level Text Generation
```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

class TextGenerator:
    """Character-level text generation with LSTM"""

    def __init__(self, text, seq_length=100):
        self.text = text
        self.seq_length = seq_length

        # Create character mappings
        self.chars = sorted(list(set(text)))
        self.char_to_idx = {c: i for i, c in enumerate(self.chars)}
        self.idx_to_char = {i: c for i, c in enumerate(self.chars)}
        self.vocab_size = len(self.chars)

        print(f"Total chars: {len(text)}")
        print(f"Unique chars: {self.vocab_size}")

    def prepare_data(self):
        """Create training sequences"""
        X, y = [], []

        for i in range(0, len(self.text) - self.seq_length):
            seq_in = self.text[i:i + self.seq_length]
            seq_out = self.text[i + self.seq_length]

            X.append([self.char_to_idx[c] for c in seq_in])
            y.append(self.char_to_idx[seq_out])

        # Reshape for LSTM
        X = np.array(X)
        X = X.reshape((X.shape[0], X.shape[1], 1))
        X = X / float(self.vocab_size)  # Normalize

        y = keras.utils.to_categorical(y, num_classes=self.vocab_size)

        return X, y

    def build_model(self, units=256):
        """Build LSTM model for text generation"""
        model = keras.Sequential([
            layers.LSTM(units, return_sequences=True,
                        input_shape=(self.seq_length, 1)),
            layers.Dropout(0.2),
            layers.LSTM(units),
            layers.Dropout(0.2),
            layers.Dense(self.vocab_size, activation='softmax')
        ])

        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )

        self.model = model
        return model

    def sample(self, preds, temperature=1.0):
        """Sample from predictions with temperature"""
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds + 1e-10) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)

    def generate(self, seed_text, length=500, temperature=1.0):
        """Generate text"""
        # Seed must span seq_length chars; left-pad shorter seeds with spaces
        # (assumes ' ' is in the vocabulary, which holds for normal text)
        generated = seed_text.rjust(self.seq_length)

        for _ in range(length):
            # Prepare input
            x = np.array([[self.char_to_idx[c] for c in generated[-self.seq_length:]]])
            x = x.reshape((1, self.seq_length, 1))
            x = x / float(self.vocab_size)

            # Predict
            preds = self.model.predict(x, verbose=0)[0]
            next_idx = self.sample(preds, temperature)
            next_char = self.idx_to_char[next_idx]

            generated += next_char

        return generated

# Usage
# text = open('shakespeare.txt').read().lower()
# generator = TextGenerator(text)
# X, y = generator.prepare_data()
# generator.build_model()
# generator.model.fit(X, y, epochs=50, batch_size=128)
# print(generator.generate("to be or not to be", temperature=0.5))
```

Temperature Sampling
Temperature controls creativity:
- T = 0.2: Conservative, repeats familiar patterns
- T = 1.0: Balanced
- T = 1.5: Creative, may produce nonsense
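The effect of temperature can be seen by rescaling a toy distribution with NumPy. This is a standalone sketch of the same math used in the `sample` method above; the input probabilities are arbitrary:

```python
import numpy as np

def apply_temperature(preds, temperature):
    """Rescale a probability distribution, as in TextGenerator.sample()."""
    preds = np.log(np.asarray(preds, dtype='float64') + 1e-10) / temperature
    exp_preds = np.exp(preds)
    return exp_preds / np.sum(exp_preds)

preds = [0.5, 0.3, 0.2]

print(apply_temperature(preds, 0.2))  # low T: sharpened, the top choice dominates
print(apply_temperature(preds, 1.0))  # T=1: (almost) unchanged
print(apply_temperature(preds, 1.5))  # high T: flattened, rare choices more likely
```

Low temperatures raise each probability to a large power before renormalizing, which is why the output becomes repetitive and conservative.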
Checkpoint
Do you understand Text Generation with LSTM?
🔄 Sequence-to-Sequence (Seq2Seq)
Seq2Seq Architecture
Use Cases
| Task | Input | Output |
|---|---|---|
| Translation | "Hello world" | "Xin chào thế giới" |
| Summarization | Long article | Short summary |
| Chatbot | Question | Answer |
| Image Captioning | Image features | Caption |
Basic Seq2Seq Implementation
```python
from tensorflow import keras
from tensorflow.keras import layers

def create_seq2seq_model(
    input_vocab_size,
    output_vocab_size,
    input_length,
    output_length,
    embedding_dim=256,
    latent_dim=512
):
    """
    Basic Seq2Seq model with LSTM
    """
    # ENCODER
    encoder_inputs = keras.Input(shape=(input_length,), name='encoder_input')
    encoder_embedding = layers.Embedding(
        input_vocab_size, embedding_dim
    )(encoder_inputs)

    # LSTM Encoder - return states
    encoder_lstm = layers.LSTM(latent_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)

    # Keep only states as context
    encoder_states = [state_h, state_c]

    # DECODER
    decoder_inputs = keras.Input(shape=(output_length,), name='decoder_input')
    decoder_embedding = layers.Embedding(
        output_vocab_size, embedding_dim
    )(decoder_inputs)

    # LSTM Decoder - use encoder states as initial state
    decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(
        decoder_embedding,
        initial_state=encoder_states
    )

    # Dense output
    decoder_dense = layers.Dense(output_vocab_size, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    # Full model
    model = keras.Model(
        [encoder_inputs, decoder_inputs],
        decoder_outputs
    )

    return model

# Create model
model = create_seq2seq_model(
    input_vocab_size=10000,
    output_vocab_size=15000,
    input_length=50,
    output_length=50
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
```

Checkpoint
Do you understand the Seq2Seq architecture?
🌐 Machine Translation
Data Preparation
```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

class TranslationDataset:
    """Prepare data for machine translation"""

    def __init__(self, source_texts, target_texts,
                 max_source_len=50, max_target_len=50):
        self.source_texts = source_texts
        self.target_texts = target_texts
        self.max_source_len = max_source_len
        self.max_target_len = max_target_len

    def prepare(self):
        """Tokenize and pad sequences"""
        # Source tokenizer
        self.source_tokenizer = Tokenizer(filters='')
        self.source_tokenizer.fit_on_texts(self.source_texts)
        source_sequences = self.source_tokenizer.texts_to_sequences(
            self.source_texts
        )

        # Target tokenizer (with START and END tokens)
        target_texts_processed = [
            '<START> ' + t + ' <END>' for t in self.target_texts
        ]
        self.target_tokenizer = Tokenizer(filters='')
        self.target_tokenizer.fit_on_texts(target_texts_processed)
        target_sequences = self.target_tokenizer.texts_to_sequences(
            target_texts_processed
        )

        # Pad sequences
        encoder_input = pad_sequences(
            source_sequences,
            maxlen=self.max_source_len,
            padding='post'
        )

        decoder_input = pad_sequences(
            target_sequences,
            maxlen=self.max_target_len,
            padding='post'
        )

        # Decoder output = decoder input shifted by 1
        decoder_output = np.zeros_like(decoder_input)
        decoder_output[:, :-1] = decoder_input[:, 1:]

        # Vocab sizes
        self.source_vocab_size = len(self.source_tokenizer.word_index) + 1
        self.target_vocab_size = len(self.target_tokenizer.word_index) + 1

        return encoder_input, decoder_input, decoder_output

# Example usage
source_texts = [
    "Hello world",
    "How are you",
    "Good morning"
]
target_texts = [
    "Xin chào thế giới",
    "Bạn khỏe không",
    "Chào buổi sáng"
]

dataset = TranslationDataset(source_texts, target_texts)
encoder_input, decoder_input, decoder_output = dataset.prepare()

print(f"Encoder input shape: {encoder_input.shape}")
print(f"Decoder input shape: {decoder_input.shape}")
print(f"Source vocab: {dataset.source_vocab_size}")
print(f"Target vocab: {dataset.target_vocab_size}")
```

Training Loop
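The one-position shift that turns decoder inputs into decoder targets (teacher forcing) can be checked on a toy array. The token ids below are made up for illustration:

```python
import numpy as np

# Hypothetical decoder input: <START>=1, "xin"=2, "chào"=3, <END>=4, padding=0
decoder_input = np.array([[1, 2, 3, 4, 0]])

# Target = input shifted left by one step, as in TranslationDataset.prepare()
decoder_output = np.zeros_like(decoder_input)
decoder_output[:, :-1] = decoder_input[:, 1:]

print(decoder_input)   # [[1 2 3 4 0]]
print(decoder_output)  # [[2 3 4 0 0]]
```

At every time step the decoder sees the true previous token and is trained to predict the next one.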
```python
from tensorflow import keras

# Create model
model = create_seq2seq_model(
    input_vocab_size=dataset.source_vocab_size,
    output_vocab_size=dataset.target_vocab_size,
    input_length=50,
    output_length=50
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
model.fit(
    [encoder_input, decoder_input],
    decoder_output,
    epochs=100,
    batch_size=64,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10)
    ]
)
```

Checkpoint
Do you understand how to build a translation model?
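At inference time there is no ground-truth decoder input, so the translation must be generated one token at a time. A minimal greedy-decoding sketch, written against a generic `predict_fn` (a hypothetical wrapper around `model.predict`) so the loop itself stays framework-independent:

```python
import numpy as np

def greedy_decode(predict_fn, start_id, end_id, max_len=50):
    """
    predict_fn(tokens) -> probability matrix of shape (len(tokens), vocab_size).
    Feeds the tokens generated so far and appends the argmax at each step.
    """
    tokens = [start_id]
    for _ in range(max_len - 1):
        probs = predict_fn(tokens)           # distributions for all positions so far
        next_id = int(np.argmax(probs[-1]))  # take the prediction at the last position
        tokens.append(next_id)
        if next_id == end_id:
            break
    return tokens

# Demo with a fake predict_fn that always "translates" to [5, 6, <END>=2]
canned = [5, 6, 2]

def fake_predict(tokens):
    vocab_size = 10
    probs = np.full((len(tokens), vocab_size), 0.01)
    probs[-1, canned[min(len(tokens) - 1, len(canned) - 1)]] = 0.9
    return probs

print(greedy_decode(fake_predict, start_id=1, end_id=2))  # [1, 5, 6, 2]
```

In practice, beam search (keeping the k best partial translations) usually gives noticeably better output than pure greedy decoding.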
🏷️ Named Entity Recognition (NER)
Sequence Labeling Task
```
Input:  "John works at Google in New York"
Output: [PER]  [O]   [O] [ORG]  [O] [LOC] [LOC]

PER = Person, ORG = Organization, LOC = Location, O = Other
```

NER Model
```python
from tensorflow import keras
from tensorflow.keras import layers

def create_ner_model(vocab_size, num_tags, embedding_dim=128,
                     max_len=100, lstm_units=64):
    """
    Bidirectional LSTM for Named Entity Recognition
    """
    model = keras.Sequential([
        # Embedding
        layers.Embedding(vocab_size, embedding_dim,
                         input_length=max_len),
        layers.SpatialDropout1D(0.1),

        # Bidirectional LSTM - keep all time steps
        layers.Bidirectional(
            layers.LSTM(lstm_units, return_sequences=True)
        ),
        layers.Dropout(0.3),

        layers.Bidirectional(
            layers.LSTM(lstm_units // 2, return_sequences=True)
        ),

        # Output tag for EACH token
        layers.TimeDistributed(
            layers.Dense(num_tags, activation='softmax')
        )
    ])

    return model

# Tags: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, ...
NUM_TAGS = 9
VOCAB_SIZE = 10000
MAX_LEN = 100

model = create_ner_model(VOCAB_SIZE, NUM_TAGS)
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
```

TimeDistributed Layer
TimeDistributed applies a layer to EACH time step:
It is needed when the output is a sequence (as in NER).
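What `TimeDistributed(Dense)` does can be mimicked in NumPy: one weight matrix shared across all time steps. The shapes below are chosen arbitrarily for illustration:

```python
import numpy as np

batch, time_steps, hidden, num_tags = 2, 5, 8, 3

# LSTM output: one hidden vector per time step
lstm_out = np.random.randn(batch, time_steps, hidden)

# ONE Dense weight matrix, shared by every time step
W = np.random.randn(hidden, num_tags)
b = np.zeros(num_tags)

# Applying it per step is a single matmul over the last axis
logits = lstm_out @ W + b

print(logits.shape)  # (2, 5, 3): a tag score vector for EACH token
```

Because the weights are shared, the number of parameters does not grow with sequence length.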
Checkpoint
Do you understand NER with LSTM?
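As a complement to the model above: turning predicted BIO tags back into entity spans needs only plain Python. A small sketch, using the tag scheme from the code above:

```python
def bio_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into (entity_text, entity_type) spans."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):
            if current:  # close any open entity
                entities.append((' '.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith('I-') and current_type == tag[2:]:
            current.append(token)  # continue the current entity
        else:
            if current:
                entities.append((' '.join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((' '.join(current), current_type))
    return entities

tokens = ["John", "works", "at", "Google", "in", "New", "York"]
tags = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, tags))
# [('John', 'PER'), ('Google', 'ORG'), ('New York', 'LOC')]
```

The B-/I- distinction is what lets "New York" come out as one LOC entity instead of two.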
📊 Attention Mechanism (Preview)
The Problem with Seq2Seq
Bottleneck problem: the entire input sequence is compressed into a single fixed-size context vector.
For long sentences, the context vector does not have enough capacity → information is lost!
Attention: The Solution
Simple Attention Layer
```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionLayer(layers.Layer):
    """Simple Bahdanau Attention"""

    def __init__(self, units):
        super().__init__()
        self.W1 = layers.Dense(units)  # For encoder hidden
        self.W2 = layers.Dense(units)  # For decoder hidden
        self.V = layers.Dense(1)       # Score

    def call(self, encoder_output, decoder_hidden):
        """
        encoder_output: (batch, seq_len, hidden)
        decoder_hidden: (batch, hidden)
        """
        # Expand decoder hidden for broadcasting
        decoder_hidden_expanded = tf.expand_dims(decoder_hidden, 1)
        # (batch, 1, hidden)

        # Calculate attention scores
        score = self.V(tf.nn.tanh(
            self.W1(encoder_output) + self.W2(decoder_hidden_expanded)
        ))
        # (batch, seq_len, 1)

        # Softmax to get attention weights
        attention_weights = tf.nn.softmax(score, axis=1)
        # (batch, seq_len, 1)

        # Context vector
        context = tf.reduce_sum(
            attention_weights * encoder_output,
            axis=1
        )
        # (batch, hidden)

        return context, attention_weights

# Demo
attention = AttentionLayer(64)

encoder_out = tf.random.normal((32, 50, 128))  # 50 time steps
decoder_hid = tf.random.normal((32, 128))

context, weights = attention(encoder_out, decoder_hid)
print(f"Context shape: {context.shape}")
print(f"Attention weights shape: {weights.shape}")
```

```
Context shape: (32, 128)
Attention weights shape: (32, 50, 1)
```

The attention mechanism is the foundation of the Transformer, the architecture covered in the next module!
Checkpoint
Do you have an overview of Attention?
🎯 LSTM Module Summary
LSTM Applications
| Task | Architecture | Key Feature |
|---|---|---|
| Text Generation | Stacked LSTM | Temperature sampling |
| Translation | Encoder-Decoder | Seq2Seq |
| NER | Bidirectional LSTM | TimeDistributed |
| Sentiment | Many-to-One | Classification head |
| Time Series | LSTM/GRU | Numerical prediction |
Key Architectures
```
# Text Classification (Many-to-One)
Embedding → LSTM → Dense

# Sequence Labeling (Many-to-Many)
Embedding → BiLSTM → TimeDistributed(Dense)

# Seq2Seq
Encoder(LSTM) → [states] → Decoder(LSTM) → Dense

# With Attention
Encoder → AttentionLayer → Decoder
```

Comparison with the Previous Module
| Aspect | SimpleRNN | LSTM |
|---|---|---|
| Long-term | ❌ Poor | ✅ Good |
| Training | Fast | Slower |
| Parameters | Few | More |
| Applications | Simple tasks | Complex NLP |
Limitations of LSTM
| Issue | Impact |
|---|---|
| Sequential | Cannot parallelize |
| Slow | Long training time |
| Bottleneck | Fixed context size |
| Long range | Still struggles with very long sequences |
The Next Module
Transformer & Attention:
- Self-Attention: each position attends to every position
- Multi-Head Attention: multiple attention patterns in parallel
- Parallelization: much faster training
- BERT, GPT: state-of-the-art models
🎉 You have completed the LSTM module! You now have the foundations for modern NLP architectures.
