
LSTM Applications - Seq2Seq and Beyond

Applying LSTM to Text Generation, Sequence-to-Sequence, Machine Translation, and more advanced applications

1

🎯 Lesson Objectives

In this lesson, you will learn:

  • Text Generation với LSTM
  • Sequence-to-Sequence (Seq2Seq) Architecture
  • Encoder-Decoder cho Machine Translation
  • Named Entity Recognition (NER)
  • Advanced LSTM patterns

LSTM was the foundation of many important NLP applications before the Transformer arrived. Understanding LSTM gives you the concepts behind today's more modern architectures.

2

✍️ Text Generation

The Idea

Language Model = predicting the next word given the preceding words.

"I love" → "you" (probability 0.3), "it" (probability 0.2), "machine" (probability 0.1), ...
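The idea can be sketched with a toy count-based bigram model (the corpus and numbers here are illustrative, not from the lesson):

```python
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Count word bigrams and turn counts into next-word probabilities."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    # Normalize counts into conditional probabilities P(next | prev)
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

corpus = ["i love you", "i love you", "i love it"]
model = bigram_model(corpus)
print(model["love"])  # {'you': 0.67, 'it': 0.33} (approximately)
```

An LSTM language model learns the same conditional distribution, but from hidden states instead of raw counts, so it can generalize beyond exact bigrams.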

Character-level vs Word-level

Level     | Vocabulary     | Pros                 | Cons
Character | ~100 chars     | Small, handles typos | Long sequences
Word      | 10K-100K words | Meaningful units     | OOV problem
Subword   | 30K-50K        | Balance              | BPE complexity
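To make the "BPE complexity" entry concrete, here is a minimal sketch of one Byte-Pair Encoding merge step; the toy corpus is hypothetical, and real tokenizers (e.g. SentencePiece) add many refinements:

```python
from collections import Counter

def most_frequent_pair(words):
    """words: dict mapping a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged = pair[0] + pair[1]
    out = {}
    for symbols, freq in words.items():
        new_syms, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                new_syms.append(merged)
                i += 2
            else:
                new_syms.append(symbols[i])
                i += 1
        out[tuple(new_syms)] = freq
    return out

# Toy corpus: each word split into characters, with its frequency
words = {("h", "u", "g"): 4, ("p", "u", "g"): 2, ("h", "u", "t"): 1}
pair = most_frequent_pair(words)  # ('u', 'g'), total frequency 6
words = merge_pair(words, pair)
print(pair, words)
```

Repeating merge steps like this builds the 30K-50K subword vocabulary mentioned in the table.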

Character-level Text Generation

python.py
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

class TextGenerator:
    """Character-level text generation with LSTM"""

    def __init__(self, text, seq_length=100):
        self.text = text
        self.seq_length = seq_length

        # Create character mappings
        self.chars = sorted(set(text))
        self.char_to_idx = {c: i for i, c in enumerate(self.chars)}
        self.idx_to_char = {i: c for i, c in enumerate(self.chars)}
        self.vocab_size = len(self.chars)

        print(f"Total chars: {len(text)}")
        print(f"Unique chars: {self.vocab_size}")

    def prepare_data(self):
        """Create training sequences"""
        X, y = [], []

        for i in range(len(self.text) - self.seq_length):
            seq_in = self.text[i:i + self.seq_length]
            seq_out = self.text[i + self.seq_length]
            X.append([self.char_to_idx[c] for c in seq_in])
            y.append(self.char_to_idx[seq_out])

        # Reshape to (samples, timesteps, features) and scale to [0, 1]
        X = np.array(X).reshape((-1, self.seq_length, 1))
        X = X / float(self.vocab_size)
        y = keras.utils.to_categorical(y, num_classes=self.vocab_size)
        return X, y

    def build_model(self, units=256):
        """Build LSTM model for text generation"""
        model = keras.Sequential([
            layers.LSTM(units, return_sequences=True,
                        input_shape=(self.seq_length, 1)),
            layers.Dropout(0.2),
            layers.LSTM(units),
            layers.Dropout(0.2),
            layers.Dense(self.vocab_size, activation='softmax')
        ])
        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )
        self.model = model
        return model

    def sample(self, preds, temperature=1.0):
        """Sample an index from predictions, rescaled by temperature"""
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds + 1e-10) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)

    def generate(self, seed_text, length=500, temperature=1.0):
        """Generate `length` characters following seed_text"""
        # Left-pad the seed so it fills one full input window
        # (assumes ' ' is in the vocabulary, as in normal prose)
        generated = seed_text.rjust(self.seq_length)

        for _ in range(length):
            # Encode the last seq_length characters
            x = np.array([[self.char_to_idx[c]
                           for c in generated[-self.seq_length:]]])
            x = x.reshape((1, self.seq_length, 1)) / float(self.vocab_size)

            # Predict and sample the next character
            preds = self.model.predict(x, verbose=0)[0]
            next_idx = self.sample(preds, temperature)
            generated += self.idx_to_char[next_idx]

        return generated.lstrip()

# Usage
# text = open('shakespeare.txt').read().lower()
# generator = TextGenerator(text)
# X, y = generator.prepare_data()
# generator.build_model()
# generator.model.fit(X, y, epochs=50, batch_size=128)
# print(generator.generate("to be or not to be", temperature=0.5))

Temperature Sampling

Temperature controls creativity:

  • T = 0.2: conservative, repeats familiar patterns
  • T = 1.0: balanced
  • T = 1.5: creative, may produce nonsense
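The effect is easy to see on a toy distribution (the probabilities below are illustrative); this is the same rescaling the `sample()` method applies:

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: low T sharpens, high T flattens."""
    logits = np.log(np.asarray(probs, dtype="float64") + 1e-10) / temperature
    exp = np.exp(logits)
    return exp / exp.sum()

probs = [0.5, 0.3, 0.2]
print(apply_temperature(probs, 0.2))  # top choice dominates (~0.92)
print(apply_temperature(probs, 1.0))  # essentially unchanged
print(apply_temperature(probs, 1.5))  # flatter, more random
```

At T → 0 this approaches greedy argmax decoding; at large T it approaches uniform sampling.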

Checkpoint

Do you understand text generation with LSTM?

3

🔄 Sequence-to-Sequence (Seq2Seq)

Seq2Seq Architecture

[Diagram: Encoder-Decoder architecture. The encoder LSTM reads the input sequence x₁, x₂, x₃ (hidden states h₁, h₂, h₃) and compresses it into a context vector, which initializes the decoder LSTM that emits the output sequence y₁ "Xin", y₂ "chào", y₃ "[END]".]

Use Cases

Task             | Input          | Output
Translation      | "Hello world"  | "Xin chào thế giới"
Summarization    | Long article   | Short summary
Chatbot          | Question       | Answer
Image Captioning | Image features | Caption

Basic Seq2Seq Implementation

python.py
from tensorflow import keras
from tensorflow.keras import layers

def create_seq2seq_model(
    input_vocab_size,
    output_vocab_size,
    input_length,
    output_length,
    embedding_dim=256,
    latent_dim=512
):
    """Basic Seq2Seq model with LSTM"""
    # ENCODER
    encoder_inputs = keras.Input(shape=(input_length,), name='encoder_input')
    encoder_embedding = layers.Embedding(
        input_vocab_size, embedding_dim
    )(encoder_inputs)

    # LSTM encoder - return final hidden and cell states
    encoder_lstm = layers.LSTM(latent_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)

    # Keep only the states as the context
    encoder_states = [state_h, state_c]

    # DECODER
    decoder_inputs = keras.Input(shape=(output_length,), name='decoder_input')
    decoder_embedding = layers.Embedding(
        output_vocab_size, embedding_dim
    )(decoder_inputs)

    # LSTM decoder - use encoder states as its initial state
    decoder_lstm = layers.LSTM(latent_dim, return_sequences=True,
                               return_state=True)
    decoder_outputs, _, _ = decoder_lstm(
        decoder_embedding,
        initial_state=encoder_states
    )

    # Dense output over the target vocabulary
    decoder_dense = layers.Dense(output_vocab_size, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    # Full training model
    model = keras.Model(
        [encoder_inputs, decoder_inputs],
        decoder_outputs
    )
    return model

# Create model
model = create_seq2seq_model(
    input_vocab_size=10000,
    output_vocab_size=15000,
    input_length=50,
    output_length=50
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

Checkpoint

Do you understand the Seq2Seq architecture?

4

🌐 Machine Translation

Data Preparation

python.py
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

class TranslationDataset:
    """Prepare data for machine translation"""

    def __init__(self, source_texts, target_texts,
                 max_source_len=50, max_target_len=50):
        self.source_texts = source_texts
        self.target_texts = target_texts
        self.max_source_len = max_source_len
        self.max_target_len = max_target_len

    def prepare(self):
        """Tokenize and pad sequences"""
        # Source tokenizer
        self.source_tokenizer = Tokenizer(filters='')
        self.source_tokenizer.fit_on_texts(self.source_texts)
        source_sequences = self.source_tokenizer.texts_to_sequences(
            self.source_texts
        )

        # Target tokenizer (with START and END tokens)
        target_texts_processed = [
            '<START> ' + t + ' <END>' for t in self.target_texts
        ]
        self.target_tokenizer = Tokenizer(filters='')
        self.target_tokenizer.fit_on_texts(target_texts_processed)
        target_sequences = self.target_tokenizer.texts_to_sequences(
            target_texts_processed
        )

        # Pad sequences
        encoder_input = pad_sequences(
            source_sequences,
            maxlen=self.max_source_len,
            padding='post'
        )
        decoder_input = pad_sequences(
            target_sequences,
            maxlen=self.max_target_len,
            padding='post'
        )

        # Decoder output = decoder input shifted left by one step
        decoder_output = np.zeros_like(decoder_input)
        decoder_output[:, :-1] = decoder_input[:, 1:]

        # Vocab sizes (+1 for the padding index 0)
        self.source_vocab_size = len(self.source_tokenizer.word_index) + 1
        self.target_vocab_size = len(self.target_tokenizer.word_index) + 1

        return encoder_input, decoder_input, decoder_output

# Example usage
source_texts = [
    "Hello world",
    "How are you",
    "Good morning"
]
target_texts = [
    "Xin chào thế giới",
    "Bạn khỏe không",
    "Chào buổi sáng"
]

dataset = TranslationDataset(source_texts, target_texts)
encoder_input, decoder_input, decoder_output = dataset.prepare()

print(f"Encoder input shape: {encoder_input.shape}")
print(f"Decoder input shape: {decoder_input.shape}")
print(f"Source vocab: {dataset.source_vocab_size}")
print(f"Target vocab: {dataset.target_vocab_size}")
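The "shifted by 1" construction of the decoder targets (teacher forcing: the label at step t is the token the decoder should emit next) can be checked on a tiny array; this numpy sketch mirrors what `prepare()` does, with hypothetical token ids:

```python
import numpy as np

# Hypothetical ids: 1 = <START>, 4 = <END>, 0 = padding
decoder_input = np.array([[1, 7, 8, 4, 0]])

# Target at step t = token at step t+1 of the input
decoder_output = np.zeros_like(decoder_input)
decoder_output[:, :-1] = decoder_input[:, 1:]

print(decoder_input)   # [[1 7 8 4 0]]
print(decoder_output)  # [[7 8 4 0 0]]
```

So while the decoder reads `<START> 7 8 <END>`, it is trained to predict `7 8 <END>` one step ahead.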

Training Loop

python.py
from tensorflow import keras

# Create model (create_seq2seq_model from the previous section)
model = create_seq2seq_model(
    input_vocab_size=dataset.source_vocab_size,
    output_vocab_size=dataset.target_vocab_size,
    input_length=50,
    output_length=50
)

# The function returns an uncompiled model, so compile before fitting
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
model.fit(
    [encoder_input, decoder_input],
    decoder_output,
    epochs=100,
    batch_size=64,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10)
    ]
)
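Training uses teacher forcing (the true previous token is fed into the decoder), but at inference time the decoder must consume its own predictions one step at a time. A minimal greedy-decoding loop looks like this; `predict_next` is a stand-in stub for a real per-step model call (e.g. an inference decoder returning next-token probabilities), not part of the lesson's code:

```python
def greedy_decode(predict_next, start_id, end_id, max_len=20):
    """Generate token ids one at a time until <END> or max_len."""
    tokens = [start_id]
    for _ in range(max_len):
        next_id = predict_next(tokens)  # argmax over the model's softmax
        if next_id == end_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop <START>

# Stub model for illustration: emits 5, 6, 7 then <END> (id 2)
script = iter([5, 6, 7, 2])
predict_next = lambda tokens: next(script)

print(greedy_decode(predict_next, start_id=1, end_id=2))  # [5, 6, 7]
```

With a real trained Seq2Seq model, `predict_next` would run the encoder once, then step the decoder LSTM with the carried-over states; beam search replaces the argmax with keeping the k best partial sequences.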

Checkpoint

Do you understand how to build a translation model?

5

🏷️ Named Entity Recognition (NER)

Sequence Labeling Task

Example

Input:  "John works at Google in New York"
Output: [PER]  [O]  [O]  [ORG]  [O]  [LOC]  [LOC]

PER = Person, ORG = Organization, LOC = Location, O = Other
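Once the model emits one tag per token, turning those tags into entity spans is a small post-processing step. A sketch of that grouping, using the simplified tag set from the example above (real systems use B-/I- prefixes to separate adjacent entities):

```python
def extract_entities(tokens, tags):
    """Group consecutive non-O tags of the same type into entity spans."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current:
                entities.append(current)
                current = None
        elif current and current[1] == tag:
            # Same tag continues: extend the current span
            current = (current[0] + " " + token, tag)
        else:
            if current:
                entities.append(current)
            current = (token, tag)
    if current:
        entities.append(current)
    return entities

tokens = "John works at Google in New York".split()
tags = ["PER", "O", "O", "ORG", "O", "LOC", "LOC"]
print(extract_entities(tokens, tags))
# [('John', 'PER'), ('Google', 'ORG'), ('New York', 'LOC')]
```

This is exactly why "New" and "York" both carry LOC: the decoder can merge them back into one entity.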

NER Model

python.py
from tensorflow import keras
from tensorflow.keras import layers

def create_ner_model(vocab_size, num_tags, embedding_dim=128,
                     max_len=100, lstm_units=64):
    """Bidirectional LSTM for Named Entity Recognition"""
    model = keras.Sequential([
        # Embedding
        layers.Embedding(vocab_size, embedding_dim,
                         input_length=max_len),
        layers.SpatialDropout1D(0.1),

        # Bidirectional LSTM - keep all time steps
        layers.Bidirectional(
            layers.LSTM(lstm_units, return_sequences=True)
        ),
        layers.Dropout(0.3),

        layers.Bidirectional(
            layers.LSTM(lstm_units // 2, return_sequences=True)
        ),

        # Output one tag distribution for EACH token
        layers.TimeDistributed(
            layers.Dense(num_tags, activation='softmax')
        )
    ])
    return model

# Tags: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, ...
NUM_TAGS = 9
VOCAB_SIZE = 10000
MAX_LEN = 100

model = create_ner_model(VOCAB_SIZE, NUM_TAGS)
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

TimeDistributed Layer

TimeDistributed applies the same layer to EACH time step:

[Diagram: with vs. without TimeDistributed. Without it, a head that collapses the input (batch, timesteps, features) down to (batch, units) loses the timesteps axis. With TimeDistributed(Dense), (batch, timesteps, features) maps to (batch, timesteps, units), keeping one prediction per time step.]

It is needed whenever the output is itself a sequence (as in NER).
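The shape-preserving behaviour can be sketched in plain numpy: a single shared weight matrix is applied at every time step, so the timesteps axis survives (the shapes below are illustrative):

```python
import numpy as np

batch, timesteps, features, units = 2, 5, 8, 3
x = np.random.randn(batch, timesteps, features)

# One shared weight matrix, applied independently at each time step
W = np.random.randn(features, units)
b = np.zeros(units)
y = x @ W + b  # matmul over the last axis touches every (batch, step) slice

print(y.shape)  # (2, 5, 3) - timesteps preserved
```

In current Keras, `Dense` applied to a 3D tensor already acts per time step like this; wrapping it in `TimeDistributed` makes the intent explicit and generalizes to layers that do not.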

Checkpoint

Do you understand NER with LSTM?

6

📊 Attention Mechanism (Preview)

The Problem with Seq2Seq

Bottleneck problem: the entire input sequence is compressed into a single fixed-size vector (the context).

For long sentences, the context vector does not have enough capacity → information is lost!

Attention: The Solution

[Diagram: without vs. with attention. Without attention, all encoder states h₁…hₙ are squeezed into one context vector (the bottleneck). With attention, the decoder computes weights α₁…αₙ over the encoder states and looks at different parts of the input at each step.]

Simple Attention Layer

python.py
import tensorflow as tf
from tensorflow.keras import layers

class AttentionLayer(layers.Layer):
    """Simple Bahdanau (additive) attention"""

    def __init__(self, units):
        super().__init__()
        self.W1 = layers.Dense(units)  # projects encoder hidden states
        self.W2 = layers.Dense(units)  # projects decoder hidden state
        self.V = layers.Dense(1)       # scalar score per position

    def call(self, encoder_output, decoder_hidden):
        """
        encoder_output: (batch, seq_len, hidden)
        decoder_hidden: (batch, hidden)
        """
        # Expand decoder hidden state for broadcasting: (batch, 1, hidden)
        decoder_hidden_expanded = tf.expand_dims(decoder_hidden, 1)

        # Additive attention scores: (batch, seq_len, 1)
        score = self.V(tf.nn.tanh(
            self.W1(encoder_output) + self.W2(decoder_hidden_expanded)
        ))

        # Softmax over the sequence axis: (batch, seq_len, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # Context vector = weighted sum of encoder states: (batch, hidden)
        context = tf.reduce_sum(
            attention_weights * encoder_output,
            axis=1
        )

        return context, attention_weights

# Demo
attention = AttentionLayer(64)

encoder_out = tf.random.normal((32, 50, 128))  # 50 time steps
decoder_hid = tf.random.normal((32, 128))

context, weights = attention(encoder_out, decoder_hid)
print(f"Context shape: {context.shape}")
print(f"Attention weights shape: {weights.shape}")
Expected Output

Context shape: (32, 128)
Attention weights shape: (32, 50, 1)

The attention mechanism is the foundation of the Transformer - the architecture covered in the next Module!

Checkpoint

Do you now have an overview of Attention?

7

🎯 LSTM Module Summary

LSTM Applications

Task            | Architecture       | Key Feature
Text Generation | Stacked LSTM       | Temperature sampling
Translation     | Encoder-Decoder    | Seq2Seq
NER             | Bidirectional LSTM | TimeDistributed
Sentiment       | Many-to-One        | Classification head
Time Series     | LSTM/GRU           | Numerical prediction

Key Architectures

Python
# Text Classification (Many-to-One)
Embedding → LSTM → Dense

# Sequence Labeling (Many-to-Many)
Embedding → BiLSTM → TimeDistributed(Dense)

# Seq2Seq
Encoder(LSTM) → [states] → Decoder(LSTM) → Dense

# With Attention
Encoder → AttentionLayer → Decoder

Comparison with the previous Module

Aspect       | SimpleRNN    | LSTM
Long-term    | ❌ Poor      | ✅ Good
Training     | Fast         | Slower
Parameters   | Few          | More
Applications | Simple tasks | Complex NLP

Limitations of LSTM

Issue      | Impact
Sequential | Cannot parallelize
Slow       | Long training time
Bottleneck | Fixed context size
Long range | Still struggles with very long sequences

Next Module

Transformer & Attention:

  • Self-Attention: every position attends to every other position
  • Multi-Head Attention: multiple attention patterns in parallel
  • Parallelization: much faster training
  • BERT, GPT: state-of-the-art models

🎉 LSTM Module complete! You now have the foundations for modern NLP architectures.