🎯 Lesson Objectives
In this lesson, you will learn:
- ✅ Text Generation với LSTM
- ✅ Sequence-to-Sequence (Seq2Seq) Architecture
- ✅ Encoder-Decoder cho Machine Translation
- ✅ Named Entity Recognition (NER)
- ✅ Advanced LSTM patterns
LSTM was the foundation of many important NLP applications before the Transformer arrived. Understanding LSTM will help you grasp the concepts behind more modern architectures.
✍️ Text Generation
The Idea
Language Model = predicting the next word given the preceding words.
"I love" → "you" (probability 0.3) → "it" (probability 0.2) → "machine" (probability 0.1)
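The next-word idea can be illustrated with a tiny count-based model before moving to the LSTM version. This toy corpus and the probabilities it produces are made up for illustration:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: which word tends to follow "love"?
corpus = "i love you . i love it . i love you . you love machine learning".split()

# Count next-word frequencies for each preceding word
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# Estimate P(next | "love") from the counts
total = sum(follows["love"].values())
probs = {w: c / total for w, c in follows["love"].items()}
print(probs)  # "you" is the most likely continuation in this corpus
```

An LSTM language model does the same job, but conditions on the whole preceding context instead of a single word.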
Character-level vs Word-level
| Level | Vocabulary | Pros | Cons |
|---|---|---|---|
| Character | ~100 chars | Small, handles typos | Long sequences |
| Word | 10K-100K words | Meaningful units | OOV problem |
| Subword | 30K-50K | Balance | BPE complexity |
Character-level Text Generation
```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

class TextGenerator:
    """Character-level text generation with LSTM"""

    def __init__(self, text, seq_length=100):
        self.text = text
        self.seq_length = seq_length

        # Create character mappings
        self.chars = sorted(list(set(text)))
        self.char_to_idx = {c: i for i, c in enumerate(self.chars)}
        self.idx_to_char = {i: c for i, c in enumerate(self.chars)}
        self.vocab_size = len(self.chars)

        print(f"Total chars: {len(text)}")
        print(f"Unique chars: {self.vocab_size}")

    def prepare_data(self):
        """Create training sequences"""
        X, y = [], []

        for i in range(0, len(self.text) - self.seq_length):
            seq_in = self.text[i:i + self.seq_length]
            seq_out = self.text[i + self.seq_length]

            X.append([self.char_to_idx[c] for c in seq_in])
            y.append(self.char_to_idx[seq_out])

        # Reshape for LSTM
        X = np.array(X)
        X = X.reshape((X.shape[0], X.shape[1], 1))
        X = X / float(self.vocab_size)  # Normalize

        y = keras.utils.to_categorical(y, num_classes=self.vocab_size)

        return X, y

    def build_model(self, units=256):
        """Build LSTM model for text generation"""
        model = keras.Sequential([
            layers.LSTM(units, return_sequences=True,
                        input_shape=(self.seq_length, 1)),
            layers.Dropout(0.2),
            layers.LSTM(units),
            layers.Dropout(0.2),
            layers.Dense(self.vocab_size, activation='softmax')
        ])

        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy']
        )

        self.model = model
        return model

    def sample(self, preds, temperature=1.0):
        """Sample from predictions with temperature"""
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds + 1e-10) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)

    def generate(self, seed_text, length=500, temperature=1.0):
        """Generate text"""
        # Seed must span seq_length chars; left-pad shorter seeds with spaces
        # (assumes ' ' is in the vocabulary, which holds for normal text)
        generated = seed_text.rjust(self.seq_length)

        for _ in range(length):
            # Prepare input
            x = np.array([[self.char_to_idx[c] for c in generated[-self.seq_length:]]])
            x = x.reshape((1, self.seq_length, 1))
            x = x / float(self.vocab_size)

            # Predict
            preds = self.model.predict(x, verbose=0)[0]
            next_idx = self.sample(preds, temperature)
            next_char = self.idx_to_char[next_idx]

            generated += next_char

        return generated

# Usage
# text = open('shakespeare.txt').read().lower()
# generator = TextGenerator(text)
# X, y = generator.prepare_data()
# generator.build_model()
# generator.model.fit(X, y, epochs=50, batch_size=128)
# print(generator.generate("to be or not to be", temperature=0.5))
```

Temperature Sampling
Temperature controls creativity:
- T = 0.2: Conservative, repeats familiar patterns
- T = 1.0: Balanced
- T = 1.5: Creative, may produce nonsense
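The effect of temperature can be seen by rescaling a toy distribution with NumPy. This is a standalone sketch of the same math used in the `sample` method above; the input probabilities are arbitrary:

```python
import numpy as np

def apply_temperature(preds, temperature):
    """Rescale a probability distribution, as in TextGenerator.sample()."""
    preds = np.log(np.asarray(preds, dtype='float64') + 1e-10) / temperature
    exp_preds = np.exp(preds)
    return exp_preds / np.sum(exp_preds)

preds = [0.5, 0.3, 0.2]

print(apply_temperature(preds, 0.2))  # low T: sharpened, the top choice dominates
print(apply_temperature(preds, 1.0))  # T=1: (almost) unchanged
print(apply_temperature(preds, 1.5))  # high T: flattened, rare choices more likely
```

Low temperatures raise each probability to a large power before renormalizing, which is why the output becomes repetitive and conservative.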
Checkpoint
Do you understand Text Generation with LSTM?
🔄 Sequence-to-Sequence (Seq2Seq)
Seq2Seq Architecture
Use Cases
| Task | Input | Output |
|---|---|---|
| Translation | "Hello world" | "Xin chào thế giới" |
| Summarization | Long article | Short summary |
| Chatbot | Question | Answer |
| Image Captioning | Image features | Caption |
Basic Seq2Seq Implementation
```python
from tensorflow import keras
from tensorflow.keras import layers

def create_seq2seq_model(
    input_vocab_size,
    output_vocab_size,
    input_length,
    output_length,
    embedding_dim=256,
    latent_dim=512
):
    """
    Basic Seq2Seq model with LSTM
    """
    # ENCODER
    encoder_inputs = keras.Input(shape=(input_length,), name='encoder_input')
    encoder_embedding = layers.Embedding(
        input_vocab_size, embedding_dim
    )(encoder_inputs)

    # LSTM Encoder - return states
    encoder_lstm = layers.LSTM(latent_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)

    # Keep only states as context
    encoder_states = [state_h, state_c]

    # DECODER
    decoder_inputs = keras.Input(shape=(output_length,), name='decoder_input')
    decoder_embedding = layers.Embedding(
        output_vocab_size, embedding_dim
    )(decoder_inputs)

    # LSTM Decoder - use encoder states as initial state
    decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(
        decoder_embedding,
        initial_state=encoder_states
    )

    # Dense output
    decoder_dense = layers.Dense(output_vocab_size, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)

    # Full model
    model = keras.Model(
        [encoder_inputs, decoder_inputs],
        decoder_outputs
    )

    return model

# Create model
model = create_seq2seq_model(
    input_vocab_size=10000,
    output_vocab_size=15000,
    input_length=50,
    output_length=50
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
```

Checkpoint
Do you understand the Seq2Seq architecture?
🌐 Machine Translation
Data Preparation
```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

class TranslationDataset:
    """Prepare data for machine translation"""

    def __init__(self, source_texts, target_texts,
                 max_source_len=50, max_target_len=50):
        self.source_texts = source_texts
        self.target_texts = target_texts
        self.max_source_len = max_source_len
        self.max_target_len = max_target_len

    def prepare(self):
        """Tokenize and pad sequences"""
        # Source tokenizer
        self.source_tokenizer = Tokenizer(filters='')
        self.source_tokenizer.fit_on_texts(self.source_texts)
        source_sequences = self.source_tokenizer.texts_to_sequences(
            self.source_texts
        )

        # Target tokenizer (with START and END tokens)
        target_texts_processed = [
            '<START> ' + t + ' <END>' for t in self.target_texts
        ]
        self.target_tokenizer = Tokenizer(filters='')
        self.target_tokenizer.fit_on_texts(target_texts_processed)
        target_sequences = self.target_tokenizer.texts_to_sequences(
            target_texts_processed
        )

        # Pad sequences
        encoder_input = pad_sequences(
            source_sequences,
            maxlen=self.max_source_len,
            padding='post'
        )

        decoder_input = pad_sequences(
            target_sequences,
            maxlen=self.max_target_len,
            padding='post'
        )

        # Decoder output = decoder input shifted by 1
        decoder_output = np.zeros_like(decoder_input)
        decoder_output[:, :-1] = decoder_input[:, 1:]

        # Vocab sizes
        self.source_vocab_size = len(self.source_tokenizer.word_index) + 1
        self.target_vocab_size = len(self.target_tokenizer.word_index) + 1

        return encoder_input, decoder_input, decoder_output

# Example usage
source_texts = [
    "Hello world",
    "How are you",
    "Good morning"
]
target_texts = [
    "Xin chào thế giới",
    "Bạn khỏe không",
    "Chào buổi sáng"
]

dataset = TranslationDataset(source_texts, target_texts)
encoder_input, decoder_input, decoder_output = dataset.prepare()

print(f"Encoder input shape: {encoder_input.shape}")
print(f"Decoder input shape: {decoder_input.shape}")
print(f"Source vocab: {dataset.source_vocab_size}")
print(f"Target vocab: {dataset.target_vocab_size}")
```

Training Loop
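The one-position shift that turns decoder inputs into decoder targets (teacher forcing) can be checked on a toy array. The token ids below are made up for illustration:

```python
import numpy as np

# Hypothetical decoder input: <START>=1, "xin"=2, "chào"=3, <END>=4, padding=0
decoder_input = np.array([[1, 2, 3, 4, 0]])

# Target = input shifted left by one step, as in TranslationDataset.prepare()
decoder_output = np.zeros_like(decoder_input)
decoder_output[:, :-1] = decoder_input[:, 1:]

print(decoder_input)   # [[1 2 3 4 0]]
print(decoder_output)  # [[2 3 4 0 0]]
```

At every time step the decoder sees the true previous token and is trained to predict the next one.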
```python
from tensorflow import keras

# Create model
model = create_seq2seq_model(
    input_vocab_size=dataset.source_vocab_size,
    output_vocab_size=dataset.target_vocab_size,
    input_length=50,
    output_length=50
)

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
model.fit(
    [encoder_input, decoder_input],
    decoder_output,
    epochs=100,
    batch_size=64,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=10)
    ]
)
```

Checkpoint
Do you understand how to build a translation model?
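At inference time there is no ground-truth decoder input, so the translation must be generated one token at a time. A minimal greedy-decoding sketch, written against a generic `predict_fn` (a hypothetical wrapper around `model.predict`) so the loop itself stays framework-independent:

```python
import numpy as np

def greedy_decode(predict_fn, start_id, end_id, max_len=50):
    """
    predict_fn(tokens) -> probability matrix of shape (len(tokens), vocab_size).
    Feeds the tokens generated so far and appends the argmax at each step.
    """
    tokens = [start_id]
    for _ in range(max_len - 1):
        probs = predict_fn(tokens)           # distributions for all positions so far
        next_id = int(np.argmax(probs[-1]))  # take the prediction at the last position
        tokens.append(next_id)
        if next_id == end_id:
            break
    return tokens

# Demo with a fake predict_fn that always "translates" to [5, 6, <END>=2]
canned = [5, 6, 2]

def fake_predict(tokens):
    vocab_size = 10
    probs = np.full((len(tokens), vocab_size), 0.01)
    probs[-1, canned[min(len(tokens) - 1, len(canned) - 1)]] = 0.9
    return probs

print(greedy_decode(fake_predict, start_id=1, end_id=2))  # [1, 5, 6, 2]
```

In practice, beam search (keeping the k best partial translations) usually gives noticeably better output than pure greedy decoding.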
🏷️ Named Entity Recognition (NER)
Sequence Labeling Task
```
Input:  "John works at Google in New York"
Output: [PER]  [O]   [O] [ORG]  [O] [LOC] [LOC]

PER = Person, ORG = Organization, LOC = Location, O = Other
```

NER Model
```python
from tensorflow import keras
from tensorflow.keras import layers

def create_ner_model(vocab_size, num_tags, embedding_dim=128,
                     max_len=100, lstm_units=64):
    """
    Bidirectional LSTM for Named Entity Recognition
    """
    model = keras.Sequential([
        # Embedding
        layers.Embedding(vocab_size, embedding_dim,
                         input_length=max_len),
        layers.SpatialDropout1D(0.1),

        # Bidirectional LSTM - keep all time steps
        layers.Bidirectional(
            layers.LSTM(lstm_units, return_sequences=True)
        ),
        layers.Dropout(0.3),

        layers.Bidirectional(
            layers.LSTM(lstm_units // 2, return_sequences=True)
        ),

        # Output tag for EACH token
        layers.TimeDistributed(
            layers.Dense(num_tags, activation='softmax')
        )
    ])

    return model

# Tags: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, ...
NUM_TAGS = 9
VOCAB_SIZE = 10000
MAX_LEN = 100

model = create_ner_model(VOCAB_SIZE, NUM_TAGS)
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
```

TimeDistributed Layer
TimeDistributed applies a layer to EACH time step:
It is needed when the output is a sequence (as in NER).
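What `TimeDistributed(Dense)` does can be mimicked in NumPy: one weight matrix shared across all time steps. The shapes below are chosen arbitrarily for illustration:

```python
import numpy as np

batch, time_steps, hidden, num_tags = 2, 5, 8, 3

# LSTM output: one hidden vector per time step
lstm_out = np.random.randn(batch, time_steps, hidden)

# ONE Dense weight matrix, shared by every time step
W = np.random.randn(hidden, num_tags)
b = np.zeros(num_tags)

# Applying it per step is a single matmul over the last axis
logits = lstm_out @ W + b

print(logits.shape)  # (2, 5, 3): a tag score vector for EACH token
```

Because the weights are shared, the number of parameters does not grow with sequence length.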
Checkpoint
Do you understand NER with LSTM?
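As a complement to the model above: turning predicted BIO tags back into entity spans needs only plain Python. A small sketch, using the tag scheme from the code above:

```python
def bio_to_entities(tokens, tags):
    """Group B-/I- tagged tokens into (entity_text, entity_type) spans."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):
            if current:  # close any open entity
                entities.append((' '.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith('I-') and current_type == tag[2:]:
            current.append(token)  # continue the current entity
        else:
            if current:
                entities.append((' '.join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((' '.join(current), current_type))
    return entities

tokens = ["John", "works", "at", "Google", "in", "New", "York"]
tags = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, tags))
# [('John', 'PER'), ('Google', 'ORG'), ('New York', 'LOC')]
```

The B-/I- distinction is what lets "New York" come out as one LOC entity instead of two.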
📊 Attention Mechanism (Preview)
The Problem with Seq2Seq
Bottleneck problem: the entire input sequence is compressed into a single fixed-size context vector.
For long sentences, the context vector does not have enough capacity → information is lost!
Attention: The Solution
Simple Attention Layer
```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionLayer(layers.Layer):
    """Simple Bahdanau Attention"""

    def __init__(self, units):
        super().__init__()
        self.W1 = layers.Dense(units)  # For encoder hidden
        self.W2 = layers.Dense(units)  # For decoder hidden
        self.V = layers.Dense(1)       # Score

    def call(self, encoder_output, decoder_hidden):
        """
        encoder_output: (batch, seq_len, hidden)
        decoder_hidden: (batch, hidden)
        """
        # Expand decoder hidden for broadcasting
        decoder_hidden_expanded = tf.expand_dims(decoder_hidden, 1)
        # (batch, 1, hidden)

        # Calculate attention scores
        score = self.V(tf.nn.tanh(
            self.W1(encoder_output) + self.W2(decoder_hidden_expanded)
        ))
        # (batch, seq_len, 1)

        # Softmax to get attention weights
        attention_weights = tf.nn.softmax(score, axis=1)
        # (batch, seq_len, 1)

        # Context vector
        context = tf.reduce_sum(
            attention_weights * encoder_output,
            axis=1
        )
        # (batch, hidden)

        return context, attention_weights

# Demo
attention = AttentionLayer(64)

encoder_out = tf.random.normal((32, 50, 128))  # 50 time steps
decoder_hid = tf.random.normal((32, 128))

context, weights = attention(encoder_out, decoder_hid)
print(f"Context shape: {context.shape}")
print(f"Attention weights shape: {weights.shape}")
```

```
Context shape: (32, 128)
Attention weights shape: (32, 50, 1)
```

The attention mechanism is the foundation of the Transformer, the architecture covered in the next module!
Checkpoint
Do you have an overview of Attention?
🎯 LSTM Module Summary
LSTM Applications
| Task | Architecture | Key Feature |
|---|---|---|
| Text Generation | Stacked LSTM | Temperature sampling |
| Translation | Encoder-Decoder | Seq2Seq |
| NER | Bidirectional LSTM | TimeDistributed |
| Sentiment | Many-to-One | Classification head |
| Time Series | LSTM/GRU | Numerical prediction |
Key Architectures
```
# Text Classification (Many-to-One)
Embedding → LSTM → Dense

# Sequence Labeling (Many-to-Many)
Embedding → BiLSTM → TimeDistributed(Dense)

# Seq2Seq
Encoder(LSTM) → [states] → Decoder(LSTM) → Dense

# With Attention
Encoder → AttentionLayer → Decoder
```

Comparison with the Previous Module
| Aspect | SimpleRNN | LSTM |
|---|---|---|
| Long-term | ❌ Poor | ✅ Good |
| Training | Fast | Slower |
| Parameters | Few | More |
| Applications | Simple tasks | Complex NLP |
Limitations of LSTM
| Issue | Impact |
|---|---|
| Sequential | Cannot parallelize |
| Slow | Long training time |
| Bottleneck | Fixed context size |
| Long range | Still struggles with very long sequences |
The Next Module
Transformer & Attention:
- Self-Attention: each position attends to every position
- Multi-Head Attention: multiple attention patterns in parallel
- Parallelization: much faster training
- BERT, GPT: state-of-the-art models
🎉 You have completed the LSTM module! You now have the foundations for modern NLP architectures.
