Deep Learning for Recommendation Systems
Deep learning has revolutionized recommendation systems: YouTube, TikTok, and Spotify all rely on neural models. This article covers the most common architectures used in production.
🎯 Objectives
- Neural Collaborative Filtering (NCF)
- Two-Tower architecture
- Sequential Recommendation (session-based)
- Practical implementation with PyTorch
1. Neural Collaborative Filtering (NCF)
1.1 Architecture
NCF Architecture (diagram): 👤 User ID → 🔢 User Embedding, 📦 Item ID → 🔢 Item Embedding → 🔗 Concatenate → 🧠 MLP → ⭐ Predicted Rating
The MLP captures non-linear user-item interactions, whereas matrix factorization is limited to a linear dot product.
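For contrast, a matrix-factorization baseline scores a user-item pair with nothing but that dot product. A minimal sketch (the class name `MF` is hypothetical, not from the lesson):

```python
import torch
import torch.nn as nn

class MF(nn.Module):
    """Matrix factorization: score = dot(user_emb, item_emb), purely linear."""
    def __init__(self, n_users, n_items, embed_dim=64):
        super().__init__()
        self.user_embed = nn.Embedding(n_users, embed_dim)
        self.item_embed = nn.Embedding(n_items, embed_dim)

    def forward(self, user_ids, item_ids):
        # Element-wise product then sum = dot product; no MLP, no non-linearity
        return (self.user_embed(user_ids) * self.item_embed(item_ids)).sum(dim=-1)
```

NCF below keeps the same embedding lookups but replaces this dot product with a learned MLP.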
1.2 Implementation
```python
import torch
import torch.nn as nn

class NCF(nn.Module):
    def __init__(self, n_users, n_items, embed_dim=64, hidden_dims=(128, 64, 32)):
        super().__init__()
        self.user_embed = nn.Embedding(n_users, embed_dim)
        self.item_embed = nn.Embedding(n_items, embed_dim)

        # MLP layers
        layers = []
        input_dim = embed_dim * 2
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_dim),
                nn.Dropout(0.2)
            ])
            input_dim = hidden_dim
        layers.append(nn.Linear(input_dim, 1))

        self.mlp = nn.Sequential(*layers)

    def forward(self, user_ids, item_ids):
        user_emb = self.user_embed(user_ids)
        item_emb = self.item_embed(item_ids)
        x = torch.cat([user_emb, item_emb], dim=-1)
        # squeeze only the last dim so a batch of size 1 keeps its batch dim
        return self.mlp(x).squeeze(-1)
```
1.3 Training
```python
from torch.utils.data import Dataset, DataLoader

class RatingDataset(Dataset):
    def __init__(self, users, items, ratings):
        self.users = torch.LongTensor(users)
        self.items = torch.LongTensor(items)
        self.ratings = torch.FloatTensor(ratings)

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.ratings[idx]

# Setup
model = NCF(n_users=1000, n_items=5000, embed_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

train_loader = DataLoader(
    RatingDataset(train_users, train_items, train_ratings),
    batch_size=256, shuffle=True
)

# Training loop
for epoch in range(20):
    model.train()
    total_loss = 0
    for users, items, ratings in train_loader:
        optimizer.zero_grad()
        preds = model(users, items)
        loss = criterion(preds, ratings)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")
```
2. Two-Tower Architecture
2.1 Concept
Two-Tower Architecture (diagram): 👤 User Features (user_id, age, history, context) → 🧠 User Tower (MLP) → 📊 User Vector (d=128); 📦 Item Features (item_id, category, price, description) → 🧠 Item Tower (MLP) → 📊 Item Vector (d=128); the two vectors meet in a ✖️ Dot Product → 🏆 Score / Ranking
Advantage: item vectors can be precomputed offline, so serving is extremely fast with ANN (approximate nearest neighbor) search.
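Because both towers L2-normalize their outputs (see the implementation in 2.2), the dot product of two embeddings equals their cosine similarity, which is why an inner-product ANN index can serve cosine retrieval. A quick standalone check with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=128)  # stand-in for a user tower output
v = rng.normal(size=128)  # stand-in for an item tower output

# L2-normalize both vectors
u_n = u / np.linalg.norm(u)
v_n = v / np.linalg.norm(v)

cosine = u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
inner = u_n.dot(v_n)
assert np.isclose(cosine, inner)  # identical after normalization
```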
2.2 Implementation
```python
class TwoTower(nn.Module):
    def __init__(self, user_feat_dim, item_feat_dim, embed_dim=128):
        super().__init__()

        # User tower
        self.user_tower = nn.Sequential(
            nn.Linear(user_feat_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

        # Item tower
        self.item_tower = nn.Sequential(
            nn.Linear(item_feat_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

    def forward(self, user_features, item_features):
        user_vec = self.user_tower(user_features)
        item_vec = self.item_tower(item_features)

        # L2 normalize
        user_vec = torch.nn.functional.normalize(user_vec, dim=-1)
        item_vec = torch.nn.functional.normalize(item_vec, dim=-1)

        # Dot product similarity
        score = (user_vec * item_vec).sum(dim=-1)
        return score

    def get_user_embedding(self, user_features):
        """For online serving."""
        return torch.nn.functional.normalize(
            self.user_tower(user_features), dim=-1
        )

    def get_item_embedding(self, item_features):
        """Precompute offline, index with FAISS."""
        return torch.nn.functional.normalize(
            self.item_tower(item_features), dim=-1
        )
```
2.3 Production Serving
Example

Offline Pipeline:
1. Compute ALL item embeddings → store in FAISS index
2. Update daily/weekly

Online Serving:
1. User request comes in
2. Compute user embedding (real-time)
3. FAISS ANN search → top 100 candidates (< 10 ms)
4. Re-ranking model (more feature-rich) → top 10
5. Return recommendations

FAISS Example:
```python
import faiss
import numpy as np

# Offline: compute and index all item embeddings
item_embeddings = model.get_item_embedding(all_item_features)
item_vecs = item_embeddings.detach().numpy().astype(np.float32)  # FAISS needs float32

# Build FAISS index
dimension = 128
index = faiss.IndexFlatIP(dimension)  # inner product (= cosine after L2 normalization)
index.add(item_vecs)

# Online: search for a user (FAISS expects a 2-D query array)
user_vec = model.get_user_embedding(user_features).detach().numpy().astype(np.float32)
distances, indices = index.search(user_vec.reshape(1, -1), k=100)  # top 100 candidates
```
3. Sequential Recommendation
3.1 Concept
User behavior is a sequence: "viewed A → clicked B → bought C → what comes next?"
Example

Session: [item_1, item_2, item_3, item_4, ???]
Model learns: given sequence → predict next item

Similar to language modeling:
- Items = tokens
- User history = sentence
- Next item = next word prediction

3.2 SASRec (Self-Attentive Sequential Recommendation)
```python
class SASRec(nn.Module):
    def __init__(self, n_items, max_len=50, embed_dim=64, n_heads=2, n_layers=2):
        super().__init__()
        # index 0 is reserved for padding, hence n_items + 1
        self.item_embed = nn.Embedding(n_items + 1, embed_dim, padding_idx=0)
        self.pos_embed = nn.Embedding(max_len, embed_dim)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=n_heads,
            dim_feedforward=embed_dim * 4,
            dropout=0.2,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.output = nn.Linear(embed_dim, n_items + 1)
        self.max_len = max_len

    def forward(self, item_seq):
        # item_seq: (batch, seq_len), padded with 0
        seq_len = item_seq.size(1)
        positions = torch.arange(seq_len, device=item_seq.device).unsqueeze(0)

        x = self.item_embed(item_seq) + self.pos_embed(positions)

        # Causal mask (cannot see future items)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        mask = mask.to(item_seq.device)

        # Padding mask
        padding_mask = (item_seq == 0)

        x = self.transformer(x, mask=mask, src_key_padding_mask=padding_mask)
        logits = self.output(x)  # (batch, seq_len, n_items + 1)
        return logits

    def predict_next(self, item_seq):
        """Predict next item from sequence."""
        logits = self.forward(item_seq)
        # Last position prediction
        next_item_scores = logits[:, -1, :]  # (batch, n_items + 1)
        return next_item_scores.argsort(dim=-1, descending=True)
```
3.3 Training
```python
def create_sequences(user_items, max_len=50):
    """Create training sequences from user interaction history."""
    sequences = []
    for user_id, items in user_items.items():
        for i in range(1, len(items)):
            seq = items[max(0, i - max_len):i]
            target = items[i]
            # Pad to max_len
            padded = [0] * (max_len - len(seq)) + seq
            sequences.append((padded, target))
    return sequences

# Training
model = SASRec(n_items=10000, max_len=50)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss(ignore_index=0)

for epoch in range(10):
    model.train()
    for batch_seqs, batch_targets in train_loader:
        logits = model(batch_seqs)
        # Predict from last position
        last_logits = logits[:, -1, :]
        loss = criterion(last_logits, batch_targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
4. Model Comparison
| Model | Type | Data | Scalability | Accuracy | Use Case |
|---|---|---|---|---|---|
| SVD | Matrix Factorization | Explicit ratings | Medium | Good | Movie ratings |
| NCF | Neural CF | Explicit/Implicit | Medium | Better | General |
| Two-Tower | Retrieval | Rich features | High | Good (retrieval) | Large-scale (YouTube) |
| SASRec | Sequential | Click streams | High | Best (session) | E-commerce, News |
| GNN-based | Graph | Social + interactions | Low | Best (cold start) | Social platforms |
Industry Examples
| Company | Model | Scale |
|---|---|---|
| YouTube | Two-Tower + DNN re-ranker | Billions of videos |
| TikTok | Sequential + Multi-task | Millions of short videos |
| Spotify | Collaborative Filtering + NLP | 100M+ tracks |
| Netflix | Matrix Factorization + DNN | 15,000+ titles |
| Amazon | Item-based CF + DNN | Millions of products |
5. Practical Tips
5.1 Embedding Tricks
```python
# 1. Pre-trained embeddings (warm start)
# Use Word2Vec-style training on item sequences
from gensim.models import Word2Vec

# Treat user sessions as "sentences", items as "words"
sessions = [["item_1", "item_3", "item_7"], ["item_2", "item_3", "item_5"]]
w2v = Word2Vec(sessions, vector_size=64, window=5, min_count=1, sg=1)
item_vectors = w2v.wv  # Use as pre-trained embeddings

# 2. Side information (item features)
# Concatenate item embedding with category, price, etc.
class RichItemEmbedding(nn.Module):
    def __init__(self, n_items, n_categories, embed_dim=64):
        super().__init__()
        self.item_embed = nn.Embedding(n_items, embed_dim)
        self.cat_embed = nn.Embedding(n_categories, 16)
        self.combine = nn.Linear(embed_dim + 16 + 1, embed_dim)  # +1 for price

    def forward(self, item_ids, cat_ids, prices):
        ie = self.item_embed(item_ids)
        ce = self.cat_embed(cat_ids)
        combined = torch.cat([ie, ce, prices.unsqueeze(-1)], dim=-1)
        return self.combine(combined)
```
5.2 Negative Sampling
```python
import numpy as np

# Random negative sampling for implicit feedback
def sample_negatives(positive_items, n_items, n_neg=4):
    """For each positive interaction, sample n_neg random negatives."""
    negatives = []
    for _ in range(n_neg):
        neg = np.random.randint(0, n_items)
        while neg in positive_items:
            neg = np.random.randint(0, n_items)
        negatives.append(neg)
    return negatives

# BPR Loss (Bayesian Personalized Ranking)
def bpr_loss(pos_score, neg_score):
    return -torch.log(torch.sigmoid(pos_score - neg_score)).mean()
```
📝 Quiz
- What is the main difference between NCF and matrix factorization?
  - NCF is faster
  - NCF uses an MLP to capture non-linear interactions
  - NCF does not need embeddings
  - Matrix factorization is more accurate
- Why is the Two-Tower architecture well suited for production?
  - Highest accuracy
  - Simplest to implement
  - Item embeddings are precomputed offline, so serving is fast with ANN search
  - No training required
- What are sequential recommenders best suited for?
  - Movie rating prediction
  - Predicting the next click/purchase from browsing history
  - Cold-start users
  - Explicit feedback only
🎯 Key Takeaways
- NCF — Non-linear CF, replaces dot product with MLP
- Two-Tower — Production standard for large-scale retrieval
- SASRec — Transformer for sequential patterns
- FAISS — ANN search for real-time serving
- Negative sampling — Essential for implicit feedback training
🚀 Next Lesson
MLOps Fundamentals — taking ML models to production with MLOps practices!
