
Deep Learning for Recommendation Systems

Neural Collaborative Filtering, Two-Tower Architecture, and Sequential RecSys

Deep learning has revolutionized recommender systems: YouTube, TikTok, and Spotify all run neural models. This lesson covers the architectures most widely used in production.

🎯 Objectives

  • Neural Collaborative Filtering (NCF)
  • Two-Tower architecture
  • Sequential Recommendation (session-based)
  • Practical implementation with PyTorch

1. Neural Collaborative Filtering (NCF)

1.1 Architecture

NCF Architecture

👤 User ID → 🔢 User Embedding ┐
📦 Item ID → 🔢 Item Embedding ┘→ 🔗 Concatenate → 🧠 MLP → Predicted Rating

The MLP captures non-linear user-item interactions (versus Matrix Factorization, which is limited to a linear dot product).

1.2 Implementation

Python
import torch
import torch.nn as nn

class NCF(nn.Module):
    def __init__(self, n_users, n_items, embed_dim=64, hidden_dims=(128, 64, 32)):
        super().__init__()
        self.user_embed = nn.Embedding(n_users, embed_dim)
        self.item_embed = nn.Embedding(n_items, embed_dim)

        # MLP layers
        layers = []
        input_dim = embed_dim * 2
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_dim),
                nn.Dropout(0.2)
            ])
            input_dim = hidden_dim
        layers.append(nn.Linear(input_dim, 1))

        self.mlp = nn.Sequential(*layers)

    def forward(self, user_ids, item_ids):
        user_emb = self.user_embed(user_ids)
        item_emb = self.item_embed(item_ids)
        x = torch.cat([user_emb, item_emb], dim=-1)
        # Squeeze only the last dim so a batch of size 1 keeps its batch dim
        return self.mlp(x).squeeze(-1)
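
A quick shape check of the forward pass; the sizes below are arbitrary, just for illustration:

Python
# Arbitrary sizes, just to sanity-check shapes
model = NCF(n_users=1000, n_items=5000)
users = torch.randint(0, 1000, (8,))
items = torch.randint(0, 5000, (8,))
scores = model(users, items)
print(scores.shape)  # torch.Size([8]): one predicted rating per (user, item) pair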

1.3 Training

Python
from torch.utils.data import Dataset, DataLoader

class RatingDataset(Dataset):
    def __init__(self, users, items, ratings):
        self.users = torch.LongTensor(users)
        self.items = torch.LongTensor(items)
        self.ratings = torch.FloatTensor(ratings)

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.ratings[idx]

# Setup (train_users, train_items, train_ratings are assumed to be prepared)
model = NCF(n_users=1000, n_items=5000, embed_dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

train_loader = DataLoader(
    RatingDataset(train_users, train_items, train_ratings),
    batch_size=256, shuffle=True
)

# Training loop
for epoch in range(20):
    model.train()
    total_loss = 0
    for users, items, ratings in train_loader:
        optimizer.zero_grad()
        preds = model(users, items)
        loss = criterion(preds, ratings)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch+1}: Loss = {avg_loss:.4f}")
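
The loop above only tracks training loss. In practice you also evaluate on held-out data; a minimal sketch, assuming a val_loader built the same way as train_loader:

Python
import math

model.eval()
squared_error, n = 0.0, 0
with torch.no_grad():
    for users, items, ratings in val_loader:
        preds = model(users, items)
        squared_error += ((preds - ratings) ** 2).sum().item()
        n += len(ratings)
print(f"Validation RMSE: {math.sqrt(squared_error / n):.4f}")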

2. Two-Tower Architecture

2.1 Concept

Two-Tower Architecture

👤 User Features (user_id, age, history, context) → 🧠 User Tower (MLP) → 📊 User Vector (d=128) ┐
📦 Item Features (item_id, category, price, description) → 🧠 Item Tower (MLP) → 📊 Item Vector (d=128) ┘→ ✖️ Dot Product → 🏆 Score / Ranking

Advantage: item vectors can be precomputed offline, so serving is extremely fast with ANN (approximate nearest neighbor) search.

2.2 Implementation

Python
class TwoTower(nn.Module):
    def __init__(self, user_feat_dim, item_feat_dim, embed_dim=128):
        super().__init__()

        # User tower
        self.user_tower = nn.Sequential(
            nn.Linear(user_feat_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

        # Item tower
        self.item_tower = nn.Sequential(
            nn.Linear(item_feat_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim)
        )

    def forward(self, user_features, item_features):
        user_vec = self.user_tower(user_features)
        item_vec = self.item_tower(item_features)

        # L2 normalize
        user_vec = torch.nn.functional.normalize(user_vec, dim=-1)
        item_vec = torch.nn.functional.normalize(item_vec, dim=-1)

        # Dot product similarity
        score = (user_vec * item_vec).sum(dim=-1)
        return score

    def get_user_embedding(self, user_features):
        """For online serving."""
        return torch.nn.functional.normalize(
            self.user_tower(user_features), dim=-1
        )

    def get_item_embedding(self, item_features):
        """Precompute offline, index with FAISS."""
        return torch.nn.functional.normalize(
            self.item_tower(item_features), dim=-1
        )
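
The class above only defines scoring; this lesson doesn't prescribe how the towers are trained. A common choice (one assumption among several) is in-batch sampled softmax, where every other item in the batch acts as a negative for a given user:

Python
def in_batch_softmax_loss(model, user_features, item_features, temperature=0.05):
    """Row i is a positive (user_i, item_i) pair; the other items in the
    batch serve as negatives. A standard trick for two-tower retrieval."""
    user_vecs = model.get_user_embedding(user_features)  # (B, d), L2-normalized
    item_vecs = model.get_item_embedding(item_features)  # (B, d), L2-normalized
    logits = user_vecs @ item_vecs.T / temperature       # (B, B) similarity matrix
    labels = torch.arange(logits.size(0))                # positives on the diagonal
    return nn.functional.cross_entropy(logits, labels)

# One hypothetical training step with made-up feature dimensions
model = TwoTower(user_feat_dim=32, item_feat_dim=48)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
user_batch, item_batch = torch.randn(256, 32), torch.randn(256, 48)
loss = in_batch_softmax_loss(model, user_batch, item_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()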

2.3 Production Serving

Example
Offline Pipeline:
1. Compute ALL item embeddings → store in FAISS index
2. Update daily/weekly

Online Serving:
1. User request comes in
2. Compute user embedding (real-time)
3. FAISS ANN search → top 100 candidates (< 10ms)
4. Re-ranking model (more feature-rich) → top 10
5. Return recommendations

FAISS Example:
Python
import faiss
import numpy as np

# Offline: index all item embeddings
item_embeddings = model.get_item_embedding(all_item_features)
item_vecs = item_embeddings.detach().numpy()

# Build FAISS index
dimension = 128
index = faiss.IndexFlatIP(dimension)  # Inner Product (cosine after normalization)
index.add(item_vecs)

# Online: search for a user (search expects a 2D array of queries)
user_vec = model.get_user_embedding(user_features).detach().numpy()
distances, indices = index.search(user_vec, k=100)  # top 100 candidates
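
Step 4 of the pipeline (re-ranking) is a separate, feature-rich model applied only to the retrieved candidates. A hedged sketch, where rerank_model is a hypothetical scorer and user_features is assumed to have shape (1, feat_dim):

Python
# Hypothetical step 4: rerank_model is assumed, not defined in this lesson
candidate_ids = torch.from_numpy(indices[0])             # top-100 item ids from FAISS
candidate_feats = all_item_features[candidate_ids]       # full feature rows for candidates
user_rep = user_features.expand(len(candidate_ids), -1)  # repeat the user per candidate
scores = rerank_model(user_rep, candidate_feats)         # richer model, e.g. an MLP
top10 = candidate_ids[scores.argsort(descending=True)[:10]]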

3. Sequential Recommendation

3.1 Concept

User behavior is a sequence: "view A → click B → buy C → what comes next?"

Example
Session: [item_1, item_2, item_3, item_4, ???]
Model learns: given sequence → predict next item

Similar to language modeling:
- Items = tokens
- User history = sentence
- Next item = next word prediction
3.2 SASRec (Self-Attentive Sequential Recommendation)

Python
class SASRec(nn.Module):
    def __init__(self, n_items, max_len=50, embed_dim=64, n_heads=2, n_layers=2):
        super().__init__()
        self.item_embed = nn.Embedding(n_items + 1, embed_dim, padding_idx=0)
        self.pos_embed = nn.Embedding(max_len, embed_dim)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=n_heads,
            dim_feedforward=embed_dim * 4,
            dropout=0.2,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.output = nn.Linear(embed_dim, n_items + 1)
        self.max_len = max_len

    def forward(self, item_seq):
        # item_seq: (batch, seq_len), left-padded with 0
        seq_len = item_seq.size(1)
        positions = torch.arange(seq_len, device=item_seq.device).unsqueeze(0)

        x = self.item_embed(item_seq) + self.pos_embed(positions)

        # Causal mask (a position cannot attend to future items)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        mask = mask.to(item_seq.device)

        # Padding mask (ignore padded positions)
        padding_mask = (item_seq == 0)

        x = self.transformer(x, mask=mask, src_key_padding_mask=padding_mask)
        logits = self.output(x)  # (batch, seq_len, n_items + 1)
        return logits

    def predict_next(self, item_seq):
        """Predict the next item from a sequence."""
        logits = self.forward(item_seq)
        # Prediction at the last position
        next_item_scores = logits[:, -1, :]  # (batch, n_items + 1)
        return next_item_scores.argsort(dim=-1, descending=True)
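
A small usage sketch for predict_next, assuming id 0 is reserved for padding as in the class above:

Python
model = SASRec(n_items=10000, max_len=50)
model.eval()
# A single session of 4 items, left-padded with 0 up to max_len
session = [42, 7, 913, 256]
seq = torch.LongTensor([[0] * (50 - len(session)) + session])
with torch.no_grad():
    ranked = model.predict_next(seq)  # item ids sorted by score, best first
print(ranked[0, :10])  # top-10 candidates (filter out id 0 if it shows up)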

3.3 Training

Python
def create_sequences(user_items, max_len=50):
    """Create training sequences from user interaction history."""
    sequences = []
    for user_id, items in user_items.items():
        for i in range(1, len(items)):
            seq = items[max(0, i - max_len):i]
            target = items[i]
            # Left-pad to max_len (0 is the padding id)
            padded = [0] * (max_len - len(seq)) + seq
            sequences.append((padded, target))
    return sequences

# Training (train_loader is assumed to yield LongTensor batches
# built from create_sequences output)
model = SASRec(n_items=10000, max_len=50)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss(ignore_index=0)

for epoch in range(10):
    model.train()
    for batch_seqs, batch_targets in train_loader:
        logits = model(batch_seqs)
        # Predict from the last position only
        last_logits = logits[:, -1, :]
        loss = criterion(last_logits, batch_targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
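
Sequential models are usually judged by top-K ranking metrics rather than loss. A minimal Hit@10 sketch, assuming a val_loader shaped like train_loader:

Python
def hit_rate_at_k(model, val_loader, k=10):
    """Fraction of examples whose true next item lands in the top-k."""
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for batch_seqs, batch_targets in val_loader:
            topk = model.predict_next(batch_seqs)[:, :k]  # (batch, k)
            hits += (topk == batch_targets.unsqueeze(1)).any(dim=1).sum().item()
            total += len(batch_targets)
    return hits / total

print(f"Hit@10: {hit_rate_at_k(model, val_loader):.4f}")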

4. Model Comparison

| Model | Type | Data | Scalability | Accuracy | Use Case |
|---|---|---|---|---|---|
| SVD | Matrix Factorization | Explicit ratings | Medium | Good | Movie ratings |
| NCF | Neural CF | Explicit/Implicit | Medium | Better | General |
| Two-Tower | Retrieval | Rich features | High | Good (retrieval) | Large-scale (YouTube) |
| SASRec | Sequential | Click streams | High | Best (session) | E-commerce, News |
| GNN-based | Graph | Social + interactions | Low | Best (cold start) | Social platforms |

Industry Examples

| Company | Model | Scale |
|---|---|---|
| YouTube | Two-Tower + DNN re-ranker | Billions of videos |
| TikTok | Sequential + Multi-task | Millions of short videos |
| Spotify | Collaborative Filtering + NLP | 100M+ tracks |
| Netflix | Matrix Factorization + DNN | 15,000+ titles |
| Amazon | Item-based CF + DNN | Millions of products |

5. Practical Tips

5.1 Embedding Tricks

Python
# 1. Pre-trained embeddings (warm start)
# Use Word2Vec-style training on item sequences
from gensim.models import Word2Vec

# Treat user sessions as "sentences", items as "words"
sessions = [["item_1", "item_3", "item_7"], ["item_2", "item_3", "item_5"]]
w2v = Word2Vec(sessions, vector_size=64, window=5, min_count=1, sg=1)
item_vectors = w2v.wv  # Use as pre-trained embeddings

# 2. Side information (item features)
# Concatenate item embedding with category, price, etc.
class RichItemEmbedding(nn.Module):
    def __init__(self, n_items, n_categories, embed_dim=64):
        super().__init__()
        self.item_embed = nn.Embedding(n_items, embed_dim)
        self.cat_embed = nn.Embedding(n_categories, 16)
        self.combine = nn.Linear(embed_dim + 16 + 1, embed_dim)  # +1 for price

    def forward(self, item_ids, cat_ids, prices):
        ie = self.item_embed(item_ids)
        ce = self.cat_embed(cat_ids)
        combined = torch.cat([ie, ce, prices.unsqueeze(-1)], dim=-1)
        return self.combine(combined)
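
To actually warm-start a model with those Word2Vec vectors, copy them into the embedding table. A sketch, where n_items and the item_to_idx mapping (string key → embedding row) are assumptions:

Python
# item_to_idx: hypothetical mapping from "item_1"-style keys to integer ids
embed = nn.Embedding(n_items, 64)
with torch.no_grad():
    for item_key, idx in item_to_idx.items():
        if item_key in w2v.wv:
            embed.weight[idx] = torch.from_numpy(w2v.wv[item_key].copy())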

5.2 Negative Sampling

Python
import numpy as np

# Random negative sampling for implicit feedback
def sample_negatives(positive_items, n_items, n_neg=4):
    """For each positive interaction, sample n_neg random negatives.
    positive_items should be a set for fast membership tests."""
    negatives = []
    for _ in range(n_neg):
        neg = np.random.randint(0, n_items)
        while neg in positive_items:
            neg = np.random.randint(0, n_items)
        negatives.append(neg)
    return negatives

# BPR Loss (Bayesian Personalized Ranking)
def bpr_loss(pos_score, neg_score):
    # Equivalently: -nn.functional.logsigmoid(pos_score - neg_score).mean(),
    # which is more numerically stable
    return -torch.log(torch.sigmoid(pos_score - neg_score)).mean()
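
How the two pieces fit together in a training step, using the NCF model from section 1 as the scorer (one reasonable choice; note this swaps the earlier MSE objective for implicit-feedback BPR):

Python
# Toy batch: two users, each with one observed positive item
users = torch.LongTensor([3, 17])
pos_items = torch.LongTensor([42, 7])
neg_items = torch.LongTensor([
    sample_negatives({42}, n_items=5000, n_neg=1)[0],
    sample_negatives({7}, n_items=5000, n_neg=1)[0],
])

model = NCF(n_users=1000, n_items=5000)
pos_score = model(users, pos_items)    # scores for observed items
neg_score = model(users, neg_items)    # scores for sampled negatives
loss = bpr_loss(pos_score, neg_score)  # push each positive above its negative
loss.backward()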

📝 Quiz

  1. What is the main difference between NCF and Matrix Factorization?

    • NCF is faster
    • NCF uses an MLP to capture non-linear interactions
    • NCF does not need embeddings
    • Matrix Factorization is more accurate
  2. Why is the Two-Tower architecture a good fit for production?

    • Highest accuracy
    • Simplest to implement
    • Item embeddings are precomputed offline, so serving is fast with ANN search
    • No training needed
  3. What is sequential RecSys suited for?

    • Movie rating prediction
    • Predicting the next click/purchase from browsing history
    • Cold-start users
    • Explicit feedback only

🎯 Key Takeaways

  1. NCF: non-linear collaborative filtering, replaces the dot product with an MLP
  2. Two-Tower: the production standard for large-scale retrieval
  3. SASRec: a Transformer for sequential patterns
  4. FAISS: ANN search for real-time serving
  5. Negative sampling: essential for implicit-feedback training

🚀 Next Lesson

MLOps Fundamentals: taking ML models into production with MLOps practices!