Transfer Learning & Fine-tuning
Transfer learning lets you reuse knowledge from models trained on massive datasets for your own domain, saving both data and compute.
🎯 Objectives
- Understand the transfer learning concept
- Fine-tune pre-trained NLP models (BERT)
- Fine-tune pre-trained vision models (ResNet)
- Few-shot and zero-shot learning
1. Transfer Learning Concept
1.1 Why Transfer Learning?
| Train from scratch | Transfer Learning |
|---|---|
| Needs millions of samples | Only a few hundred to a few thousand |
| Days of training on GPUs | A few hours of fine-tuning |
| High cost | Low cost |
| Hard to reach SOTA | Near-SOTA with ease |
1.2 How It Works
Pre-trained Model (ImageNet/Wikipedia)
- ❄️ Layers 1-6: general features (edges, textures, shapes, grammar)
- 🔧 Layers 7-12: task-specific features (object parts, sentiment patterns)
- 🔄 Final layer: classification head (1000 ImageNet classes → your 5 classes)
1.3 Strategies
| Strategy | When | How |
|---|---|---|
| Feature Extraction | Small data, similar domain | Freeze all, train new head |
| Fine-tuning (top) | Medium data | Freeze early, train top layers |
| Fine-tuning (full) | Large data, different domain | Train all with small LR |
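The three strategies in the table above each map to a few lines of PyTorch. A minimal sketch, using torchvision's ResNet-18 purely for illustration (the layer names are torchvision's):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature extraction: freeze everything, then attach a fresh head
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 5)  # new head trains from scratch

# Fine-tuning (top): additionally unfreeze the last residual block
for p in model.layer4.parameters():
    p.requires_grad = True

# Fine-tuning (full): unfreeze everything and rely on a small learning rate
for p in model.parameters():
    p.requires_grad = True
```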
2. NLP Transfer Learning
2.1 BERT Fine-tuning (Text Classification)
```python
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer
)
from datasets import Dataset
import pandas as pd

# Load pre-trained model
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3  # 3 classes: positive, neutral, negative
)

# Prepare data
df = pd.read_csv("reviews.csv")  # columns: text, label
dataset = Dataset.from_pandas(df)

def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.train_test_split(test_size=0.2)

# Training
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,  # small LR for fine-tuning!
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

trainer.train()
```
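The Trainer above tracks only the loss. To also monitor accuracy during evaluation, you can pass a compute_metrics function; a minimal sketch (the function body is our own, not part of the transformers API):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# eval_pred is a (logits, labels) pair supplied by the Trainer
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
    }

# Then: Trainer(..., compute_metrics=compute_metrics)
```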
2.2 Vietnamese NLP with PhoBERT
```python
from transformers import AutoModel, AutoTokenizer

# PhoBERT - pre-trained on Vietnamese text
model_name = "vinai/phobert-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Tokenize Vietnamese text
text = "Sản phẩm này rất tốt, giao hàng nhanh"
tokens = tokenizer(text, return_tensors="pt")
outputs = model(**tokens)
# outputs.last_hidden_state → embeddings for downstream tasks
```
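Two practical notes. First, the PhoBERT model card recommends word-segmenting Vietnamese input first (e.g., with VnCoreNLP's RDRSegmenter). Second, to collapse last_hidden_state into a single sentence vector, mask-aware mean pooling is a common choice; a minimal sketch (mean_pool is our own helper, not a transformers function):

```python
import torch

# Average token embeddings, ignoring padding positions
def mean_pool(last_hidden_state, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)        # token counts per sentence
    return summed / counts                          # (batch, hidden_size)

with torch.no_grad():
    outputs = model(**tokens)
sentence_vec = mean_pool(outputs.last_hidden_state, tokens["attention_mask"])
```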
3. Vision Transfer Learning
3.1 ResNet Fine-tuning (Image Classification)
```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from torch.utils.data import DataLoader

# Load pre-trained ResNet50
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze early layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer
num_classes = 5  # your classes
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, num_classes)
)

# Only train the new layers
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```
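The snippet stops at the optimizer; a minimal loop for one training epoch might look like this (train_loader is assumed to come from the DataLoader setup in section 3.2):

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

model.train()
for images, labels in train_loader:  # train_loader: built in section 3.2
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # forward pass + loss
    loss.backward()                          # only the unfrozen head gets updated
    optimizer.step()
```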
3.2 Data Augmentation for Fine-tuning
```python
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
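To feed these transforms into training, the usual pairing is ImageFolder plus DataLoader; a sketch assuming a hypothetical data/train and data/val directory layout with one subfolder per class:

```python
from torchvision import datasets

# Hypothetical layout: data/train/<class_name>/*.jpg
train_ds = datasets.ImageFolder("data/train", transform=train_transform)
val_ds = datasets.ImageFolder("data/val", transform=val_transform)

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)
val_loader = DataLoader(val_ds, batch_size=64, shuffle=False, num_workers=4)
```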
3.3 Progressive Unfreezing
```python
# Phase 1: Train head only (5 epochs)
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
# Train with lr=1e-3

# Phase 2: Unfreeze last block (5 epochs)
for param in model.layer4.parameters():
    param.requires_grad = True
# Train with lr=1e-4 (smaller!)

# Phase 3: Unfreeze all (3 epochs)
for param in model.parameters():
    param.requires_grad = True
# Train with lr=1e-5 (very small!)
```
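Instead of rebuilding the optimizer at each phase, you can give each depth its own learning rate with parameter groups; a sketch with illustrative values (earlier layers get smaller LRs because their features are more general):

```python
# Discriminative learning rates: one parameter group per depth
optimizer = torch.optim.Adam([
    {"params": model.fc.parameters(), "lr": 1e-3},      # new head: largest LR
    {"params": model.layer4.parameters(), "lr": 1e-4},  # last block: smaller
    {"params": model.layer3.parameters(), "lr": 1e-5},  # earlier block: smallest
])
```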
4. Few-shot & Zero-shot Learning
4.1 Few-shot Learning
Classify with only 5-10 samples per class:
```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

# Sentence embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# A few examples per class
examples = [
    ("Great product!", "positive"),
    ("Love it!", "positive"),
    ("Terrible quality", "negative"),
    ("Waste of money", "negative"),
    ("It's okay", "neutral"),
]

texts, labels = zip(*examples)
embeddings = model.encode(list(texts))

# KNN classifier on embeddings
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(embeddings, labels)

# Classify new text
new_text = "Absolutely amazing experience"
new_embed = model.encode([new_text])
print(knn.predict(new_embed))  # "positive"
```
4.2 Zero-shot Classification
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

text = "Chiếc điện thoại này pin rất trâu"
labels = ["tích cực", "tiêu cực", "trung lập"]

result = classifier(text, candidate_labels=labels)
print(result)
# {'labels': ['tích cực', 'trung lập', 'tiêu cực'],
#  'scores': [0.85, 0.12, 0.03]}
```
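One caveat: facebook/bart-large-mnli was trained on English NLI data, so its scores on Vietnamese input should be treated with caution. A multilingual NLI checkpoint is usually safer for Vietnamese; a sketch using joeddav/xlm-roberta-large-xnli, one commonly used XNLI-trained model:

```python
from transformers import pipeline

# XNLI-trained multilingual model handles Vietnamese text and labels directly
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

result = classifier("Chiếc điện thoại này pin rất trâu",
                    candidate_labels=["tích cực", "tiêu cực", "trung lập"])
print(result["labels"][0])  # highest-scoring label
```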
5. Best Practices
5.1 Fine-tuning Checklist
| Step | Detail |
|---|---|
| 1. Choose base model | Match domain (NLP/Vision/Tabular) |
| 2. Prepare data | Clean, labeled, balanced |
| 3. Learning rate | Start small (2e-5 for BERT, 1e-3 for head) |
| 4. Freeze strategy | Start frozen, progressively unfreeze |
| 5. Regularization | Dropout, weight decay |
| 6. Evaluation | Hold-out + cross-validation |
| 7. Early stopping | Monitor val loss, patience=3 |
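For step 7, the transformers Trainer ships an EarlyStoppingCallback; a minimal sketch reusing the objects from section 2.1 (training_args must set load_best_model_at_end=True and a metric_for_best_model such as "eval_loss"):

```python
from transformers import EarlyStoppingCallback, Trainer

# Stop if the monitored metric fails to improve for 3 consecutive evaluations
trainer = Trainer(
    model=model,
    args=training_args,  # requires load_best_model_at_end + metric_for_best_model
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```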
5.2 Common LR Schedule
```python
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Optimizer defined here so the snippet runs standalone
optimizer = AdamW(model.parameters(), lr=2e-5)

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,    # warm up slowly
    num_training_steps=1000  # then decay linearly
)
```
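This scheduler is stepped once per optimizer update (call scheduler.step() after each batch, not each epoch). When using the Trainer API instead of a manual loop, the same warmup comes from TrainingArguments(warmup_steps=100), since linear warmup plus linear decay is the Trainer's default schedule.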
📝 Quiz
1. When is transfer learning most effective?
- When data is plentiful
- When data is scarce but the domain is similar to the pre-trained model's
- When the task is completely different
- When no GPU is available
2. What learning rate should fine-tuning use?
- A large one (0.1)
- A very small one (1e-5 to 2e-5)
- A random one
- The same as the pre-training LR
3. What is zero-shot classification?
- Classifying without any training examples for the new classes
- Accuracy equal to zero
- A model that has not been trained yet
- Something that cannot be used
🎯 Key Takeaways
- Transfer learning saves substantial amounts of data and compute
- Freeze first, then unfreeze progressively for the best results
- A small LR is critical for fine-tuning
- PhoBERT is the go-to choice for Vietnamese NLP
- Few-shot and zero-shot learning are powerful when data is very scarce
🚀 Next Lesson
Recommendation Systems Overview: recommender systems with Collaborative Filtering, Content-based, and Hybrid approaches!
