
Transfer Learning & Fine-tuning

Leveraging pre-trained models - BERT, GPT, ResNet - for a specific domain

Transfer learning lets you reuse knowledge from models trained on massive datasets for your own domain, saving both data and compute.

🎯 Objectives

  • Understand the transfer learning concept
  • Fine-tune pre-trained NLP models (BERT)
  • Fine-tune pre-trained vision models (ResNet)
  • Few-shot and zero-shot learning

1. Transfer Learning Concept

1.1 Why Transfer Learning?

| Training from scratch | Transfer learning |
| --- | --- |
| Needs millions of samples | Needs only a few hundred to a few thousand |
| Days of training on GPUs | A few hours of fine-tuning |
| High cost | Low cost |
| Hard to reach SOTA | Near-SOTA with ease |

1.2 How It Works

Pre-trained Model (ImageNet/Wikipedia)

❄️ Layers 1-6: general features (edges, textures, shapes, grammar)
🔧 Layers 7-12: task-specific features (object parts, sentiment patterns)
🔄 Final layer: classification (1000 ImageNet classes → your 5 classes)

1.3 Strategies

| Strategy | When | How |
| --- | --- | --- |
| Feature extraction | Small dataset, similar domain | Freeze all layers, train a new head |
| Fine-tuning (top layers) | Medium dataset | Freeze early layers, train the top layers |
| Fine-tuning (full) | Large dataset, different domain | Train all layers with a small LR |
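
A minimal PyTorch sketch of the three strategies, using torchvision's ResNet-18 as a stand-in backbone (the model choice and treating layer4 as the "top layers" are illustrative assumptions):

Python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Strategy 1 - feature extraction: freeze the whole backbone, swap in a new head
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 5)  # new head is trainable by default

# Strategy 2 - fine-tune top layers: additionally unfreeze the last residual block
for p in model.layer4.parameters():
    p.requires_grad = True

# Strategy 3 - full fine-tuning: unfreeze everything and rely on a small LR
for p in model.parameters():
    p.requires_grad = True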

2. NLP Transfer Learning

2.1 BERT Fine-tuning (Text Classification)

Python
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer
)
from datasets import Dataset
import pandas as pd

# Load pre-trained model
model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3  # 3 classes: positive, neutral, negative
)

# Prepare data
df = pd.read_csv("reviews.csv")  # columns: text, label
dataset = Dataset.from_pandas(df)

def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.train_test_split(test_size=0.2)

# Training
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,  # small LR for fine-tuning!
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

trainer.train()
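
Once training finishes, the fine-tuned model can be wrapped in a text-classification pipeline for inference. A minimal sketch; since no id2label mapping was configured above, predictions come back as LABEL_0..LABEL_2:

Python
from transformers import pipeline

clf = pipeline("text-classification", model=trainer.model, tokenizer=tokenizer)
print(clf("Great product, fast shipping!"))
# e.g. [{'label': 'LABEL_0', 'score': 0.97}] - map LABEL_i back to your classes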

2.2 Vietnamese NLP with PhoBERT

Python
from transformers import AutoModel, AutoTokenizer

# PhoBERT - pre-trained on Vietnamese text
model_name = "vinai/phobert-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Tokenize Vietnamese text ("This product is very good, fast delivery")
# Note: PhoBERT expects word-segmented input (e.g. produced by VnCoreNLP's
# RDRSegmenter), so segment raw text before tokenizing in practice.
text = "Sản phẩm này rất tốt, giao hàng nhanh"
tokens = tokenizer(text, return_tensors="pt")
outputs = model(**tokens)
# outputs.last_hidden_state → embeddings for downstream tasks
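
To get a single sentence vector out of last_hidden_state, one common recipe (an assumption here, not something PhoBERT mandates) is mean pooling over the attention mask:

Python
import torch

with torch.no_grad():
    hidden = model(**tokens).last_hidden_state       # (1, seq_len, 768)
mask = tokens["attention_mask"].unsqueeze(-1)        # (1, seq_len, 1)
sentence_vec = (hidden * mask).sum(1) / mask.sum(1)  # average over real tokens only
print(sentence_vec.shape)                            # torch.Size([1, 768])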

3. Vision Transfer Learning

3.1 ResNet Fine-tuning (Image Classification)

Python
import torch
import torch.nn as nn
from torchvision import models, transforms
from torch.utils.data import DataLoader

# Load pre-trained ResNet50
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the entire backbone (the new head added below stays trainable)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer
num_classes = 5  # your classes
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, num_classes)
)

# Only train the new layers
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
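
A minimal loop to train the new head; train_loader is an assumed DataLoader yielding (image, label) batches:

Python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.train()

for images, labels in train_loader:  # train_loader: assumed DataLoader
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()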

3.2 Data Augmentation for Fine-tuning

Python
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    # ImageNet mean/std - keep these to match the pre-trained weights
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
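
These transforms plug into datasets and loaders as usual; a sketch assuming an ImageFolder-style layout with hypothetical data/train and data/val directories:

Python
from torchvision import datasets
from torch.utils.data import DataLoader

train_ds = datasets.ImageFolder("data/train", transform=train_transform)  # hypothetical path
val_ds = datasets.ImageFolder("data/val", transform=val_transform)        # hypothetical path
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=64)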

3.3 Progressive Unfreezing

Python
1# Phase 1: Train head only (5 epochs)
2for param in model.parameters():
3 param.requires_grad = False
4for param in model.fc.parameters():
5 param.requires_grad = True
6# Train with lr=1e-3
7
8# Phase 2: Unfreeze last block (5 epochs)
9for param in model.layer4.parameters():
10 param.requires_grad = True
11# Train with lr=1e-4 (smaller!)
12
13# Phase 3: Unfreeze all (3 epochs)
14for param in model.parameters():
15 param.requires_grad = True
16# Train with lr=1e-5 (very small!)
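
An alternative to separate phases is to give each depth its own learning rate in a single optimizer via parameter groups; the values below are illustrative:

Python
import torch

optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-4},  # pre-trained block: smaller LR
    {"params": model.fc.parameters(), "lr": 1e-3},      # fresh head: larger LR
])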

4. Few-shot & Zero-shot Learning

4.1 Few-shot Learning

Classify with only 5-10 samples per class:

Python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

# Sentence embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')

# A few examples per class
examples = [
    ("Great product!", "positive"),
    ("Love it!", "positive"),
    ("Terrible quality", "negative"),
    ("Waste of money", "negative"),
    ("It's okay", "neutral"),
]

texts, labels = zip(*examples)
embeddings = model.encode(list(texts))

# KNN classifier on the embeddings
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(embeddings, labels)

# Classify new text
new_text = "Absolutely amazing experience"
new_embed = model.encode([new_text])
print(knn.predict(new_embed))  # "positive"
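
With this few examples, a logistic-regression head on the same embeddings is a reasonable alternative to KNN; a sketch reusing the variables above:

Python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(embeddings, list(labels))  # same few-shot embeddings as above
print(clf.predict(new_embed))      # e.g. ['positive']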

4.2 Zero-shot Classification

Python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

# Note: bart-large-mnli was trained on English NLI data; for Vietnamese input
# like this, a multilingual NLI model (e.g. joeddav/xlm-roberta-large-xnli)
# is a better fit.
text = "Chiếc điện thoại này pin rất trâu"  # "This phone's battery lasts forever"
labels = ["tích cực", "tiêu cực", "trung lập"]  # positive / negative / neutral

result = classifier(text, candidate_labels=labels)
print(result)
# {'labels': ['tích cực', 'trung lập', 'tiêu cực'],
#  'scores': [0.85, 0.12, 0.03]}

5. Best Practices

5.1 Fine-tuning Checklist

| Step | Detail |
| --- | --- |
| 1. Choose a base model | Match the domain (NLP/vision/tabular) |
| 2. Prepare data | Clean, labeled, balanced |
| 3. Learning rate | Start small (2e-5 for BERT, 1e-3 for a new head) |
| 4. Freezing strategy | Start frozen, unfreeze progressively |
| 5. Regularization | Dropout, weight decay |
| 6. Evaluation | Hold-out set + cross-validation |
| 7. Early stopping | Monitor val loss, patience=3 (see the sketch below) |
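
With the Hugging Face Trainer, item 7 maps to EarlyStoppingCallback; a sketch reusing the section 2.1 setup (it relies on load_best_model_at_end=True, already set there):

Python
from transformers import EarlyStoppingCallback

trainer = Trainer(
    model=model,
    args=training_args,  # needs load_best_model_at_end=True (set in section 2.1)
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)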

5.2 Common LR Schedule

Python
from transformers import get_linear_schedule_with_warmup

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,    # warm up slowly
    num_training_steps=1000  # then decay linearly
)
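
The scheduler advances once per optimizer step, so num_training_steps should equal len(train_loader) * num_epochs. A loop sketch; num_epochs, train_loader, and compute_loss are assumed/hypothetical names:

Python
for epoch in range(num_epochs):            # num_epochs: assumed constant
    for batch in train_loader:             # train_loader: assumed DataLoader
        loss = compute_loss(model, batch)  # compute_loss: hypothetical helper
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the warmup/decay schedule each optimizer step
        optimizer.zero_grad()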

📝 Quiz

  1. When is transfer learning most effective?

    • With plenty of data
    • With little data but a domain similar to the pre-trained model's
    • On a completely different task
    • Without a GPU
  2. What learning rate should fine-tuning use?

    • A large one (0.1)
    • A very small one (1e-5 to 2e-5)
    • A random one
    • The same as the pre-training LR
  3. What is zero-shot classification?

    • Classifying new classes without any training examples for them
    • Accuracy equal to zero
    • A model that has not been trained
    • Something that cannot be used

🎯 Key Takeaways

  1. Transfer learning saves significant data and compute
  2. Freeze, then unfreeze progressively for best results
  3. A small LR is critical for fine-tuning
  4. PhoBERT is the go-to choice for Vietnamese NLP
  5. Few-shot and zero-shot learning are powerful when data is very scarce

🚀 Next lesson

Recommendation Systems Overview: recommender systems with collaborative filtering, content-based, and hybrid approaches!