🎯 Lesson Objectives
After this lesson, you will:
✅ Use ImageNet pretrained models (ResNet, EfficientNet)
✅ Use Keras Applications and Hugging Face
✅ Know how to choose the right model
✅ Build a complete Vision pipeline
Review of the Previous Lesson
Lesson 17 covered Transfer Learning for NLP. Today we apply it to Computer Vision!
🖼️ Vision Transfer Learning Overview
Pretrained Vision Models
ImageNet pretrained models have already learned:
- Edges, textures (low-level)
- Shapes, patterns (mid-level)
- Object parts, objects (high-level)
→ These features transfer well to most vision tasks!
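This low-to-high hierarchy is visible directly in a network's shapes: spatial resolution shrinks while channel depth grows from the early to the late blocks. A minimal sketch that probes one output per ResNet50 stage (using `weights=None`, i.e. random initialization, since only the shapes matter here — the stage layer names are the standard Keras ones):

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import Model

# Random weights are enough to inspect the feature hierarchy's shapes
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))

# One representative output per stage, from low-level to high-level features
stage_layers = ['conv2_block3_out', 'conv3_block4_out',
                'conv4_block6_out', 'conv5_block3_out']
probe = Model(base.input,
              [base.get_layer(name).output for name in stage_layers])

for name, out in zip(stage_layers, probe.outputs):
    print(f"{name}: {out.shape}")  # spatial size halves, channel depth grows
```

Early-stage feature maps are large but shallow (edges, textures need spatial detail); late-stage maps are small but deep (objects need many abstract channels).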
Popular Pretrained Models
| Model | Year | Top-1 Acc | Parameters | Speed |
|---|---|---|---|---|
| ResNet50 | 2015 | 76.1% | 25M | Fast |
| VGG16 | 2014 | 71.3% | 138M | Slow |
| InceptionV3 | 2015 | 77.9% | 24M | Medium |
| EfficientNetB0 | 2019 | 77.1% | 5M | Fast |
| ViT-Base | 2020 | 81.8% | 86M | Medium |
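One way to read the table above: divide Top-1 accuracy by parameter count to get a rough accuracy-per-parameter efficiency. A quick sketch over the table's own numbers:

```python
# (model, top-1 accuracy %, parameters in millions) — taken from the table above
models = [
    ("ResNet50",       76.1,  25),
    ("VGG16",          71.3, 138),
    ("InceptionV3",    77.9,  24),
    ("EfficientNetB0", 77.1,   5),
    ("ViT-Base",       81.8,  86),
]

# Rough efficiency: accuracy points per million parameters
for name, acc, params in sorted(models, key=lambda m: m[1] / m[2], reverse=True):
    print(f"{name:15s} {acc:.1f}% / {params:3d}M = {acc / params:5.2f} pts/M")
```

EfficientNetB0 comes out far ahead on this crude metric, which is one reason it is a common default when compute is limited.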
Checkpoint
Do you know the popular pretrained vision models?
🔧 Keras Applications
Load Pretrained Model
```python
import tensorflow as tf
from tensorflow.keras.applications import (
    ResNet50,
    VGG16,
    InceptionV3,
    EfficientNetB0,
    MobileNetV2
)
from tensorflow.keras import layers, Model

# Load ResNet50 without top (classifier)
base_model = ResNet50(
    weights='imagenet',        # Pretrained weights
    include_top=False,         # Remove classifier
    input_shape=(224, 224, 3)
)

print(f"ResNet50 layers: {len(base_model.layers)}")
print(f"Output shape: {base_model.output_shape}")

# Other models
models = {
    'vgg16': VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'inception': InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3)),
    'efficientnet': EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'mobilenet': MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
}

for name, model in models.items():
    print(f"{name}: {len(model.layers)} layers, {model.count_params():,} params")
```

```
ResNet50 layers: 175
Output shape: (None, 7, 7, 2048)
vgg16: 19 layers, 14,714,688 params
inception: 311 layers, 21,802,784 params
efficientnet: 237 layers, 4,049,571 params
mobilenet: 155 layers, 2,257,984 params
```

Checkpoint
Do you know how to load pretrained models?
🏗️ Build Custom Classifier
Feature Extraction Approach
```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model

# Load base model
base_model = ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model
base_model.trainable = False

# Build classifier
inputs = layers.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)  # training=False keeps BatchNorm in inference mode

# Global Average Pooling
x = layers.GlobalAveragePooling2D()(x)

# Custom classifier
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)  # 10 classes

model = Model(inputs, outputs)

model.summary()

# Check trainable params
trainable_params = sum(tf.size(w).numpy() for w in model.trainable_weights)
total_params = model.count_params()
print(f"\nTrainable: {trainable_params:,} / {total_params:,} ({100*trainable_params/total_params:.1f}%)")
```

```
Model: "model"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 input_1 (InputLayer)         [(None, 224, 224, 3)]     0
 resnet50 (Functional)        (None, 7, 7, 2048)        23,587,712
 global_average_pooling2d     (None, 2048)              0
 dense (Dense)                (None, 256)               524,544
 dropout (Dropout)            (None, 256)               0
 dense_1 (Dense)              (None, 10)                2,570
=================================================================
Total params: 24,114,826
Trainable: 527,114 / 24,114,826 (2.2%)
```

Compile and Train
```python
# Compile
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Data augmentation
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# Preprocess function for ResNet
from tensorflow.keras.applications.resnet50 import preprocess_input

def preprocess(images, labels):
    # Data augmentation
    images = data_augmentation(images)
    # ResNet preprocessing
    images = preprocess_input(images)
    return images, labels

# Train
# model.fit(train_data.map(preprocess), epochs=10, validation_data=val_data)
```

Checkpoint
Can you build a classifier with Feature Extraction?
🔓 Fine-tuning Vision Models
Unfreeze Strategy
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model
import tensorflow as tf

def build_model_for_finetuning(num_classes=10):
    """Build model with frozen base, ready for fine-tuning"""

    base_model = ResNet50(
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )

    # Initially freeze all
    base_model.trainable = False

    # Build model
    inputs = layers.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs), base_model


# Stage 1: Train classifier only
model, base_model = build_model_for_finetuning(num_classes=10)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Stage 1: Training classifier (base frozen)")
# model.fit(train_data, epochs=5, validation_data=val_data)


# Stage 2: Fine-tune top layers
base_model.trainable = True

# Freeze first 100 layers (keep early features frozen)
for layer in base_model.layers[:100]:
    layer.trainable = False

# Lower learning rate for fine-tuning
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),  # 100x smaller
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Count trainable
trainable = sum(tf.size(w).numpy() for w in model.trainable_weights)
print(f"\nStage 2: Fine-tuning (trainable params: {trainable:,})")
# model.fit(train_data, epochs=5, validation_data=val_data)
```

Which layers to unfreeze?
Unfreezing guide:
- Similar domain (photos → photos): Unfreeze top layers only
- Different domain (photos → medical): Unfreeze more layers
- Small data: Keep more frozen
- Large data: Unfreeze more
```python
# Gradual unfreezing schedule
def gradual_unfreeze(base_model, stage):
    """
    Stage 0: All frozen
    Stage 1: Last block unfrozen
    Stage 2: Last 2 blocks unfrozen
    Stage 3: Last 3 blocks unfrozen
    """
    base_model.trainable = True

    # ResNet block name prefixes, from the last block backwards
    block_prefixes = ['conv5', 'conv4', 'conv3', 'conv2']

    if stage == 0:
        base_model.trainable = False
    else:
        for layer in base_model.layers:
            # Freeze all by default
            layer.trainable = False

            # Unfreeze based on stage
            for prefix in block_prefixes[:stage]:
                if prefix in layer.name:
                    layer.trainable = True
                    break

# Example usage
for stage in range(4):
    gradual_unfreeze(base_model, stage)
    trainable_count = sum(1 for l in base_model.layers if l.trainable)
    print(f"Stage {stage}: {trainable_count} trainable layers")
```

Checkpoint
Do you understand how to fine-tune vision models?
🤗 Hugging Face Vision
Vision Transformer (ViT)
```python
from transformers import ViTFeatureExtractor, TFViTForImageClassification
from PIL import Image
import requests

# Load pretrained ViT
model_name = "google/vit-base-patch16-224"

feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
model = TFViTForImageClassification.from_pretrained(model_name)

# Load image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess
inputs = feature_extractor(images=image, return_tensors="tf")

# Predict
outputs = model(**inputs)
logits = outputs.logits

# Get predicted class
predicted_class = logits.numpy().argmax(-1)[0]
print(f"Predicted class: {model.config.id2label[predicted_class]}")
```

Fine-tune ViT
```python
from transformers import TFViTForImageClassification, ViTFeatureExtractor
from datasets import load_dataset
import tensorflow as tf

# Load dataset
dataset = load_dataset("beans")  # Small dataset for demo

# Feature extractor
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

# Preprocess function
def preprocess(examples):
    images = [img.convert("RGB") for img in examples["image"]]
    inputs = feature_extractor(images=images, return_tensors="np")
    inputs["labels"] = examples["labels"]
    return inputs

# Apply preprocessing
processed_dataset = dataset.with_transform(preprocess)

# Load model for our number of classes
model = TFViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=3,  # beans has 3 classes
    ignore_mismatched_sizes=True
)

# Training with TensorFlow
train_ds = processed_dataset["train"].to_tf_dataset(
    columns=["pixel_values"],
    label_cols=["labels"],
    shuffle=True,
    batch_size=8
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

# model.fit(train_ds, epochs=3)
```

Checkpoint
Do you know how to use Hugging Face for Vision?
📚 Model Hub & timm
timm Library
timm (PyTorch Image Models) is a library with 800+ pretrained vision models.
- Covers most SOTA models
- Consistent API
- Easy to use
```python
# pip install timm

import timm

# List available models
print(f"Total models: {len(timm.list_models())}")

# Search models
resnet_models = timm.list_models('resnet*')
print(f"ResNet variants: {len(resnet_models)}")

efficientnet_models = timm.list_models('efficientnet*')
print(f"EfficientNet variants: {len(efficientnet_models)}")

# Load model
model = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=10  # Custom number of classes
)

# Model info
data_config = timm.data.resolve_model_data_config(model)
print(f"Input size: {data_config['input_size']}")
print(f"Mean: {data_config['mean']}")
print(f"Std: {data_config['std']}")

# Get feature extractor (without classifier)
feature_extractor = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=0  # Remove classifier
)
print(f"Feature dim: {feature_extractor.num_features}")
```

Popular timm Models
| Model Family | Best Variant | Top-1 Acc | Use Case |
|---|---|---|---|
| EfficientNet | efficientnet_b7 | 84.4% | Balance speed/accuracy |
| ConvNeXt | convnext_large | 87.5% | Best CNN |
| ViT | vit_large_patch16 | 87.8% | Transformer |
| Swin | swin_large_patch4 | 87.3% | Efficient Transformer |
| RegNet | regnetx_320 | 79.9% | Fast inference |
```python
# Modern architectures
import timm

# ConvNeXt - modern CNN (2022)
convnext = timm.create_model('convnext_tiny', pretrained=True, num_classes=10)

# Swin Transformer (2021)
swin = timm.create_model('swin_tiny_patch4_window7_224', pretrained=True, num_classes=10)

# MaxViT (2022)
maxvit = timm.create_model('maxvit_tiny_tf_224', pretrained=True, num_classes=10)

# Compare parameter counts
for name, model in [('ConvNeXt-T', convnext), ('Swin-T', swin), ('MaxViT-T', maxvit)]:
    params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {params/1e6:.1f}M params")
```

Checkpoint
Do you know about the timm library?
🎯 Complete Vision Transfer Pipeline
Full Pipeline
```python
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.applications.efficientnet import preprocess_input
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau
)

def create_transfer_model(num_classes):
    """
    Create a transfer learning model with a frozen base.

    Returns the model together with its base model, so the base
    can later be unfrozen in place for fine-tuning.
    """
    # Base model
    base = EfficientNetB0(
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )
    base.trainable = False

    # Data augmentation
    augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.2),
        layers.RandomZoom(0.2),
        layers.RandomContrast(0.2),
    ], name='augmentation')

    # Build model
    inputs = layers.Input(shape=(224, 224, 3))
    x = augmentation(inputs)
    x = preprocess_input(x)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs), base


def train_transfer_model(train_data, val_data, num_classes, fine_tune_layers=50):
    """Two-stage training: feature extraction, then fine-tuning"""

    # Stage 1: Feature Extraction
    print("=" * 50)
    print("Stage 1: Feature Extraction")
    print("=" * 50)

    model, base = create_transfer_model(num_classes)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    callbacks = [
        EarlyStopping(patience=5, restore_best_weights=True),
        ReduceLROnPlateau(factor=0.2, patience=3)
    ]

    model.fit(
        train_data,
        validation_data=val_data,
        epochs=20,
        callbacks=callbacks
    )

    # Stage 2: Fine-tuning
    print("\n" + "=" * 50)
    print("Stage 2: Fine-tuning")
    print("=" * 50)

    # Unfreeze the top layers of the SAME model, so the classifier
    # weights trained in Stage 1 are kept
    base.trainable = True
    for layer in base.layers[:-fine_tune_layers]:
        layer.trainable = False

    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-5),  # Much lower LR for fine-tuning
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    callbacks = [
        EarlyStopping(patience=5, restore_best_weights=True),
        ModelCheckpoint('best_model.keras', save_best_only=True),
        ReduceLROnPlateau(factor=0.2, patience=3)
    ]

    model.fit(
        train_data,
        validation_data=val_data,
        epochs=20,
        callbacks=callbacks
    )

    return model


# Usage example:
# model = train_transfer_model(train_ds, val_ds, num_classes=10)
```

Checkpoint
Can you build a complete vision transfer pipeline?
🎯 Transfer Learning Module Summary
Key Takeaways
| Aspect | NLP | Vision |
|---|---|---|
| Library | Hugging Face | Keras Apps / timm |
| Models | BERT, GPT, T5 | ResNet, ViT, EfficientNet |
| Preprocessing | Tokenization | Resize + Normalize |
| Input | Token IDs | Image tensors |
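The "Resize + Normalize" row is only a couple of lines in practice. A minimal sketch with `tf.image` (the mean/std values are the standard ImageNet channel statistics; in real pipelines, prefer each model family's matching `preprocess_input`, since the exact preprocessing varies per model):

```python
import numpy as np
import tensorflow as tf

# A fake 300x500 RGB image with uint8 pixel values
image = np.random.randint(0, 256, size=(300, 500, 3), dtype=np.uint8)

# Resize to the model's expected input size
resized = tf.image.resize(tf.cast(image, tf.float32), (224, 224))

# Normalize with ImageNet channel statistics (scaled to the 0-255 range)
mean = tf.constant([0.485, 0.456, 0.406]) * 255.0
std = tf.constant([0.229, 0.224, 0.225]) * 255.0
normalized = (resized - mean) / std

print(normalized.shape)  # (224, 224, 3)
```

Contrast this with NLP, where the equivalent step is tokenization into integer token IDs.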
Transfer Learning Decision Tree
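The decision tree boils down to two questions: how much data do you have, and how similar is your domain to the pretraining data? A hedged sketch of that logic, following the unfreezing guide from earlier (the threshold and return strings are illustrative rules of thumb, not hard cutoffs):

```python
def choose_strategy(num_images: int, domain_similar: bool) -> str:
    """Map (data size, domain similarity) to a transfer learning strategy.

    The 5,000-image threshold is an illustrative rule of thumb.
    """
    small = num_images < 5_000
    if small and domain_similar:
        return "feature extraction (freeze the whole base)"
    if small and not domain_similar:
        return "fine-tune a few top layers, with heavy augmentation"
    if domain_similar:
        return "fine-tune the top blocks"
    return "fine-tune most of the network"

print(choose_strategy(1_000, domain_similar=True))
print(choose_strategy(100_000, domain_similar=False))
```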
Model Selection
NLP:
- Speed: DistilBERT, MiniLM
- Accuracy: RoBERTa, DeBERTa
- Generation: GPT-2, T5
Vision:
- Speed: MobileNet, EfficientNet-B0
- Accuracy: EfficientNet-B7, ConvNeXt
- Transformer: ViT, Swin
Next Steps
Next module: Optimization & Deployment
- Training optimization techniques
- Model compression
- Deployment strategies
🎉 Transfer Learning module complete! You have mastered both NLP and Vision transfer learning.
