🎯 Lesson Objectives
After this lesson, you will:
✅ Use ImageNet pretrained models (ResNet, EfficientNet)
✅ Use Keras Applications and Hugging Face
✅ Know how to choose the right model
✅ Build a complete Vision pipeline
Review of the Previous Lesson
Lesson 17 covered Transfer Learning for NLP. Today we apply it to Computer Vision!
🖼️ Vision Transfer Learning Overview
Pretrained Vision Models
ImageNet pretrained models have already learned:
- Edges, textures (low-level)
- Shapes, patterns (mid-level)
- Object parts, objects (high-level)
→ These features transfer well to most vision tasks!
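This low-to-high hierarchy is visible directly in a network's shapes: spatial resolution shrinks while channel depth grows from the early to the late blocks. A minimal sketch that probes one output per ResNet50 stage (using `weights=None`, i.e. random initialization, since only the shapes matter here — the stage layer names are the standard Keras ones):

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import Model

# Random weights are enough to inspect the feature hierarchy's shapes
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))

# One representative output per stage, from low-level to high-level features
stage_layers = ['conv2_block3_out', 'conv3_block4_out',
                'conv4_block6_out', 'conv5_block3_out']
probe = Model(base.input,
              [base.get_layer(name).output for name in stage_layers])

for name, out in zip(stage_layers, probe.outputs):
    print(f"{name}: {out.shape}")  # spatial size halves, channel depth grows
```

Early-stage feature maps are large but shallow (edges, textures need spatial detail); late-stage maps are small but deep (objects need many abstract channels).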
Popular Pretrained Models
| Model | Year | Top-1 Acc | Parameters | Speed |
|---|---|---|---|---|
| ResNet50 | 2015 | 76.1% | 25M | Fast |
| VGG16 | 2014 | 71.3% | 138M | Slow |
| InceptionV3 | 2015 | 77.9% | 24M | Medium |
| EfficientNetB0 | 2019 | 77.1% | 5M | Fast |
| ViT-Base | 2020 | 81.8% | 86M | Medium |
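One way to read the table above: divide Top-1 accuracy by parameter count to get a rough accuracy-per-parameter efficiency. A quick sketch over the table's own numbers:

```python
# (model, top-1 accuracy %, parameters in millions) — taken from the table above
models = [
    ("ResNet50",       76.1,  25),
    ("VGG16",          71.3, 138),
    ("InceptionV3",    77.9,  24),
    ("EfficientNetB0", 77.1,   5),
    ("ViT-Base",       81.8,  86),
]

# Rough efficiency: accuracy points per million parameters
for name, acc, params in sorted(models, key=lambda m: m[1] / m[2], reverse=True):
    print(f"{name:15s} {acc:.1f}% / {params:3d}M = {acc / params:5.2f} pts/M")
```

EfficientNetB0 comes out far ahead on this crude metric, which is one reason it is a common default when compute is limited.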
Checkpoint
Do you know the popular pretrained vision models?
🔧 Keras Applications
Load Pretrained Model
```python
import tensorflow as tf
from tensorflow.keras.applications import (
    ResNet50,
    VGG16,
    InceptionV3,
    EfficientNetB0,
    MobileNetV2
)
from tensorflow.keras import layers, Model

# Load ResNet50 without top (classifier)
base_model = ResNet50(
    weights='imagenet',        # Pretrained weights
    include_top=False,         # Remove classifier
    input_shape=(224, 224, 3)
)

print(f"ResNet50 layers: {len(base_model.layers)}")
print(f"Output shape: {base_model.output_shape}")

# Other models
models = {
    'vgg16': VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'inception': InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3)),
    'efficientnet': EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'mobilenet': MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
}

for name, model in models.items():
    print(f"{name}: {len(model.layers)} layers, {model.count_params():,} params")
```

```
ResNet50 layers: 175
Output shape: (None, 7, 7, 2048)
vgg16: 19 layers, 14,714,688 params
inception: 311 layers, 21,802,784 params
efficientnet: 237 layers, 4,049,571 params
mobilenet: 155 layers, 2,257,984 params
```

Checkpoint
Do you know how to load pretrained models?
🏗️ Build Custom Classifier
Feature Extraction Approach
```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model

# Load base model
base_model = ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model
base_model.trainable = False

# Build classifier
inputs = layers.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)  # training=False keeps BatchNorm in inference mode

# Global Average Pooling
x = layers.GlobalAveragePooling2D()(x)

# Custom classifier
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)  # 10 classes

model = Model(inputs, outputs)

model.summary()

# Check trainable params
trainable_params = sum(tf.size(w).numpy() for w in model.trainable_weights)
total_params = model.count_params()
print(f"\nTrainable: {trainable_params:,} / {total_params:,} ({100*trainable_params/total_params:.1f}%)")
```

```
Model: "model"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 input_1 (InputLayer)         [(None, 224, 224, 3)]     0
 resnet50 (Functional)        (None, 7, 7, 2048)        23,587,712
 global_average_pooling2d     (None, 2048)              0
 dense (Dense)                (None, 256)               524,544
 dropout (Dropout)            (None, 256)               0
 dense_1 (Dense)              (None, 10)                2,570
=================================================================
Total params: 24,114,826
Trainable: 527,114 / 24,114,826 (2.2%)
```

Compile and Train
```python
# Compile
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Data augmentation
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# Preprocess function for ResNet
from tensorflow.keras.applications.resnet50 import preprocess_input

def preprocess(images, labels):
    # Data augmentation
    images = data_augmentation(images)
    # ResNet preprocessing
    images = preprocess_input(images)
    return images, labels

# Train
# model.fit(train_data.map(preprocess), epochs=10, validation_data=val_data)
```

Checkpoint
Can you build a classifier with Feature Extraction?
🔓 Fine-tuning Vision Models
Unfreeze Strategy
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model
import tensorflow as tf

def build_model_for_finetuning(num_classes=10):
    """Build model with frozen base, ready for fine-tuning"""

    base_model = ResNet50(
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )

    # Initially freeze all
    base_model.trainable = False

    # Build model
    inputs = layers.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs), base_model


# Stage 1: Train classifier only
model, base_model = build_model_for_finetuning(num_classes=10)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Stage 1: Training classifier (base frozen)")
# model.fit(train_data, epochs=5, validation_data=val_data)


# Stage 2: Fine-tune top layers
base_model.trainable = True

# Freeze first 100 layers (keep early features frozen)
for layer in base_model.layers[:100]:
    layer.trainable = False

# Lower learning rate for fine-tuning
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),  # 100x smaller
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Count trainable
trainable = sum(tf.size(w).numpy() for w in model.trainable_weights)
print(f"\nStage 2: Fine-tuning (trainable params: {trainable:,})")
# model.fit(train_data, epochs=5, validation_data=val_data)
```

Which layers to unfreeze?
Unfreezing guide:
- Similar domain (photos → photos): Unfreeze top layers only
- Different domain (photos → medical): Unfreeze more layers
- Small data: Keep more frozen
- Large data: Unfreeze more
```python
# Gradual unfreezing schedule
def gradual_unfreeze(base_model, stage):
    """
    Stage 0: All frozen
    Stage 1: Last block unfrozen
    Stage 2: Last 2 blocks unfrozen
    Stage 3: Last 3 blocks unfrozen
    """
    base_model.trainable = True

    # ResNet block name prefixes, from the last block backwards
    block_prefixes = ['conv5', 'conv4', 'conv3', 'conv2']

    if stage == 0:
        base_model.trainable = False
    else:
        for layer in base_model.layers:
            # Freeze all by default
            layer.trainable = False

            # Unfreeze based on stage
            for prefix in block_prefixes[:stage]:
                if prefix in layer.name:
                    layer.trainable = True
                    break

# Example usage
for stage in range(4):
    gradual_unfreeze(base_model, stage)
    trainable_count = sum(1 for l in base_model.layers if l.trainable)
    print(f"Stage {stage}: {trainable_count} trainable layers")
```

Checkpoint
Do you understand how to fine-tune vision models?
🤗 Hugging Face Vision
Vision Transformer (ViT)
```python
from transformers import ViTFeatureExtractor, TFViTForImageClassification
from PIL import Image
import requests

# Load pretrained ViT
model_name = "google/vit-base-patch16-224"

feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
model = TFViTForImageClassification.from_pretrained(model_name)

# Load image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess
inputs = feature_extractor(images=image, return_tensors="tf")

# Predict
outputs = model(**inputs)
logits = outputs.logits

# Get predicted class
predicted_class = logits.numpy().argmax(-1)[0]
print(f"Predicted class: {model.config.id2label[predicted_class]}")
```

Fine-tune ViT
```python
from transformers import TFViTForImageClassification, ViTFeatureExtractor
from datasets import load_dataset
import tensorflow as tf

# Load dataset
dataset = load_dataset("beans")  # Small dataset for demo

# Feature extractor
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

# Preprocess function
def preprocess(examples):
    images = [img.convert("RGB") for img in examples["image"]]
    inputs = feature_extractor(images=images, return_tensors="np")
    inputs["labels"] = examples["labels"]
    return inputs

# Apply preprocessing
processed_dataset = dataset.with_transform(preprocess)

# Load model for our number of classes
model = TFViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=3,  # beans has 3 classes
    ignore_mismatched_sizes=True
)

# Training with TensorFlow
train_ds = processed_dataset["train"].to_tf_dataset(
    columns=["pixel_values"],
    label_cols=["labels"],
    shuffle=True,
    batch_size=8
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

# model.fit(train_ds, epochs=3)
```

Checkpoint
Do you know how to use Hugging Face for Vision?
📚 Model Hub & timm
timm Library
timm (PyTorch Image Models) is a library with 800+ pretrained vision models.
- Covers most SOTA models
- Consistent API
- Easy to use
```python
# pip install timm

import timm

# List available models
print(f"Total models: {len(timm.list_models())}")

# Search models
resnet_models = timm.list_models('resnet*')
print(f"ResNet variants: {len(resnet_models)}")

efficientnet_models = timm.list_models('efficientnet*')
print(f"EfficientNet variants: {len(efficientnet_models)}")

# Load model
model = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=10  # Custom number of classes
)

# Model info
data_config = timm.data.resolve_model_data_config(model)
print(f"Input size: {data_config['input_size']}")
print(f"Mean: {data_config['mean']}")
print(f"Std: {data_config['std']}")

# Get feature extractor (without classifier)
feature_extractor = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=0  # Remove classifier
)
print(f"Feature dim: {feature_extractor.num_features}")
```

Popular timm Models
| Model Family | Best Variant | Top-1 Acc | Use Case |
|---|---|---|---|
| EfficientNet | efficientnet_b7 | 84.4% | Balance speed/accuracy |
| ConvNeXt | convnext_large | 87.5% | Best CNN |
| ViT | vit_large_patch16 | 87.8% | Transformer |
| Swin | swin_large_patch4 | 87.3% | Efficient Transformer |
| RegNet | regnetx_320 | 79.9% | Fast inference |
```python
# Modern architectures
import timm

# ConvNeXt - modern CNN (2022)
convnext = timm.create_model('convnext_tiny', pretrained=True, num_classes=10)

# Swin Transformer (2021)
swin = timm.create_model('swin_tiny_patch4_window7_224', pretrained=True, num_classes=10)

# MaxViT (2022)
maxvit = timm.create_model('maxvit_tiny_tf_224', pretrained=True, num_classes=10)

# Compare parameter counts
for name, model in [('ConvNeXt-T', convnext), ('Swin-T', swin), ('MaxViT-T', maxvit)]:
    params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {params/1e6:.1f}M params")
```

Checkpoint
Do you know about the timm library?
🎯 Complete Vision Transfer Pipeline
Full Pipeline
```python
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.applications.efficientnet import preprocess_input
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau
)

def create_transfer_model(num_classes):
    """
    Create a transfer learning model with a frozen base.

    Returns the model together with its base model, so the base
    can later be unfrozen in place for fine-tuning.
    """
    # Base model
    base = EfficientNetB0(
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )
    base.trainable = False

    # Data augmentation
    augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.2),
        layers.RandomZoom(0.2),
        layers.RandomContrast(0.2),
    ], name='augmentation')

    # Build model
    inputs = layers.Input(shape=(224, 224, 3))
    x = augmentation(inputs)
    x = preprocess_input(x)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs), base


def train_transfer_model(train_data, val_data, num_classes, fine_tune_layers=50):
    """Two-stage training: feature extraction, then fine-tuning"""

    # Stage 1: Feature Extraction
    print("=" * 50)
    print("Stage 1: Feature Extraction")
    print("=" * 50)

    model, base = create_transfer_model(num_classes)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    callbacks = [
        EarlyStopping(patience=5, restore_best_weights=True),
        ReduceLROnPlateau(factor=0.2, patience=3)
    ]

    model.fit(
        train_data,
        validation_data=val_data,
        epochs=20,
        callbacks=callbacks
    )

    # Stage 2: Fine-tuning
    print("\n" + "=" * 50)
    print("Stage 2: Fine-tuning")
    print("=" * 50)

    # Unfreeze the top layers of the SAME model, so the classifier
    # weights trained in Stage 1 are kept
    base.trainable = True
    for layer in base.layers[:-fine_tune_layers]:
        layer.trainable = False

    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-5),  # Much lower LR for fine-tuning
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    callbacks = [
        EarlyStopping(patience=5, restore_best_weights=True),
        ModelCheckpoint('best_model.keras', save_best_only=True),
        ReduceLROnPlateau(factor=0.2, patience=3)
    ]

    model.fit(
        train_data,
        validation_data=val_data,
        epochs=20,
        callbacks=callbacks
    )

    return model


# Usage example:
# model = train_transfer_model(train_ds, val_ds, num_classes=10)
```

Checkpoint
Can you build a complete vision transfer pipeline?
🎯 Transfer Learning Module Summary
Key Takeaways
| Aspect | NLP | Vision |
|---|---|---|
| Library | Hugging Face | Keras Apps / timm |
| Models | BERT, GPT, T5 | ResNet, ViT, EfficientNet |
| Preprocessing | Tokenization | Resize + Normalize |
| Input | Token IDs | Image tensors |
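The "Resize + Normalize" row is only a couple of lines in practice. A minimal sketch with `tf.image` (the mean/std values are the standard ImageNet channel statistics; in real pipelines, prefer each model family's matching `preprocess_input`, since the exact preprocessing varies per model):

```python
import numpy as np
import tensorflow as tf

# A fake 300x500 RGB image with uint8 pixel values
image = np.random.randint(0, 256, size=(300, 500, 3), dtype=np.uint8)

# Resize to the model's expected input size
resized = tf.image.resize(tf.cast(image, tf.float32), (224, 224))

# Normalize with ImageNet channel statistics (scaled to the 0-255 range)
mean = tf.constant([0.485, 0.456, 0.406]) * 255.0
std = tf.constant([0.229, 0.224, 0.225]) * 255.0
normalized = (resized - mean) / std

print(normalized.shape)  # (224, 224, 3)
```

Contrast this with NLP, where the equivalent step is tokenization into integer token IDs.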
Transfer Learning Decision Tree
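The decision tree boils down to two questions: how much data do you have, and how similar is your domain to the pretraining data? A hedged sketch of that logic, following the unfreezing guide from earlier (the threshold and return strings are illustrative rules of thumb, not hard cutoffs):

```python
def choose_strategy(num_images: int, domain_similar: bool) -> str:
    """Map (data size, domain similarity) to a transfer learning strategy.

    The 5,000-image threshold is an illustrative rule of thumb.
    """
    small = num_images < 5_000
    if small and domain_similar:
        return "feature extraction (freeze the whole base)"
    if small and not domain_similar:
        return "fine-tune a few top layers, with heavy augmentation"
    if domain_similar:
        return "fine-tune the top blocks"
    return "fine-tune most of the network"

print(choose_strategy(1_000, domain_similar=True))
print(choose_strategy(100_000, domain_similar=False))
```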
Model Selection
NLP:
- Speed: DistilBERT, MiniLM
- Accuracy: RoBERTa, DeBERTa
- Generation: GPT-2, T5
Vision:
- Speed: MobileNet, EfficientNet-B0
- Accuracy: EfficientNet-B7, ConvNeXt
- Transformer: ViT, Swin
Next Steps
Next module: Optimization & Deployment
- Training optimization techniques
- Model compression
- Deployment strategies
🎉 Transfer Learning module complete! You have mastered both NLP and Vision transfer learning.
