
Vision Transfer Learning & Model Hub

Transfer Learning for Computer Vision with ImageNet pretrained models and Hugging Face Vision

0. 🎯 Lesson Objectives

Medium · 5 min

After this lesson, you will be able to:

✅ Use ImageNet pretrained models (ResNet, EfficientNet)

✅ Use Keras Applications and Hugging Face

✅ Know how to choose the right model for a task

✅ Build a complete vision pipeline

Recap of the previous lesson

Lesson 17 covered Transfer Learning for NLP. Today we apply it to Computer Vision!

1. 🖼️ Vision Transfer Learning Overview

Pretrained Vision Models

ImageNet pretrained models have already learned:

  • Edges, textures (low-level)
  • Shapes, patterns (mid-level)
  • Object parts, objects (high-level)

→ These features transfer well to most vision tasks!
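This feature hierarchy can be probed directly. A minimal sketch (assuming TensorFlow is installed; the layer names follow Keras's ResNet50 naming convention) that reads out activations at increasing depths:

```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import Model

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# One representative layer per level of the feature hierarchy
levels = {
    'low (edges/textures)': 'conv2_block3_out',
    'mid (shapes/patterns)': 'conv3_block4_out',
    'high (object parts)': 'conv5_block3_out',
}

# A model that outputs the intermediate activations instead of the final features
probe = Model(
    inputs=base.input,
    outputs=[base.get_layer(name).output for name in levels.values()]
)

image = tf.random.uniform((1, 224, 224, 3))  # stand-in for a real photo
for (level, name), fmap in zip(levels.items(), probe(image)):
    print(f"{level:24s} {name}: {fmap.shape}")
```

The spatial resolution shrinks and the channel count grows with depth: (1, 56, 56, 256) at the low level down to (1, 7, 7, 2048) at the top.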

Popular Pretrained Models

| Model | Year | Top-1 Acc | Parameters | Speed |
|---|---|---|---|---|
| ResNet50 | 2015 | 76.1% | 25M | Fast |
| VGG16 | 2014 | 71.3% | 138M | Slow |
| InceptionV3 | 2015 | 77.9% | 24M | Medium |
| EfficientNetB0 | 2019 | 77.1% | 5M | Fast |
| ViT-Base | 2020 | 81.8% | 86M | Medium |

Checkpoint

Do you know the popular pretrained vision models?

2. 🔧 Keras Applications

Load Pretrained Model

```python
import tensorflow as tf
from tensorflow.keras.applications import (
    ResNet50,
    VGG16,
    InceptionV3,
    EfficientNetB0,
    MobileNetV2
)

# Load ResNet50 without top (classifier)
base_model = ResNet50(
    weights='imagenet',      # Pretrained weights
    include_top=False,       # Remove classifier
    input_shape=(224, 224, 3)
)

print(f"ResNet50 layers: {len(base_model.layers)}")
print(f"Output shape: {base_model.output_shape}")

# Other models (note: InceptionV3 expects 299x299 inputs)
models = {
    'vgg16': VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'inception': InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3)),
    'efficientnet': EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'mobilenet': MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
}

for name, model in models.items():
    print(f"{name}: {len(model.layers)} layers, {model.count_params():,} params")
```
Expected Output

```text
ResNet50 layers: 175
Output shape: (None, 7, 7, 2048)
vgg16: 19 layers, 14,714,688 params
inception: 311 layers, 21,802,784 params
efficientnet: 237 layers, 4,049,571 params
mobilenet: 155 layers, 2,257,984 params
```
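For a quick sanity check of a pretrained model you can also keep the classifier head (`include_top=True`) and decode ImageNet predictions with Keras's built-in helper. A sketch; the random input here only exercises the shapes, so the predicted labels are meaningless — use a real photo in practice:

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

# Full ResNet50 including the 1000-class ImageNet classifier head
model = ResNet50(weights='imagenet', include_top=True)

# One random "image"; in practice load a real 224x224 RGB photo
image = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype('float32')

preds = model.predict(preprocess_input(image), verbose=0)
print(preds.shape)  # (1, 1000) softmax probabilities

# Map the top-3 probabilities back to human-readable ImageNet labels
for _, label, score in decode_predictions(preds, top=3)[0]:
    print(f"{label}: {score:.3f}")
```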

Checkpoint

Do you know how to load pretrained models?

3. 🏗️ Build a Custom Classifier

Feature Extraction Approach

```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model

# Load base model
base_model = ResNet50(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model
base_model.trainable = False

# Build classifier
inputs = layers.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)  # training=False keeps BatchNorm in inference mode

# Global Average Pooling
x = layers.GlobalAveragePooling2D()(x)

# Custom classifier
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)  # 10 classes

model = Model(inputs, outputs)

model.summary()

# Check trainable params
trainable_params = sum(tf.size(w).numpy() for w in model.trainable_weights)
total_params = model.count_params()
print(f"\nTrainable: {trainable_params:,} / {total_params:,} ({100*trainable_params/total_params:.1f}%)")
```
Expected Output

```text
Model: "model"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 input_1 (InputLayer)         [(None, 224, 224, 3)]     0
 resnet50 (Functional)        (None, 7, 7, 2048)        23,587,712
 global_average_pooling2d     (None, 2048)              0
 dense (Dense)                (None, 256)               524,544
 dropout (Dropout)            (None, 256)               0
 dense_1 (Dense)              (None, 10)                2,570
=================================================================
Total params: 24,114,826
Trainable: 527,114 / 24,114,826 (2.2%)
```

Compile and Train

```python
from tensorflow.keras.applications.resnet50 import preprocess_input

# Compile
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Data augmentation
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

def preprocess(images, labels):
    images = data_augmentation(images)  # Data augmentation
    images = preprocess_input(images)   # ResNet-specific preprocessing
    return images, labels

# Train
# model.fit(train_data.map(preprocess), epochs=10, validation_data=val_data)
```

Checkpoint

Have you built a classifier using feature extraction?

4. 🔓 Fine-tuning Vision Models

Unfreeze Strategy

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model
import tensorflow as tf

def build_model_for_finetuning(num_classes=10):
    """Build model with frozen base, ready for fine-tuning"""

    base_model = ResNet50(
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )

    # Initially freeze all layers
    base_model.trainable = False

    # Build model
    inputs = layers.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs), base_model


# Stage 1: Train classifier only
model, base_model = build_model_for_finetuning(num_classes=10)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Stage 1: Training classifier (base frozen)")
# model.fit(train_data, epochs=5, validation_data=val_data)


# Stage 2: Fine-tune top layers
base_model.trainable = True

# Freeze the first 100 layers (keep early features frozen)
for layer in base_model.layers[:100]:
    layer.trainable = False

# Recompile with a lower learning rate for fine-tuning
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),  # 100x smaller
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Count trainable parameters
trainable = sum(tf.size(w).numpy() for w in model.trainable_weights)
print(f"\nStage 2: Fine-tuning (trainable params: {trainable:,})")
# model.fit(train_data, epochs=5, validation_data=val_data)
```

Which layers to unfreeze?

Unfreezing guide:

  • Similar domain (photos → photos): Unfreeze top layers only
  • Different domain (photos → medical): Unfreeze more layers
  • Small data: Keep more frozen
  • Large data: Unfreeze more
```python
# Gradual unfreezing schedule
def gradual_unfreeze(base_model, stage):
    """
    Stage 0: all frozen
    Stage 1: last block (conv5) unfrozen
    Stage 2: last 2 blocks
    Stage 3: last 3 blocks
    """
    if stage == 0:
        base_model.trainable = False
        return

    base_model.trainable = True

    # ResNet block name prefixes, from the last block backwards
    block_prefixes = ['conv5', 'conv4', 'conv3', 'conv2']

    for layer in base_model.layers:
        # Freeze all by default
        layer.trainable = False

        # Unfreeze layers belonging to the last `stage` blocks
        for prefix in block_prefixes[:stage]:
            if prefix in layer.name:
                layer.trainable = True
                break

# Example usage
for stage in range(4):
    gradual_unfreeze(base_model, stage)
    trainable_count = sum(1 for l in base_model.layers if l.trainable)
    print(f"Stage {stage}: {trainable_count} trainable layers")
```

Checkpoint

Do you understand how to fine-tune vision models?

5. 🤗 Hugging Face Vision

Vision Transformer (ViT)

```python
from transformers import ViTFeatureExtractor, TFViTForImageClassification
from PIL import Image
import requests

# Load pretrained ViT
# (newer transformers versions use ViTImageProcessor instead of ViTFeatureExtractor)
model_name = "google/vit-base-patch16-224"

feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
model = TFViTForImageClassification.from_pretrained(model_name)

# Load image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess
inputs = feature_extractor(images=image, return_tensors="tf")

# Predict
outputs = model(**inputs)
logits = outputs.logits

# Get predicted class
predicted_class = logits.numpy().argmax(-1)[0]
print(f"Predicted class: {model.config.id2label[predicted_class]}")
```
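The same steps — preprocessing, forward pass, label lookup — can be collapsed into a single call with the transformers `pipeline` API. A sketch; the checkpoint is downloaded on first use:

```python
from transformers import pipeline

# The pipeline bundles the image processor, the model, and label decoding
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

results = classifier(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    top_k=3,
)
for r in results:
    print(f"{r['label']}: {r['score']:.3f}")
```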

Fine-tune ViT

```python
from transformers import TFViTForImageClassification, ViTFeatureExtractor
from datasets import load_dataset
import tensorflow as tf

# Load dataset
dataset = load_dataset("beans")  # Small dataset for demo

# Feature extractor
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

# Preprocess function
def preprocess(examples):
    images = [img.convert("RGB") for img in examples["image"]]
    inputs = feature_extractor(images=images, return_tensors="np")
    inputs["labels"] = examples["labels"]
    return inputs

# Apply preprocessing lazily
processed_dataset = dataset.with_transform(preprocess)

# Load model with a classifier head sized for our number of classes
model = TFViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=3,  # beans has 3 classes
    ignore_mismatched_sizes=True
)

# Training with TensorFlow
train_ds = processed_dataset["train"].to_tf_dataset(
    columns=["pixel_values"],
    label_cols=["labels"],
    shuffle=True,
    batch_size=8
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

# model.fit(train_ds, epochs=3)
```

Checkpoint

Do you know how to use Hugging Face for vision?

6. 📚 Model Hub & timm

timm Library

timm (PyTorch Image Models) is a library with 800+ pretrained vision models.

  • Most SOTA models included
  • Consistent API
  • Easy to use
```python
# pip install timm
import timm

# List available models
print(f"Total models: {len(timm.list_models())}")

# Search models by wildcard
resnet_models = timm.list_models('resnet*')
print(f"ResNet variants: {len(resnet_models)}")

efficientnet_models = timm.list_models('efficientnet*')
print(f"EfficientNet variants: {len(efficientnet_models)}")

# Load a model
model = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=10  # Custom number of classes
)

# Model info
data_config = timm.data.resolve_model_data_config(model)
print(f"Input size: {data_config['input_size']}")
print(f"Mean: {data_config['mean']}")
print(f"Std: {data_config['std']}")

# Get a feature extractor (without classifier)
feature_extractor = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=0  # Remove classifier
)
print(f"Feature dim: {feature_extractor.num_features}")
```

Popular timm Models

| Model Family | Best Variant | Top-1 Acc | Use Case |
|---|---|---|---|
| EfficientNet | efficientnet_b7 | 84.4% | Balance speed/accuracy |
| ConvNeXt | convnext_large | 87.5% | Best CNN |
| ViT | vit_large_patch16 | 87.8% | Transformer |
| Swin | swin_large_patch4 | 87.3% | Efficient Transformer |
| RegNet | regnetx_320 | 79.9% | Fast inference |
```python
# Modern architectures
import timm

# ConvNeXt - modern CNN (2022)
convnext = timm.create_model('convnext_tiny', pretrained=True, num_classes=10)

# Swin Transformer (2021)
swin = timm.create_model('swin_tiny_patch4_window7_224', pretrained=True, num_classes=10)

# MaxViT (2022)
maxvit = timm.create_model('maxvit_tiny_tf_224', pretrained=True, num_classes=10)

# Compare parameter counts
for name, model in [('ConvNeXt-T', convnext), ('Swin-T', swin), ('MaxViT-T', maxvit)]:
    params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {params/1e6:.1f}M params")
```

Checkpoint

Do you know the timm library?

7. 🎯 Complete Vision Transfer Pipeline

Full Pipeline

```python
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.applications.efficientnet import preprocess_input
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau
)

def create_transfer_model(num_classes):
    """Create a transfer learning model with a frozen base.

    Returns (model, base) so the base can be unfrozen later for fine-tuning.
    """
    base = EfficientNetB0(
        weights='imagenet',
        include_top=False,
        input_shape=(224, 224, 3)
    )
    base.trainable = False  # Stage 1: feature extraction only

    # Data augmentation
    augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.2),
        layers.RandomZoom(0.2),
        layers.RandomContrast(0.2),
    ], name='augmentation')

    # Build model
    inputs = layers.Input(shape=(224, 224, 3))
    x = augmentation(inputs)
    x = preprocess_input(x)
    x = base(x, training=False)  # keep BatchNorm in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    return Model(inputs, outputs), base


def train_transfer_model(train_data, val_data, num_classes, fine_tune_layers=50):
    """Two-stage training: feature extraction, then fine-tuning the same model."""

    # Stage 1: Feature Extraction
    print("=" * 50)
    print("Stage 1: Feature Extraction")
    print("=" * 50)

    model, base = create_transfer_model(num_classes)

    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    callbacks = [
        EarlyStopping(patience=5, restore_best_weights=True),
        ReduceLROnPlateau(factor=0.2, patience=3)
    ]

    model.fit(
        train_data,
        validation_data=val_data,
        epochs=20,
        callbacks=callbacks
    )

    # Stage 2: Fine-tuning — unfreeze the top of the SAME model
    # (rebuilding the model here would throw away the stage-1 classifier weights)
    print("\n" + "=" * 50)
    print("Stage 2: Fine-tuning")
    print("=" * 50)

    base.trainable = True
    for layer in base.layers[:-fine_tune_layers]:
        layer.trainable = False

    # Recompile with a much lower learning rate
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-5),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    callbacks = [
        EarlyStopping(patience=5, restore_best_weights=True),
        ModelCheckpoint('best_model.keras', save_best_only=True),
        ReduceLROnPlateau(factor=0.2, patience=3)
    ]

    model.fit(
        train_data,
        validation_data=val_data,
        epochs=20,
        callbacks=callbacks
    )

    return model


# Usage example:
# model = train_transfer_model(train_ds, val_ds, num_classes=10)
```

Checkpoint

Can you build a complete vision transfer pipeline?

8. 🎯 Transfer Learning Module Wrap-up

Key Takeaways

| Aspect | NLP | Vision |
|---|---|---|
| Library | Hugging Face | Keras Apps / timm |
| Models | BERT, GPT, T5 | ResNet, ViT, EfficientNet |
| Preprocessing | Tokenization | Resize + Normalize |
| Input | Token IDs | Image tensors |

Transfer Learning Decision Tree

📊 How much data do you have?

  • < 100 samples: 🧊 Feature extraction (freeze all layers)
  • 100 – 1,000: 🔓 Unfreeze top layers (partial fine-tuning)
  • 1,000 – 10,000: 🔧 Fine-tune most layers (keep the first few layers frozen)
  • > 10,000: 🚀 Train from scratch (or fine-tune the whole model)
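The decision tree can be encoded as a small helper; `choose_strategy` is a hypothetical name, and the thresholds are the rough guidelines from this lesson, not hard rules:

```python
def choose_strategy(num_samples: int) -> str:
    """Map dataset size to a transfer learning strategy (rough guideline)."""
    if num_samples < 100:
        return "feature extraction (freeze all layers)"
    if num_samples < 1_000:
        return "unfreeze top layers (partial fine-tuning)"
    if num_samples < 10_000:
        return "fine-tune most layers (keep early layers frozen)"
    return "train from scratch (or fine-tune everything)"

for n in (50, 500, 5_000, 50_000):
    print(f"{n:>6} samples -> {choose_strategy(n)}")
```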

Model Selection

NLP:

  • Speed: DistilBERT, MiniLM
  • Accuracy: RoBERTa, DeBERTa
  • Generation: GPT-2, T5

Vision:

  • Speed: MobileNet, EfficientNet-B0
  • Accuracy: EfficientNet-B7, ConvNeXt
  • Transformer: ViT, Swin

Next Steps

Next module: Optimization & Deployment

  • Training optimization techniques
  • Model compression
  • Deployment strategies

🎉 Transfer Learning module complete! You have now mastered transfer learning for both NLP and Vision.