Famous CNN Models

🎯 Lesson objectives

After this lesson, you will:

✅ Know the history of CNN development

✅ Understand the architectures: LeNet, AlexNet, VGG, ResNet

✅ Know the key breakthrough of each model

✅ Be able to choose a suitable model for a real-world problem

Review of the previous lesson

We covered convolution and pooling. Today we learn how they are combined into famous architectures!

🏆 A brief history of CNNs

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

Year	Model	Top-5 Error	Breakthrough
2010	Traditional	28.2%	SIFT + SVM
2012	AlexNet	16.4%	Deep CNN + GPU
2013	ZFNet	14.8%	Visualization
2014	VGGNet	7.3%	Small filters (3×3)
2014	GoogLeNet	6.7%	Inception module
2015	ResNet	3.6%	Skip connections
2017	SENet	2.3%	Squeeze-Excitation

2012 was the turning point. AlexNet cut the top-5 error from roughly 26% for the best non-deep-learning approaches to 16.4%, marking the start of deep learning's dominance in computer vision.

Checkpoint

Do you know the history of CNN development?

📜 LeNet-5 (1998)

LeNet-5 architecture

LeNet-5, by Yann LeCun and colleagues, was one of the first successful CNNs.

Keras implementation

python.py

from tensorflow import keras
from tensorflow.keras import layers

def create_lenet5():
    """LeNet-5 (LeCun et al., 1998)."""
    model = keras.Sequential([
        # Input: 32×32×1 (grayscale)
        keras.Input(shape=(32, 32, 1)),

        # C1: convolutional layer
        layers.Conv2D(6, (5, 5), activation='tanh'),
        # Output: 28×28×6

        # S2: subsampling (average pooling)
        layers.AveragePooling2D((2, 2)),
        # Output: 14×14×6

        # C3: convolutional layer
        layers.Conv2D(16, (5, 5), activation='tanh'),
        # Output: 10×10×16

        # S4: subsampling
        layers.AveragePooling2D((2, 2)),
        # Output: 5×5×16 = 400

        # Flatten to a 400-element vector
        layers.Flatten(),

        # C5: fully connected here (a convolution in the original paper)
        layers.Dense(120, activation='tanh'),

        # F6: fully connected
        layers.Dense(84, activation='tanh'),

        # Output layer: 10 digit classes
        layers.Dense(10, activation='softmax')
    ])

    return model

model = create_lenet5()
print(f"Total params: {model.count_params():,}")

LeNet-5 characteristics

Feature	Details
Activation	Tanh (ReLU was not yet in use)
Pooling	Average pooling
Application	Handwritten digit recognition (MNIST)
Parameters	~60K
Significance	Showed that CNNs work in practice
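
The output sizes in the code comments and the ~60K parameter figure can be checked by hand. The sketch below is plain Python (no Keras needed) and counts parameters for the Keras version above, which connects every C3 filter to all six S2 maps; the helper names `conv_out`, `conv_params`, and `dense_params` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel×kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

size = 32
size = conv_out(size, 5)            # C1: 32 -> 28
size = conv_out(size, 2, stride=2)  # S2: 28 -> 14
size = conv_out(size, 5)            # C3: 14 -> 10
size = conv_out(size, 2, stride=2)  # S4: 10 -> 5
flat = size * size * 16             # 5×5×16 = 400

lenet_params = (
    conv_params(5, 1, 6)        # C1
    + conv_params(5, 6, 16)     # C3
    + dense_params(flat, 120)   # C5
    + dense_params(120, 84)     # F6
    + dense_params(84, 10)      # output
)
print(size, flat, lenet_params)  # 5 400 61706
```

Almost 80% of these parameters sit in the first dense layer (400 × 120), a pattern that shows up again in AlexNet and VGG.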

Checkpoint

Do you understand the LeNet-5 architecture?

🚀 AlexNet (2012)

AlexNet architecture

AlexNet, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won ILSVRC 2012.

Keras implementation

python.py

from tensorflow import keras
from tensorflow.keras import layers

def create_alexnet():
    """AlexNet (Krizhevsky et al., 2012), single-stream variant."""
    model = keras.Sequential([
        # Input: 227×227×3
        keras.Input(shape=(227, 227, 3)),

        # Conv1: 227 -> 55, pool -> 27
        layers.Conv2D(96, (11, 11), strides=(4, 4),
                      activation='relu'),
        layers.BatchNormalization(),  # Original used LRN
        layers.MaxPooling2D((3, 3), strides=(2, 2)),

        # Conv2: 27 -> 27, pool -> 13
        layers.Conv2D(256, (5, 5), padding='same',
                      activation='relu'),
        layers.BatchNormalization(),  # Original used LRN
        layers.MaxPooling2D((3, 3), strides=(2, 2)),

        # Conv3, 4, 5: 13 -> 13, pool -> 6
        layers.Conv2D(384, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(384, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(256, (3, 3), padding='same',
                      activation='relu'),
        layers.MaxPooling2D((3, 3), strides=(2, 2)),

        # Fully connected classifier
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1000, activation='softmax')
    ])

    return model

model = create_alexnet()
print(f"Total params: {model.count_params():,}")
# About 62 million parameters here; the original two-GPU network,
# which used grouped convolutions, has about 60 million.

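AlexNet's contributions

Innovation	Details
ReLU	Popularized ReLU in deep CNNs; the paper reports reaching 25% training error on CIFAR-10 about 6× faster than with tanh
Dropout	A then-new regularization technique, applied to the fully connected layers
GPU training	2× GTX 580 (3 GB each)
Data augmentation	Random crops, horizontal flips, color jitter
Overlapping pooling	3×3 pooling with stride 2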
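
With a 3×3 window and stride 2, neighbouring pooling windows share one row or column, which is what "overlapping" means here. As a rough check of the shapes in the Keras code above, the sketch below traces the spatial size through the network with the usual output-size formula; `out_size` is a helper name invented for this example.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 227
size = out_size(size, 11, stride=4)      # Conv1, no padding: 227 -> 55
size = out_size(size, 3, stride=2)       # overlapping max pool: 55 -> 27
size = out_size(size, 5, padding=2)      # Conv2, 'same' padding: 27 -> 27
size = out_size(size, 3, stride=2)       # max pool: 27 -> 13
for _ in range(3):
    size = out_size(size, 3, padding=1)  # Conv3-5, 'same' padding: 13 -> 13
size = out_size(size, 3, stride=2)       # max pool: 13 -> 6

flat_features = size * size * 256        # input to the first Dense layer
print(size, flat_features)               # 6 9216
```

The 9216 flattened features feeding a 4096-unit dense layer account for most of AlexNet's parameters.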

AlexNet was the "Big Bang" of deep learning: it showed that deep neural networks can solve real-world problems at large scale.

Checkpoint

Do you understand why AlexNet matters?

🏛️ VGGNet (2014)

The VGGNet philosophy

VGGNet (Oxford Visual Geometry Group): "Go deeper with small kernels"

Instead of one 7×7 conv, use three stacked 3×3 convs:

Same receptive field (7×7)
More non-linearities (three activations instead of one)
Fewer parameters: 3×(3×3) = 27 < 7×7 = 49 weights per input/output channel pair
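
The three points above can be verified with a few lines of arithmetic. This is a minimal sketch that assumes stride-1 convolutions and the same channel count (256, chosen only for illustration) going in and out of every layer; the function names are hypothetical.

```python
def stacked_receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convs with the same kernel size."""
    return 1 + n_layers * (kernel - 1)

def stacked_weights(kernel, n_layers, channels):
    """Weight count (biases ignored) when every layer maps channels -> channels."""
    return n_layers * kernel * kernel * channels * channels

channels = 256  # illustrative value

rf_three_3x3 = stacked_receptive_field(3, 3)  # 7
rf_one_7x7 = stacked_receptive_field(7, 1)    # 7

w_three_3x3 = stacked_weights(3, 3, channels)  # 27 * 256^2
w_one_7x7 = stacked_weights(7, 1, channels)    # 49 * 256^2

print(rf_three_3x3, rf_one_7x7)              # 7 7
print(w_three_3x3, w_one_7x7)                # 1769472 3211264
print(round(w_three_3x3 / w_one_7x7, 2))     # 0.55
```

The 27-to-49 ratio holds for any channel count, because both totals scale with the same channels² factor.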

VGG16 Architecture

Keras implementation

python.py

from tensorflow import keras
from tensorflow.keras import layers

def create_vgg16():
    """VGG16 (Simonyan & Zisserman, 2014)."""
    model = keras.Sequential([
        # Input: 224×224×3
        keras.Input(shape=(224, 224, 3)),

        # Block 1: 2 × conv 64, pool -> 112×112
        layers.Conv2D(64, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(64, (3, 3), padding='same',
                      activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Block 2: 2 × conv 128, pool -> 56×56
        layers.Conv2D(128, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(128, (3, 3), padding='same',
                      activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Block 3: 3 × conv 256, pool -> 28×28
        layers.Conv2D(256, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(256, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(256, (3, 3), padding='same',
                      activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Block 4: 3 × conv 512, pool -> 14×14
        layers.Conv2D(512, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(512, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(512, (3, 3), padding='same',
                      activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Block 5: 3 × conv 512, pool -> 7×7
        layers.Conv2D(512, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(512, (3, 3), padding='same',
                      activation='relu'),
        layers.Conv2D(512, (3, 3), padding='same',
                      activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Classifier
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1000, activation='softmax')
    ])

    return model

# Or use the pretrained model (weights are downloaded on first use)
from tensorflow.keras.applications import VGG16
vgg = VGG16(weights='imagenet', include_top=True)

VGG characteristics

Aspect	Details
Parameters	138 million (very heavy)
Kernel size	3×3 only
Strengths	Simple and easy to understand
Weaknesses	Too many parameters in the FC layers
Legacy	Very popular for transfer learning
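
To see how much of the 138 million sits in the fully connected layers, the parameters can be counted layer by layer. The sketch below does this in plain Python for the VGG16 layout in the code above; `conv_params` and `dense_params` are helper names introduced only for this example.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of a kernel×kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# (input channels, output channels) of the 13 conv layers in VGG16
conv_channels = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in conv_channels)
fc_total = (
    dense_params(7 * 7 * 512, 4096)  # flattened 7×7×512 feature map
    + dense_params(4096, 4096)
    + dense_params(4096, 1000)
)
total = conv_total + fc_total

print(f"conv: {conv_total:,}")           # conv: 14,714,688
print(f"fc:   {fc_total:,}")             # fc:   123,642,856
print(f"all:  {total:,}")                # all:  138,357,544
print(f"fc share: {fc_total / total:.1%}")  # fc share: 89.4%
```

Roughly nine out of ten parameters belong to the three dense layers, which is why later architectures replaced them with global average pooling.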

Checkpoint

Do you understand the VGGNet philosophy?

⚡ ResNet and Skip Connections (2015)

The degradation problem

Deeper network ≠ better?

Training a plain 56-layer network gives higher error than a plain 20-layer network!

It is not overfitting (the training error is higher too)
It is an optimization problem: a plain deep stack is hard to train, and the extra layers struggle to learn even an identity mapping

Skip connections: the solution

Regular Block vs Residual Block
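
A residual block computes y = F(x) + x instead of y = F(x), so when the learned part F outputs zero the block simply passes its input through. The toy sketch below illustrates that with scalar "layers" (one weight, no bias), which is a deliberate simplification and not how real blocks are built; all function names are invented for the example.

```python
def relu(x):
    return max(0.0, x)

def plain_layer(x, w):
    """Regular layer: y = relu(F(x)), with F(x) = w * x."""
    return relu(w * x)

def residual_layer(x, w):
    """Residual layer: y = relu(F(x) + x), same F plus an identity shortcut."""
    return relu(w * x + x)

def run(layer, x, w, depth):
    """Apply the same layer `depth` times."""
    for _ in range(depth):
        x = layer(x, w)
    return x

x0 = 1.5
plain_out = run(plain_layer, x0, w=0.0, depth=56)
residual_out = run(residual_layer, x0, w=0.0, depth=56)

print(plain_out)     # 0.0  (the signal is lost)
print(residual_out)  # 1.5  (the input passes through unchanged)
```

With all weights at zero, 56 residual layers behave like an identity function, so a deeper residual network can always fall back to what a shallower one computes. A plain stack has to learn that identity explicitly.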

ResNet Architecture

Implementing a residual block

python.py

from tensorflow import keras
from tensorflow.keras import layers


class ResidualBlock(layers.Layer):
    """Basic residual block: two 3×3 convs plus a shortcut connection."""

    def __init__(self, filters, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = layers.Conv2D(filters, (3, 3),
                                   strides=strides,
                                   padding='same')
        self.bn1 = layers.BatchNormalization()
        self.conv2 = layers.Conv2D(filters, (3, 3),
                                   padding='same')
        self.bn2 = layers.BatchNormalization()
        self.relu = layers.ReLU()

        # Shortcut: a 1×1 projection when the spatial size changes,
        # otherwise the identity. With strides=1 the input must already
        # have `filters` channels, or the addition below will fail.
        if strides != 1:
            self.shortcut = keras.Sequential([
                layers.Conv2D(filters, (1, 1), strides=strides),
                layers.BatchNormalization()
            ])
        else:
            self.shortcut = None

    def call(self, x, training=False):
        # Main path: conv -> BN -> ReLU -> conv -> BN
        out = self.conv1(x)
        out = self.bn1(out, training=training)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out, training=training)

        # Skip connection
        if self.shortcut is None:
            shortcut = x
        else:
            shortcut = self.shortcut(x, training=training)

        # Add, then activate
        return self.relu(out + shortcut)

Using a pretrained ResNet

python.py

from tensorflow.keras.applications import ResNet50

# Load pretrained ResNet50
resnet = ResNet50(
    weights='imagenet',    # Pretrained on ImageNet
    include_top=True,      # Include classification head
    input_shape=(224, 224, 3)
)

# Use it for feature extraction
resnet_features = ResNet50(
    weights='imagenet',
    include_top=False,     # Remove top, get features
    input_shape=(224, 224, 3)
)
# Output: 7×7×2048
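
The 7×7×2048 feature map follows from ResNet50's total stride of 32. The sketch below multiplies out the strides of the stem and the four stages; it assumes padding that makes each stride-2 step halve the size (rounding up), which holds for a 224×224 input, and the `stage_strides` labels are descriptive names chosen for this example.

```python
import math

# Stride applied by each part of ResNet50
stage_strides = {
    "stem conv 7×7": 2,
    "max pool 3×3": 2,
    "conv2_x": 1,
    "conv3_x": 2,
    "conv4_x": 2,
    "conv5_x": 2,
}

total_stride = math.prod(stage_strides.values())  # 32

input_size = 224
side = math.ceil(input_size / total_stride)       # 7
final_channels = 2048                             # width of the last stage
features_per_image = side * side * final_channels

print(total_stride, side, features_per_image)     # 32 7 100352
```

A common next step is to reduce this map with global average pooling, which leaves a 2048-element vector per image for a small classifier head.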

Advantages of ResNet

Advantage	Explanation
Trains very deep networks	152 layers, even 1000+ layers
Good gradient flow	Skip connections let gradients flow directly to earlier layers
Less degradation	Adding layers no longer raises training error, so deeper ResNets are generally more accurate
Efficient	Fewer parameters than VGG (25M vs 138M)

ResNet is arguably the most important breakthrough since AlexNet: it makes very deep networks trainable without degradation.

Checkpoint

Do you understand skip connections and ResNet?

🔀 Inception/GoogLeNet (2014)

The idea behind the Inception module

Inception: "Why pick a single kernel size?" → Use several kernel sizes (1×1, 3×3, 5×5) in parallel!

Inception module (with 1×1 bottleneck)

python.py

from tensorflow.keras import layers

def inception_module(x, filters):
    """Inception module with dimensionality reduction.

    `filters` is a 6-tuple:
    (f1, f3_reduce, f3, f5_reduce, f5, pool_proj).
    """
    f1, f3_reduce, f3, f5_reduce, f5, pool_proj = filters

    # Branch 1: 1×1 conv
    branch1 = layers.Conv2D(f1, (1, 1),
                            padding='same',
                            activation='relu')(x)

    # Branch 2: 1×1 (reduce channels) → 3×3
    branch2 = layers.Conv2D(f3_reduce, (1, 1),
                            padding='same',
                            activation='relu')(x)
    branch2 = layers.Conv2D(f3, (3, 3),
                            padding='same',
                            activation='relu')(branch2)

    # Branch 3: 1×1 (reduce channels) → 5×5
    branch3 = layers.Conv2D(f5_reduce, (1, 1),
                            padding='same',
                            activation='relu')(x)
    branch3 = layers.Conv2D(f5, (5, 5),
                            padding='same',
                            activation='relu')(branch3)

    # Branch 4: max pool → 1×1 projection
    branch4 = layers.MaxPooling2D((3, 3), strides=(1, 1),
                                  padding='same')(x)
    branch4 = layers.Conv2D(pool_proj, (1, 1),
                            padding='same',
                            activation='relu')(branch4)

    # Concatenate all branches along the channel axis
    output = layers.Concatenate()([branch1, branch2,
                                   branch3, branch4])
    return output
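
The 1×1 "reduce" convolutions exist to keep the 3×3 and 5×5 branches cheap. The sketch below compares multiply-accumulate counts for the 5×5 branch with and without the bottleneck. The sizes (28×28×192 input, 16 reduce filters, 32 output filters, and the 64/128/32/32 branch widths) are the ones commonly cited for GoogLeNet's first Inception module and are used here as illustrative values; `conv_macs` is a helper invented for this example.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates of a stride-1 'same' convolution."""
    return height * width * kernel * kernel * c_in * c_out

h = w = 28
c_in = 192
f5_reduce, f5 = 16, 32

# 5×5 conv applied directly to all 192 input channels
direct = conv_macs(h, w, 5, c_in, f5)

# 1×1 reduce to 16 channels, then 5×5 on the reduced tensor
bottleneck = conv_macs(h, w, 1, c_in, f5_reduce) + conv_macs(h, w, 5, f5_reduce, f5)

print(f"{direct:,}")                    # 120,422,400
print(f"{bottleneck:,}")                # 12,443,648
print(round(direct / bottleneck, 1))    # 9.7

# Channels after concatenating the four branches
out_channels = 64 + 128 + 32 + 32
print(out_channels)                     # 256
```

Under these assumptions the bottleneck cuts the cost of the 5×5 branch by almost a factor of ten while producing the same number of output channels.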

Using a pretrained Inception model

python.py

from tensorflow.keras.applications import InceptionV3

# InceptionV3 (improved version)
inception = InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(299, 299, 3)  # Note: 299, not 224!
)

Checkpoint

Do you understand the Inception module?

📊 Comparing CNN models

Comparison table

Model	Year	Params	Top-5 Error	Depth	Key Innovation
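LeNet	1998	60K	-	5	One of the first CNNs
AlexNet	2012	60M	16.4%	8	ReLU, Dropout, GPU
VGG16	2014	138M	7.3%	16	3×3 kernels only
GoogLeNet	2014	6.8M	6.7%	22	Inception module
ResNet50	2015	25M	3.6%*	50	Skip connections
ResNet152	2015	60M	3.6%*	152	Very deep

* 3.6% is the ILSVRC 2015 result of the ResNet ensemble; a single ResNet50 or ResNet152 scores somewhat higher error.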

When to use which model?

Scenario	Recommended Model	Reason
Limited resources	MobileNet, EfficientNet-B0	Light and fast
High accuracy	EfficientNet-B7, ResNeXt	High accuracy at a higher compute cost
Transfer learning	ResNet50, VGG16	Good pretrained weights
Real-time	MobileNetV2, YOLO	Optimized for speed (YOLO is an object detector, not a classifier)
Learning	VGG16	Simple and easy to understand

Modern architectures (2019 onward)

python.py

from tensorflow.keras.applications import (
    EfficientNetB0,
    EfficientNetB7,  # larger, more accurate variant (not instantiated here)
    MobileNetV3Large,
)

# EfficientNet: strong accuracy/efficiency trade-off
efficientnet = EfficientNetB0(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# MobileNetV3: aimed at mobile devices
mobilenet = MobileNetV3Large(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

Checkpoint

Do you know which model to choose for your task?

🎯 Summary

Key timeline

Key Takeaways

LeNet: CNN concept works
AlexNet: Deep + GPU + ReLU + Dropout
VGG: Simple 3×3 kernels, go deeper
Inception: Multi-scale features in parallel
ResNet: Skip connections enable very deep networks

Code for loading pretrained models

python.py

from tensorflow.keras.applications import (
    VGG16, ResNet50, InceptionV3, EfficientNetB0
)

# Popular models and their default input shapes
models_info = {
    'VGG16': (VGG16, (224, 224, 3)),
    'ResNet50': (ResNet50, (224, 224, 3)),
    'InceptionV3': (InceptionV3, (299, 299, 3)),
    'EfficientNetB0': (EfficientNetB0, (224, 224, 3)),
}

# Load any model from the table above by name
def load_pretrained(name, include_top=False):
    ModelClass, input_shape = models_info[name]
    return ModelClass(
        weights='imagenet',
        include_top=include_top,
        input_shape=input_shape
    )

# Example
resnet = load_pretrained('ResNet50')
print(f"Output shape: {resnet.output_shape}")  # (None, 7, 7, 2048)

Next lesson

We will cover applications of CNNs:

Image Classification
Object Detection
Hands-on Transfer Learning

🎉 Well done! You now know the history and architecture of the most important CNN models!