Theory
40 minutes
Lesson 4/5

Convolutional Neural Networks

Learn about CNNs, a powerful architecture for Computer Vision

🖼️ Convolutional Neural Networks (CNN)

CNNs are a deep learning architecture designed specifically for image processing. This lesson covers everything from the basic concepts to a working implementation in PyTorch.

Why CNNs?

The Problem with Fully Connected Networks

Limitations of FC Networks for Images
  • Too many parameters: a 224x224x3 image already means 150,528 inputs
  • Loss of spatial information: flattening the image into a vector discards neighborhood structure
  • Poor scaling: larger images make the parameter count explode

CNNs address this with:

  1. Local connectivity: each neuron connects only to a small region of the input
  2. Weight sharing: the same filter is applied across the entire image (the sketch below quantifies the savings)
  3. Hierarchical features: low → mid → high level features
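
To see how much weight sharing saves, compare the parameter counts of a fully connected layer and a conv layer on the same 224x224x3 image (a small sketch; the layer widths are illustrative):

Python
import torch.nn as nn

# FC: every input pixel connects to every one of 256 hidden units
fc = nn.Linear(224 * 224 * 3, 256)
print(sum(p.numel() for p in fc.parameters()))    # 38,535,424

# Conv: 32 filters of size 3x3, shared across the whole image
conv = nn.Conv2d(3, 32, kernel_size=3)
print(sum(p.numel() for p in conv.parameters()))  # 896, independent of image size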

Core Components

1. Convolution Layer

Diagram
graph LR
    I[Input Image] --> F[Filter/Kernel]
    F --> FM[Feature Map]

Convolution operation:

$$Output[i,j] = \sum_{m}\sum_{n} Input[i+m, j+n] \times Kernel[m,n]$$
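
Written directly as code, the formula is a nested loop (a minimal sketch for a single channel, stride 1, no padding; note that, like nn.Conv2d, this computes cross-correlation, i.e. the kernel is not flipped):

Python
import torch

def conv2d_naive(inp, kernel):
    # Output[i,j] = sum over m,n of Input[i+m, j+n] * Kernel[m,n]
    H, W = inp.shape
    kH, kW = kernel.shape
    out = torch.zeros(H - kH + 1, W - kW + 1)
    for i in range(H - kH + 1):
        for j in range(W - kW + 1):
            out[i, j] = (inp[i:i+kH, j:j+kW] * kernel).sum()
    return out

x = torch.randn(5, 5)
k = torch.ones(3, 3) / 9         # a simple 3x3 averaging kernel
print(conv2d_naive(x, k).shape)  # torch.Size([3, 3])

In PyTorch, nn.Conv2d implements the same operation for batched, multi-channel inputs: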

Python
import torch
import torch.nn as nn

# Convolution layer
conv = nn.Conv2d(
    in_channels=3,    # RGB image
    out_channels=32,  # 32 filters
    kernel_size=3,    # 3x3 filter
    stride=1,         # step size
    padding=1         # zero padding
)

# Example
x = torch.randn(1, 3, 224, 224)  # (batch, channels, H, W)
out = conv(x)                    # shape: (1, 32, 224, 224)
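
The output spatial size for a convolution (and for the pooling layers below) follows one formula:

$$O = \left\lfloor \frac{I + 2P - K}{S} \right\rfloor + 1$$

where $I$ is the input size, $K$ the kernel size, $P$ the padding, and $S$ the stride. In the example above, $(224 + 2 \cdot 1 - 3)/1 + 1 = 224$, which is why padding=1 preserves the 224x224 size.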

2. Pooling Layer

Pooling reduces the spatial dimensions:

Python
# Max pooling
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Input:  (1, 32, 224, 224)
# Output: (1, 32, 112, 112)

# Average pooling (stride defaults to kernel_size)
avg_pool = nn.AvgPool2d(kernel_size=2)

3. Fully Connected Layer

After the conv layers, flatten the feature map and pass it through FC layers:

Python
# Flatten
flatten = nn.Flatten()

# FC layer; 32 * 56 * 56 assumes a 32-channel 56x56 feature map
# (e.g. a 224x224 input after two 2x2 poolings)
fc = nn.Linear(32 * 56 * 56, 10)  # 10 classes

Building a CNN from Scratch

Simple CNN for MNIST

Python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, num_classes)

        # Dropout
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Conv block 1: 28x28 -> 14x14
        x = self.pool(F.relu(self.conv1(x)))

        # Conv block 2: 14x14 -> 7x7
        x = self.pool(F.relu(self.conv2(x)))

        # Conv block 3: 7x7 -> 3x3
        x = self.pool(F.relu(self.conv3(x)))

        # Flatten
        x = x.view(x.size(0), -1)

        # FC layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return x

# Create model
model = SimpleCNN(num_classes=10)
print(model)
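
A quick sanity check on shapes and model size (a small sketch using the model defined above):

Python
# A dummy batch of eight 28x28 grayscale images
x = torch.randn(8, 1, 28, 28)
print(model(x).shape)  # torch.Size([8, 10])

# Total number of trainable parameters (~390k)
print(sum(p.numel() for p in model.parameters()))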

Training Loop

Python
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

# Load MNIST
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)

# Model, loss, optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def train_epoch(model, loader, criterion, optimizer):
    model.train()
    total_loss = 0
    correct = 0

    for batch_idx, (data, target) in enumerate(loader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()

    return total_loss / len(loader), correct / len(loader.dataset)

def evaluate(model, loader):
    model.eval()
    correct = 0

    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()

    return correct / len(loader.dataset)

# Train
for epoch in range(10):
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer)
    test_acc = evaluate(model, test_loader)
    print(f"Epoch {epoch+1}: Loss={train_loss:.4f}, Train Acc={train_acc:.2%}, Test Acc={test_acc:.2%}")

Famous CNN Architectures

1. LeNet-5 (1998)

Python
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
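
Note that this sizing (fc1 = 16 * 4 * 4) assumes 28x28 MNIST input; the original paper used 32x32 images. A quick shape check:

Python
x = torch.randn(1, 1, 28, 28)  # 28x28 -> conv 24x24 -> pool 12x12 -> conv 8x8 -> pool 4x4
print(LeNet5()(x).shape)       # torch.Size([1, 10])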

2. VGG16 (2014)

"Deeper is better" với 3x3 filters:

Python
class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()

        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 3
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Blocks 4 & 5 follow the same pattern, ending at 512 channels...
        )

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
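
Rather than writing out all five blocks by hand, in practice you would load the full architecture from torchvision (a sketch; weights=None gives random initialization, while e.g. weights='IMAGENET1K_V1' loads pretrained weights in recent torchvision versions):

Python
from torchvision import models

vgg = models.vgg16(weights=None)
print(vgg.features)  # all 13 conv layers across the 5 blocks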

3. ResNet (2015)

Skip connections mitigate the vanishing gradient problem:

Python
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Skip connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # skip connection!
        out = F.relu(out)
        return out
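
A quick sketch of how these blocks stack (the channel and spatial sizes are illustrative):

Python
# The first block of a stage downsamples and widens; later blocks use identity shortcuts
block1 = ResidualBlock(64, 128, stride=2)  # 64ch 32x32 -> 128ch 16x16
block2 = ResidualBlock(128, 128)           # shape preserved

x = torch.randn(1, 64, 32, 32)
print(block2(block1(x)).shape)  # torch.Size([1, 128, 16, 16])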

Transfer Learning

Reuse pretrained models:

Python
from torchvision import models

# Load pretrained ResNet50
# (newer torchvision versions use models.resnet50(weights='IMAGENET1K_V2') instead)
model = models.resnet50(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only train the new fc layer
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
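
A common follow-up once the new head has converged is fine-tuning: unfreeze the backbone and train everything at a lower learning rate (a sketch):

Python
# Unfreeze all layers for fine-tuning
for param in model.parameters():
    param.requires_grad = True

# A smaller learning rate avoids destroying the pretrained features
optimizer = optim.Adam(model.parameters(), lr=1e-4)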

Data Augmentation

Increase the diversity of the training data:

Python
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet stats
])
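
Augmentation belongs only in the training pipeline; evaluation should use a deterministic transform. A minimal version (this also defines the test_transform used by the exercise starter code below):

Python
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])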

Practice Exercise

Hands-on Exercise

Build an Image Classifier with a CNN:

  1. Dataset: CIFAR-10 (10 classes, 32x32 images)
  2. Build a custom CNN with:
    • 3-4 conv blocks
    • Batch normalization (see the hint after the starter code)
    • Dropout
  3. Train and evaluate
  4. Try transfer learning with ResNet18

Target: > 85% accuracy on CIFAR-10

Python
# Starter code
from torchvision.datasets import CIFAR10

train_data = CIFAR10('./data', train=True, download=True, transform=train_transform)
test_data = CIFAR10('./data', train=False, transform=test_transform)

# TODO: Build your CNN
# TODO: Train and evaluate
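
As a hint for the batch normalization requirement, a single conv block might look like this (a sketch, not a full solution):

Python
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2)  # 32x32 -> 16x16
)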

What's Next

In the next lesson, we will look at Recurrent Neural Networks (RNNs) for sequential data.

