Theory
40 minutes
Lesson 4/5

Convolutional Neural Networks

Learn about CNNs, a powerful architecture for Computer Vision

🖼️ Convolutional Neural Networks (CNN)

CNNs are a deep learning architecture designed specifically for image processing. This lesson covers everything from the basic concepts to a working implementation in PyTorch.

Why CNNs?

The Problem with Fully Connected Networks

Limitations of FC Networks for Images
  • Too many parameters: a 224x224x3 image already means 150,528 inputs
  • Loss of spatial information: flattening the image into a vector discards neighborhood structure
  • Poor scaling: larger images make the parameter count explode

CNNs address this with:

  1. Local connectivity: each neuron connects only to a small region of the input
  2. Weight sharing: the same filter is applied across the entire image (the sketch below quantifies the savings)
  3. Hierarchical features: low → mid → high level features
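
To see how much weight sharing saves, compare the parameter counts of a fully connected layer and a conv layer on the same 224x224x3 image (a small sketch; the layer widths are illustrative):

Python
import torch.nn as nn

# FC: every input pixel connects to every one of 256 hidden units
fc = nn.Linear(224 * 224 * 3, 256)
print(sum(p.numel() for p in fc.parameters()))    # 38,535,424

# Conv: 32 filters of size 3x3, shared across the whole image
conv = nn.Conv2d(3, 32, kernel_size=3)
print(sum(p.numel() for p in conv.parameters()))  # 896, independent of image size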

Core Components

1. Convolution Layer

Diagram
graph LR
    I[Input Image] --> F[Filter/Kernel]
    F --> FM[Feature Map]

Convolution operation:

$$Output[i,j] = \sum_{m}\sum_{n} Input[i+m, j+n] \times Kernel[m,n]$$
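
Written directly as code, the formula is a nested loop (a minimal sketch for a single channel, stride 1, no padding; note that, like nn.Conv2d, this computes cross-correlation, i.e. the kernel is not flipped):

Python
import torch

def conv2d_naive(inp, kernel):
    # Output[i,j] = sum over m,n of Input[i+m, j+n] * Kernel[m,n]
    H, W = inp.shape
    kH, kW = kernel.shape
    out = torch.zeros(H - kH + 1, W - kW + 1)
    for i in range(H - kH + 1):
        for j in range(W - kW + 1):
            out[i, j] = (inp[i:i+kH, j:j+kW] * kernel).sum()
    return out

x = torch.randn(5, 5)
k = torch.ones(3, 3) / 9         # a simple 3x3 averaging kernel
print(conv2d_naive(x, k).shape)  # torch.Size([3, 3])

In PyTorch, nn.Conv2d implements the same operation for batched, multi-channel inputs: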

Python
import torch
import torch.nn as nn

# Convolution layer
conv = nn.Conv2d(
    in_channels=3,    # RGB image
    out_channels=32,  # 32 filters
    kernel_size=3,    # 3x3 filter
    stride=1,         # step size
    padding=1         # zero padding
)

# Example
x = torch.randn(1, 3, 224, 224)  # (batch, channels, H, W)
out = conv(x)                    # shape: (1, 32, 224, 224)
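
The output spatial size for a convolution (and for the pooling layers below) follows one formula:

$$O = \left\lfloor \frac{I + 2P - K}{S} \right\rfloor + 1$$

where $I$ is the input size, $K$ the kernel size, $P$ the padding, and $S$ the stride. In the example above, $(224 + 2 \cdot 1 - 3)/1 + 1 = 224$, which is why padding=1 preserves the 224x224 size.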

2. Pooling Layer

Pooling reduces the spatial dimensions:

Python
# Max pooling
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Input:  (1, 32, 224, 224)
# Output: (1, 32, 112, 112)

# Average pooling (stride defaults to kernel_size)
avg_pool = nn.AvgPool2d(kernel_size=2)

3. Fully Connected Layer

After the conv layers, flatten the feature map and pass it through FC layers:

Python
# Flatten
flatten = nn.Flatten()

# FC layer; 32 * 56 * 56 assumes a 32-channel 56x56 feature map
# (e.g. a 224x224 input after two 2x2 poolings)
fc = nn.Linear(32 * 56 * 56, 10)  # 10 classes

Building a CNN from Scratch

Simple CNN for MNIST

Python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, num_classes)

        # Dropout
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Conv block 1: 28x28 -> 14x14
        x = self.pool(F.relu(self.conv1(x)))

        # Conv block 2: 14x14 -> 7x7
        x = self.pool(F.relu(self.conv2(x)))

        # Conv block 3: 7x7 -> 3x3
        x = self.pool(F.relu(self.conv3(x)))

        # Flatten
        x = x.view(x.size(0), -1)

        # FC layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return x

# Create model
model = SimpleCNN(num_classes=10)
print(model)
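
A quick sanity check on shapes and model size (a small sketch using the model defined above):

Python
# A dummy batch of eight 28x28 grayscale images
x = torch.randn(8, 1, 28, 28)
print(model(x).shape)  # torch.Size([8, 10])

# Total number of trainable parameters (~390k)
print(sum(p.numel() for p in model.parameters()))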

Training Loop

Python
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

# Load MNIST
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)

# Model, loss, optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def train_epoch(model, loader, criterion, optimizer):
    model.train()
    total_loss = 0
    correct = 0

    for batch_idx, (data, target) in enumerate(loader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()

    return total_loss / len(loader), correct / len(loader.dataset)

def evaluate(model, loader):
    model.eval()
    correct = 0

    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()

    return correct / len(loader.dataset)

# Train
for epoch in range(10):
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer)
    test_acc = evaluate(model, test_loader)
    print(f"Epoch {epoch+1}: Loss={train_loss:.4f}, Train Acc={train_acc:.2%}, Test Acc={test_acc:.2%}")

Famous CNN Architectures

1. LeNet-5 (1998)

Python
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
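
Note that this sizing (fc1 = 16 * 4 * 4) assumes 28x28 MNIST input; the original paper used 32x32 images. A quick shape check:

Python
x = torch.randn(1, 1, 28, 28)  # 28x28 -> conv 24x24 -> pool 12x12 -> conv 8x8 -> pool 4x4
print(LeNet5()(x).shape)       # torch.Size([1, 10])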

2. VGG16 (2014)

"Deeper is better" với 3x3 filters:

Python
class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()

        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 3
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Blocks 4 & 5 follow the same pattern, ending at 512 channels...
        )

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
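
Rather than writing out all five blocks by hand, in practice you would load the full architecture from torchvision (a sketch; weights=None gives random initialization, while e.g. weights='IMAGENET1K_V1' loads pretrained weights in recent torchvision versions):

Python
from torchvision import models

vgg = models.vgg16(weights=None)
print(vgg.features)  # all 13 conv layers across the 5 blocks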

3. ResNet (2015)

Skip connections mitigate the vanishing gradient problem:

Python
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Skip connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # skip connection!
        out = F.relu(out)
        return out
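
A quick sketch of how these blocks stack (the channel and spatial sizes are illustrative):

Python
# The first block of a stage downsamples and widens; later blocks use identity shortcuts
block1 = ResidualBlock(64, 128, stride=2)  # 64ch 32x32 -> 128ch 16x16
block2 = ResidualBlock(128, 128)           # shape preserved

x = torch.randn(1, 64, 32, 32)
print(block2(block1(x)).shape)  # torch.Size([1, 128, 16, 16])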

Transfer Learning

Reuse pretrained models:

Python
from torchvision import models

# Load pretrained ResNet50
# (newer torchvision versions use models.resnet50(weights='IMAGENET1K_V2') instead)
model = models.resnet50(pretrained=True)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only train the new fc layer
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
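
A common follow-up once the new head has converged is fine-tuning: unfreeze the backbone and train everything at a lower learning rate (a sketch):

Python
# Unfreeze all layers for fine-tuning
for param in model.parameters():
    param.requires_grad = True

# A smaller learning rate avoids destroying the pretrained features
optimizer = optim.Adam(model.parameters(), lr=1e-4)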

Data Augmentation

Increase the diversity of the training data:

Python
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet stats
])
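
Augmentation belongs only in the training pipeline; evaluation should use a deterministic transform. A minimal version (this also defines the test_transform used by the exercise starter code below):

Python
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])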

Practice Exercise

Hands-on Exercise

Build an Image Classifier with a CNN:

  1. Dataset: CIFAR-10 (10 classes, 32x32 images)
  2. Build a custom CNN with:
    • 3-4 conv blocks
    • Batch normalization (see the hint after the starter code)
    • Dropout
  3. Train and evaluate
  4. Try transfer learning with ResNet18

Target: > 85% accuracy on CIFAR-10

Python
# Starter code
from torchvision.datasets import CIFAR10

train_data = CIFAR10('./data', train=True, download=True, transform=train_transform)
test_data = CIFAR10('./data', train=False, transform=test_transform)

# TODO: Build your CNN
# TODO: Train and evaluate
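
As a hint for the batch normalization requirement, a single conv block might look like this (a sketch, not a full solution):

Python
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2)  # 32x32 -> 16x16
)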

What's Next

In the next lesson, we will look at Recurrent Neural Networks (RNNs) for sequential data.

