🖼️ Convolutional Neural Networks (CNN)
CNNs are deep learning architectures designed specifically for image processing. This post covers everything from the core concepts to a working PyTorch implementation.
Why CNNs?
The Problem with Fully Connected Networks
Limitations of FC networks for images
- Too many parameters: a 224x224x3 image already means 150,528 inputs (see the parameter-count sketch after this list)
- Loss of spatial information: flattening the image into a vector throws away pixel neighborhoods
- Poor scaling: every extra pixel adds more weights, so larger images quickly become intractable
CNNs address this with:
- Local connectivity: each neuron connects to only a small region of the input
- Weight sharing: the same filter is applied across the entire image
- Hierarchical features: low → mid → high level features
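To make the parameter argument concrete, here is a minimal sketch (the layer sizes are illustrative, not from any specific model) comparing a fully connected layer on a flattened 224x224x3 image against a single 3x3 conv layer:

```python
import torch.nn as nn

# FC layer: every input pixel gets its own weight per output neuron
fc = nn.Linear(224 * 224 * 3, 1000)

# Conv layer: 32 filters of size 3x3x3, shared across all spatial positions
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"FC params:   {count(fc):,}")    # ~150 million
print(f"Conv params: {count(conv):,}")  # 896 (32*3*3*3 weights + 32 biases)
```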
Core Components
1. Convolution Layer
```mermaid
graph LR
    I[Input Image] --> F[Filter/Kernel]
    F --> FM[Feature Map]
```

Convolution operation:
```python
import torch
import torch.nn as nn

# Convolution layer
conv = nn.Conv2d(
    in_channels=3,    # RGB image
    out_channels=32,  # 32 filters
    kernel_size=3,    # 3x3 filter
    stride=1,         # Step size
    padding=1         # Zero padding
)

# Example
x = torch.randn(1, 3, 224, 224)  # Batch, Channels, H, W
out = conv(x)  # Shape: (1, 32, 224, 224)
```
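To see why the spatial size stays 224, note that a convolution's output size is floor((W - K + 2P) / S) + 1. A quick check of the shape comment above (this snippet is just for illustration):

```python
# Output spatial size: floor((W - K + 2*P) / S) + 1
W, K, P, S = 224, 3, 1, 1
print((W - K + 2 * P) // S + 1)  # 224 -> padding=1 keeps 3x3 convs size-preserving
```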
2. Pooling Layer
Reduces the spatial dimensions:
```python
# Max Pooling
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Input:  (1, 32, 224, 224)
# Output: (1, 32, 112, 112)

# Average Pooling
avg_pool = nn.AvgPool2d(kernel_size=2)
```

3. Fully Connected Layer
After the conv layers, flatten the feature maps and pass them through FC layers:
```python
# Flatten
flatten = nn.Flatten()

# FC layer (56x56 is a 224x224 input downsampled by two 2x2 pools)
fc = nn.Linear(32 * 56 * 56, 10)  # 10 classes
```

Building a CNN from Scratch
Simple CNN for MNIST
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)

        # Pooling
        self.pool = nn.MaxPool2d(2, 2)

        # Fully connected
        self.fc1 = nn.Linear(128 * 3 * 3, 256)
        self.fc2 = nn.Linear(256, num_classes)

        # Dropout
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Conv block 1: 28x28 -> 14x14
        x = self.pool(F.relu(self.conv1(x)))

        # Conv block 2: 14x14 -> 7x7
        x = self.pool(F.relu(self.conv2(x)))

        # Conv block 3: 7x7 -> 3x3
        x = self.pool(F.relu(self.conv3(x)))

        # Flatten
        x = x.view(x.size(0), -1)

        # FC layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)

        return x

# Create model
model = SimpleCNN(num_classes=10)
print(model)
```
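Before training, it's worth a quick check that the flattened size (128 * 3 * 3) lines up; a minimal smoke test with random input:

```python
dummy = torch.randn(2, 1, 28, 28)   # two fake MNIST-sized images
logits = SimpleCNN()(dummy)
print(logits.shape)  # torch.Size([2, 10])
```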
Training Loop

```python
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load MNIST
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)

# Model, Loss, Optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def train_epoch(model, loader, criterion, optimizer):
    model.train()
    total_loss = 0
    correct = 0

    for batch_idx, (data, target) in enumerate(loader):
        data, target = data.to(device), target.to(device)

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        pred = output.argmax(dim=1)
        correct += pred.eq(target).sum().item()

    return total_loss / len(loader), correct / len(loader.dataset)

def evaluate(model, loader):
    model.eval()
    correct = 0

    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()

    return correct / len(loader.dataset)

# Train
for epoch in range(10):
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer)
    test_acc = evaluate(model, test_loader)
    print(f"Epoch {epoch+1}: Loss={train_loss:.4f}, Train Acc={train_acc:.2%}, Test Acc={test_acc:.2%}")
```
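After training, you will usually want to persist the learned weights; a minimal sketch (the file name here is just an example):

```python
# Save only the parameters (the standard PyTorch pattern)
torch.save(model.state_dict(), 'simple_cnn_mnist.pt')

# Later: rebuild the architecture and load the weights back
restored = SimpleCNN()
restored.load_state_dict(torch.load('simple_cnn_mnist.pt'))
restored.eval()
```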
Famous CNN Architectures
1. LeNet-5 (1998)
```python
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)  # sized for 28x28 MNIST input
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```

2. VGG16 (2014)
"Deeper is better" với 3x3 filters:
```python
class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super(VGG16, self).__init__()

        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 3
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),

            # Block 4 & 5 similar...
        )

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        # (assumes blocks 4 & 5 are filled in, so features outputs 512x7x7)
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
```

3. ResNet (2015)
Skip connections mitigate the vanishing gradient problem:
```python
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Skip connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # Skip connection!
        out = F.relu(out)
        return out
```
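A quick usage check of the block above (the shapes are chosen just for illustration): when stride=2 and the channel count changes, the 1x1 shortcut conv keeps the addition shape-compatible:

```python
block = ResidualBlock(64, 128, stride=2)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 128, 28, 28])
```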
Transfer Learning
Use a pretrained model instead of training from scratch:
```python
from torchvision import models

# Load pretrained ResNet50
model = models.resnet50(pretrained=True)  # newer torchvision: weights=models.ResNet50_Weights.DEFAULT

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace final layer
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only train the new fc layer
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
```
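If training only the new head isn't accurate enough, a common next step is to unfreeze the last residual stage and fine-tune it with a smaller learning rate; a sketch (the layer choice and learning rates here are illustrative):

```python
# Unfreeze the last residual stage of the torchvision ResNet
for param in model.layer4.parameters():
    param.requires_grad = True

# Smaller LR for pretrained weights, larger LR for the fresh head
optimizer = optim.Adam([
    {'params': model.layer4.parameters(), 'lr': 1e-4},
    {'params': model.fc.parameters(), 'lr': 1e-3},
])
```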
Data Augmentation
Increase the diversity of the training data:
```python
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
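Augmentation belongs only in the training pipeline; at test time use a deterministic transform. A sketch (this test_transform is also what the exercise starter code below expects):

```python
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```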
Hands-on Exercise
Build an image classifier with a CNN:
- Dataset: CIFAR-10 (10 classes, 32x32 images)
- Build a custom CNN with:
  - 3-4 conv blocks
  - Batch normalization
  - Dropout
- Train and evaluate
- Try transfer learning with ResNet18
Target: > 85% accuracy on CIFAR-10
```python
# Starter code
from torchvision.datasets import CIFAR10

train_data = CIFAR10('./data', train=True, download=True, transform=train_transform)
test_data = CIFAR10('./data', train=False, transform=test_transform)

# TODO: Build your CNN
# TODO: Train and evaluate
```

What's Next
In the next post, we'll cover Recurrent Neural Networks (RNNs) for sequential data.
