Theory
35 minutes
Lesson 3/5

Neural Networks from A to Z

A deep dive into Neural Networks - from the Perceptron to Deep Networks


In this lesson, we will build a neural network from scratch to understand exactly how it works.

The Neuron - The Basic Unit

Biological vs Artificial Neuron

Diagram
graph LR
    subgraph "Biological Neuron"
        D1[Dendrites] --> S1[Soma]
        S1 --> A1[Axon]
    end
    
    subgraph "Artificial Neuron"
        X1[x₁] --> |w₁| N[Σ + f]
        X2[x₂] --> |w₂| N
        X3[x₃] --> |w₃| N
        B[bias] --> N
        N --> Y[output]
    end

Perceptron

The Perceptron is the simplest form of neural network:

y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)

Where:

  • x_i: inputs
  • w_i: weights
  • b: bias
  • f: activation function
Python
import numpy as np

class Perceptron:
    def __init__(self, n_inputs):
        # Initialize random weights
        self.weights = np.random.randn(n_inputs)
        self.bias = np.random.randn()

    def forward(self, x):
        # Weighted sum + bias
        z = np.dot(x, self.weights) + self.bias
        # Step activation
        return 1 if z > 0 else 0

    def train(self, X, y, learning_rate=0.1, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.forward(xi)
                error = yi - pred
                # Update weights
                self.weights += learning_rate * error * xi
                self.bias += learning_rate * error
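
As a quick sanity check, the perceptron above can learn the AND gate, which is linearly separable; a minimal usage sketch:

Python
import numpy as np

# AND gate: the output is 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

p = Perceptron(n_inputs=2)
p.train(X, y, learning_rate=0.1, epochs=20)

# Should print [0, 0, 0, 1] once the perceptron has converged
print([p.forward(xi) for xi in X])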

Activation Functions

Activation functions add non-linearity to the network:

1. Sigmoid

\sigma(z) = \frac{1}{1 + e^{-z}}

Python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

Pros: Output in [0, 1], smooth gradient
Cons: Vanishing gradient, not zero-centered
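
To make the vanishing-gradient problem concrete, a small sketch using the sigmoid_derivative() helper above: the derivative never exceeds 0.25, so gradients shrink layer after layer when backpropagated through many sigmoid activations.

Python
# Uses the sigmoid_derivative() helper defined above
print(sigmoid_derivative(0.0))   # 0.25 -- the largest value the derivative can take
print(sigmoid_derivative(10.0))  # ~4.5e-05 -- a saturated neuron passes back almost no gradient

# Backpropagating through 10 sigmoid layers multiplies the gradient
# by at most 0.25 per layer:
print(0.25 ** 10)                # ~9.5e-07 -- the signal has effectively vanished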

2. ReLU (Rectified Linear Unit)

ReLU(z) = \max(0, z)

Python
def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)

Pros: Fast to compute, no vanishing gradient for positive inputs
Cons: Dead neurons (inputs that stay negative get zero gradient)

3. Tanh

\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}

Python
def tanh(z):
    return np.tanh(z)

def tanh_derivative(z):
    return 1 - np.tanh(z)**2

Pros: Zero-centered, stronger gradients than sigmoid
Cons: Still suffers from vanishing gradient

4. Softmax (for classification)

softmax(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}

Python
def softmax(z):
    exp_z = np.exp(z - np.max(z))  # Numerical stability
    return exp_z / exp_z.sum()
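
Why subtract the max? A small sketch showing the numerical-stability trick in action (the naive version overflows for large logits):

Python
import numpy as np

z = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax overflows: np.exp(1000) is inf, giving nan results
naive = np.exp(z) / np.exp(z).sum()
print(naive)        # [nan nan nan] (with an overflow warning)

# Subtracting the max first gives the same mathematical result, safely
print(softmax(z))   # [0.09003057 0.24472847 0.66524096]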

Multi-Layer Perceptron (MLP)

Architecture

Diagram
graph LR
    subgraph Input
        I1((x₁))
        I2((x₂))
        I3((x₃))
    end
    
    subgraph Hidden1
        H1((h₁))
        H2((h₂))
        H3((h₃))
        H4((h₄))
    end
    
    subgraph Hidden2
        H5((h₅))
        H6((h₆))
    end
    
    subgraph Output
        O1((y₁))
        O2((y₂))
    end
    
    I1 & I2 & I3 --> H1 & H2 & H3 & H4
    H1 & H2 & H3 & H4 --> H5 & H6
    H5 & H6 --> O1 & O2

Implementation from scratch

Python
import numpy as np

class NeuralNetwork:
    def __init__(self, layer_sizes):
        """
        layer_sizes: list of layer sizes [input, hidden1, hidden2, ..., output]
        Example: [784, 128, 64, 10] for MNIST
        """
        self.layers = len(layer_sizes)
        self.weights = []
        self.biases = []

        # Initialize weights with He initialization (suited to ReLU layers)
        for i in range(self.layers - 1):
            w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * np.sqrt(2 / layer_sizes[i])
            b = np.zeros((1, layer_sizes[i+1]))
            self.weights.append(w)
            self.biases.append(b)

    def forward(self, X):
        """Forward propagation"""
        self.activations = [X]
        self.z_values = []

        A = X
        for i in range(self.layers - 1):
            Z = np.dot(A, self.weights[i]) + self.biases[i]
            self.z_values.append(Z)

            # ReLU for hidden layers, Softmax for output
            if i < self.layers - 2:
                A = np.maximum(0, Z)  # ReLU
            else:
                A = self.softmax(Z)   # Softmax

            self.activations.append(A)

        return A

    def softmax(self, z):
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def backward(self, X, y, learning_rate=0.01):
        """Backpropagation"""
        m = X.shape[0]

        # Output layer gradient: softmax + cross-entropy simplifies to (pred - y)
        dA = self.activations[-1] - y

        for i in range(self.layers - 2, -1, -1):
            # Gradients for weights and biases
            dW = np.dot(self.activations[i].T, dA) / m
            db = np.sum(dA, axis=0, keepdims=True) / m

            if i > 0:
                # Gradient for the previous layer (computed before this layer's weights are updated)
                dA = np.dot(dA, self.weights[i].T)
                dA *= (self.z_values[i-1] > 0)  # ReLU derivative

            # Update weights
            self.weights[i] -= learning_rate * dW
            self.biases[i] -= learning_rate * db

    def train(self, X, y, epochs=100, learning_rate=0.01, batch_size=32):
        """Training loop"""
        for epoch in range(epochs):
            # Mini-batch training
            indices = np.random.permutation(X.shape[0])

            for i in range(0, X.shape[0], batch_size):
                batch_idx = indices[i:i+batch_size]
                X_batch = X[batch_idx]
                y_batch = y[batch_idx]

                # Forward + Backward
                self.forward(X_batch)
                self.backward(X_batch, y_batch, learning_rate)

            # Log progress
            if epoch % 10 == 0:
                loss = self.cross_entropy_loss(X, y)
                acc = self.accuracy(X, y)
                print(f"Epoch {epoch}: Loss={loss:.4f}, Accuracy={acc:.2%}")

    def cross_entropy_loss(self, X, y):
        pred = self.forward(X)
        return -np.mean(np.sum(y * np.log(pred + 1e-8), axis=1))

    def accuracy(self, X, y):
        pred = self.forward(X)
        return np.mean(np.argmax(pred, axis=1) == np.argmax(y, axis=1))

    def predict(self, X):
        return np.argmax(self.forward(X), axis=1)

Using it with MNIST

Python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load MNIST
mnist = fetch_openml('mnist_784', version=1)
X = mnist.data.values / 255.0  # Normalize pixel values to [0, 1]
y = mnist.target.values.astype(int)

# One-hot encode the labels
y_onehot = np.eye(10)[y]

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.2, random_state=42
)

# Create and train network
nn = NeuralNetwork([784, 128, 64, 10])
nn.train(X_train, y_train, epochs=50, learning_rate=0.1, batch_size=64)

# Evaluate
print(f"Test Accuracy: {nn.accuracy(X_test, y_test):.2%}")

Loss Functions

1. Mean Squared Error (Regression)

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Python
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

2. Binary Cross-Entropy (Binary Classification)

BCE = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

Python
def binary_cross_entropy(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred + 1e-8) +
                    (1 - y_true) * np.log(1 - y_pred + 1e-8))

3. Categorical Cross-Entropy (Multi-class)

CCE = -\sum_{i} y_i \log(\hat{y}_i)

Python
def categorical_cross_entropy(y_true, y_pred):
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-8), axis=1))

Regularization Techniques

1. L2 Regularization (Weight Decay)

Loss_{total} = Loss_{original} + \lambda \sum w^2

Python
def l2_regularization(weights, lambda_=0.01):
    return lambda_ * sum(np.sum(w**2) for w in weights)
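
l2_regularization() above only computes the penalty added to the loss. A minimal sketch of how the penalty would reach the weights during an update (dW_data here is an illustrative stand-in for the gradient from backpropagation): since the gradient of \lambda \sum w^2 is 2\lambda w, each step also shrinks ("decays") the weights slightly.

Python
import numpy as np

np.random.seed(0)
W = np.random.randn(4, 3)         # one weight matrix
dW_data = np.random.randn(4, 3)   # stand-in for the gradient of the data loss
lambda_, learning_rate = 0.01, 0.1

# d/dW of lambda * sum(W**2) is 2 * lambda * W, so the full gradient becomes:
dW = dW_data + 2 * lambda_ * W

# The usual gradient step now also decays the weights a little
W -= learning_rate * dW
print(W)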

2. Dropout

Randomly "drop" neurons during training:

Python
def dropout(X, dropout_rate=0.5, training=True):
    if not training:
        return X
    mask = np.random.binomial(1, 1 - dropout_rate, X.shape)
    return X * mask / (1 - dropout_rate)  # Scale up
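
A quick check of the scaling choice (known as inverted dropout), assuming the dropout() helper above: dividing by 1 - dropout_rate at training time keeps the expected activation unchanged, so no rescaling is needed at inference.

Python
import numpy as np

np.random.seed(0)
activations = np.random.rand(1000, 64)  # a toy batch of hidden activations

train_out = dropout(activations, dropout_rate=0.5, training=True)
test_out = dropout(activations, dropout_rate=0.5, training=False)

# The means should be close: train-time scaling compensates for the dropped units
print(train_out.mean(), test_out.mean())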

3. Batch Normalization

Normalize activations within each mini-batch:

Python
def batch_norm(X, gamma=1, beta=0, epsilon=1e-8):
    mean = np.mean(X, axis=0)
    var = np.var(X, axis=0)
    X_norm = (X - mean) / np.sqrt(var + epsilon)
    return gamma * X_norm + beta

Practice Exercises

Hands-on Exercise

Build a neural network from scratch:

  1. Implement the NeuralNetwork class above
  2. Train it on the MNIST dataset
  3. Experiment with changing:
    • Number of layers and neurons
    • Learning rate
    • Activation functions
  4. Compare the results and plot the learning curves (see the sketch below)

Target: Reach > 95% accuracy on MNIST
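
For step 4, a minimal plotting sketch, assuming matplotlib and the X_train/X_test arrays from the MNIST example above (training one epoch at a time is just a simple way to collect a history):

Python
import matplotlib.pyplot as plt

nn = NeuralNetwork([784, 128, 64, 10])
history = {"loss": [], "acc": []}

# Train one epoch at a time so metrics can be recorded after each epoch
for epoch in range(50):
    nn.train(X_train, y_train, epochs=1, learning_rate=0.1, batch_size=64)
    history["loss"].append(nn.cross_entropy_loss(X_train, y_train))
    history["acc"].append(nn.accuracy(X_test, y_test))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history["loss"])
ax1.set_title("Training loss")
ax1.set_xlabel("Epoch")
ax2.plot(history["acc"])
ax2.set_title("Test accuracy")
ax2.set_xlabel("Epoch")
plt.show()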

Up Next

In the next lesson, we will learn about Convolutional Neural Networks (CNNs) - a powerful architecture for computer vision.


References