Theory
35 minutes
Lesson 3/5

Neural Networks from A to Z

A deep dive into Neural Networks - from the Perceptron to Deep Networks


In this lesson, we will build a neural network from scratch to understand exactly how it works.

The Neuron - The Basic Unit

Biological vs Artificial Neuron

Diagram
graph LR
    subgraph "Biological Neuron"
        D1[Dendrites] --> S1[Soma]
        S1 --> A1[Axon]
    end
    
    subgraph "Artificial Neuron"
        X1[x₁] --> |w₁| N[Σ + f]
        X2[x₂] --> |w₂| N
        X3[x₃] --> |w₃| N
        B[bias] --> N
        N --> Y[output]
    end

Perceptron

The Perceptron is the simplest form of neural network:

y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)

Where:

  • x_i: inputs
  • w_i: weights
  • b: bias
  • f: activation function
Python
import numpy as np

class Perceptron:
    def __init__(self, n_inputs):
        # Initialize random weights
        self.weights = np.random.randn(n_inputs)
        self.bias = np.random.randn()

    def forward(self, x):
        # Weighted sum + bias
        z = np.dot(x, self.weights) + self.bias
        # Step activation
        return 1 if z > 0 else 0

    def train(self, X, y, learning_rate=0.1, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.forward(xi)
                error = yi - pred
                # Update weights
                self.weights += learning_rate * error * xi
                self.bias += learning_rate * error
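
As a quick sanity check, the perceptron above can learn the AND gate, which is linearly separable; a minimal usage sketch:

Python
import numpy as np

# AND gate: the output is 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

p = Perceptron(n_inputs=2)
p.train(X, y, learning_rate=0.1, epochs=20)

# Should print [0, 0, 0, 1] once the perceptron has converged
print([p.forward(xi) for xi in X])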

Activation Functions

Activation functions add non-linearity to the network:

1. Sigmoid

\sigma(z) = \frac{1}{1 + e^{-z}}

Python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

Pros: Output in [0, 1], smooth gradient
Cons: Vanishing gradient, not zero-centered
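
To make the vanishing-gradient problem concrete, a small sketch using the sigmoid_derivative() helper above: the derivative never exceeds 0.25, so gradients shrink layer after layer when backpropagated through many sigmoid activations.

Python
# Uses the sigmoid_derivative() helper defined above
print(sigmoid_derivative(0.0))   # 0.25 -- the largest value the derivative can take
print(sigmoid_derivative(10.0))  # ~4.5e-05 -- a saturated neuron passes back almost no gradient

# Backpropagating through 10 sigmoid layers multiplies the gradient
# by at most 0.25 per layer:
print(0.25 ** 10)                # ~9.5e-07 -- the signal has effectively vanished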

2. ReLU (Rectified Linear Unit)

ReLU(z) = \max(0, z)

Python
def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)

Pros: Fast to compute, no vanishing gradient for positive inputs
Cons: Dead neurons (inputs that stay negative get zero gradient)

3. Tanh

\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}

Python
def tanh(z):
    return np.tanh(z)

def tanh_derivative(z):
    return 1 - np.tanh(z)**2

Pros: Zero-centered, stronger gradients than sigmoid
Cons: Still suffers from vanishing gradient

4. Softmax (for classification)

softmax(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}

Python
def softmax(z):
    exp_z = np.exp(z - np.max(z))  # Numerical stability
    return exp_z / exp_z.sum()
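
Why subtract the max? A small sketch showing the numerical-stability trick in action (the naive version overflows for large logits):

Python
import numpy as np

z = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax overflows: np.exp(1000) is inf, giving nan results
naive = np.exp(z) / np.exp(z).sum()
print(naive)        # [nan nan nan] (with an overflow warning)

# Subtracting the max first gives the same mathematical result, safely
print(softmax(z))   # [0.09003057 0.24472847 0.66524096]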

Multi-Layer Perceptron (MLP)

Architecture

Diagram
graph LR
    subgraph Input
        I1((x₁))
        I2((x₂))
        I3((x₃))
    end
    
    subgraph Hidden1
        H1((h₁))
        H2((h₂))
        H3((h₃))
        H4((h₄))
    end
    
    subgraph Hidden2
        H5((h₅))
        H6((h₆))
    end
    
    subgraph Output
        O1((y₁))
        O2((y₂))
    end
    
    I1 & I2 & I3 --> H1 & H2 & H3 & H4
    H1 & H2 & H3 & H4 --> H5 & H6
    H5 & H6 --> O1 & O2

Implementation from scratch

Python
import numpy as np

class NeuralNetwork:
    def __init__(self, layer_sizes):
        """
        layer_sizes: list of layer sizes [input, hidden1, hidden2, ..., output]
        Example: [784, 128, 64, 10] for MNIST
        """
        self.layers = len(layer_sizes)
        self.weights = []
        self.biases = []

        # Initialize weights with He initialization (suited to ReLU layers)
        for i in range(self.layers - 1):
            w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * np.sqrt(2 / layer_sizes[i])
            b = np.zeros((1, layer_sizes[i+1]))
            self.weights.append(w)
            self.biases.append(b)

    def forward(self, X):
        """Forward propagation"""
        self.activations = [X]
        self.z_values = []

        A = X
        for i in range(self.layers - 1):
            Z = np.dot(A, self.weights[i]) + self.biases[i]
            self.z_values.append(Z)

            # ReLU for hidden layers, Softmax for output
            if i < self.layers - 2:
                A = np.maximum(0, Z)  # ReLU
            else:
                A = self.softmax(Z)   # Softmax

            self.activations.append(A)

        return A

    def softmax(self, z):
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def backward(self, X, y, learning_rate=0.01):
        """Backpropagation"""
        m = X.shape[0]

        # Output layer gradient: softmax + cross-entropy simplifies to (pred - y)
        dA = self.activations[-1] - y

        for i in range(self.layers - 2, -1, -1):
            # Gradients for weights and biases
            dW = np.dot(self.activations[i].T, dA) / m
            db = np.sum(dA, axis=0, keepdims=True) / m

            if i > 0:
                # Gradient for the previous layer (computed before this layer's weights are updated)
                dA = np.dot(dA, self.weights[i].T)
                dA *= (self.z_values[i-1] > 0)  # ReLU derivative

            # Update weights
            self.weights[i] -= learning_rate * dW
            self.biases[i] -= learning_rate * db

    def train(self, X, y, epochs=100, learning_rate=0.01, batch_size=32):
        """Training loop"""
        for epoch in range(epochs):
            # Mini-batch training
            indices = np.random.permutation(X.shape[0])

            for i in range(0, X.shape[0], batch_size):
                batch_idx = indices[i:i+batch_size]
                X_batch = X[batch_idx]
                y_batch = y[batch_idx]

                # Forward + Backward
                self.forward(X_batch)
                self.backward(X_batch, y_batch, learning_rate)

            # Log progress
            if epoch % 10 == 0:
                loss = self.cross_entropy_loss(X, y)
                acc = self.accuracy(X, y)
                print(f"Epoch {epoch}: Loss={loss:.4f}, Accuracy={acc:.2%}")

    def cross_entropy_loss(self, X, y):
        pred = self.forward(X)
        return -np.mean(np.sum(y * np.log(pred + 1e-8), axis=1))

    def accuracy(self, X, y):
        pred = self.forward(X)
        return np.mean(np.argmax(pred, axis=1) == np.argmax(y, axis=1))

    def predict(self, X):
        return np.argmax(self.forward(X), axis=1)

Using it with MNIST

Python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Load MNIST
mnist = fetch_openml('mnist_784', version=1)
X = mnist.data.values / 255.0  # Normalize pixel values to [0, 1]
y = mnist.target.values.astype(int)

# One-hot encode the labels
y_onehot = np.eye(10)[y]

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.2, random_state=42
)

# Create and train network
nn = NeuralNetwork([784, 128, 64, 10])
nn.train(X_train, y_train, epochs=50, learning_rate=0.1, batch_size=64)

# Evaluate
print(f"Test Accuracy: {nn.accuracy(X_test, y_test):.2%}")

Loss Functions

1. Mean Squared Error (Regression)

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Python
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

2. Binary Cross-Entropy (Binary Classification)

BCE = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

Python
def binary_cross_entropy(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred + 1e-8) +
                    (1 - y_true) * np.log(1 - y_pred + 1e-8))

3. Categorical Cross-Entropy (Multi-class)

CCE = -\sum_{i} y_i \log(\hat{y}_i)

Python
def categorical_cross_entropy(y_true, y_pred):
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-8), axis=1))

Regularization Techniques

1. L2 Regularization (Weight Decay)

Loss_{total} = Loss_{original} + \lambda \sum w^2

Python
def l2_regularization(weights, lambda_=0.01):
    return lambda_ * sum(np.sum(w**2) for w in weights)
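
l2_regularization() above only computes the penalty added to the loss. A minimal sketch of how the penalty would reach the weights during an update (dW_data here is an illustrative stand-in for the gradient from backpropagation): since the gradient of \lambda \sum w^2 is 2\lambda w, each step also shrinks ("decays") the weights slightly.

Python
import numpy as np

np.random.seed(0)
W = np.random.randn(4, 3)         # one weight matrix
dW_data = np.random.randn(4, 3)   # stand-in for the gradient of the data loss
lambda_, learning_rate = 0.01, 0.1

# d/dW of lambda * sum(W**2) is 2 * lambda * W, so the full gradient becomes:
dW = dW_data + 2 * lambda_ * W

# The usual gradient step now also decays the weights a little
W -= learning_rate * dW
print(W)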

2. Dropout

Randomly "drop" neurons during training:

Python
def dropout(X, dropout_rate=0.5, training=True):
    if not training:
        return X
    mask = np.random.binomial(1, 1 - dropout_rate, X.shape)
    return X * mask / (1 - dropout_rate)  # Scale up
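
A quick check of the scaling choice (known as inverted dropout), assuming the dropout() helper above: dividing by 1 - dropout_rate at training time keeps the expected activation unchanged, so no rescaling is needed at inference.

Python
import numpy as np

np.random.seed(0)
activations = np.random.rand(1000, 64)  # a toy batch of hidden activations

train_out = dropout(activations, dropout_rate=0.5, training=True)
test_out = dropout(activations, dropout_rate=0.5, training=False)

# The means should be close: train-time scaling compensates for the dropped units
print(train_out.mean(), test_out.mean())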

3. Batch Normalization

Normalize activations within each mini-batch:

Python
def batch_norm(X, gamma=1, beta=0, epsilon=1e-8):
    mean = np.mean(X, axis=0)
    var = np.var(X, axis=0)
    X_norm = (X - mean) / np.sqrt(var + epsilon)
    return gamma * X_norm + beta

Practice Exercises

Hands-on Exercise

Build a neural network from scratch:

  1. Implement the NeuralNetwork class above
  2. Train it on the MNIST dataset
  3. Experiment with changing:
    • Number of layers and neurons
    • Learning rate
    • Activation functions
  4. Compare the results and plot the learning curves (see the sketch below)

Target: Reach > 95% accuracy on MNIST
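
For step 4, a minimal plotting sketch, assuming matplotlib and the X_train/X_test arrays from the MNIST example above (training one epoch at a time is just a simple way to collect a history):

Python
import matplotlib.pyplot as plt

nn = NeuralNetwork([784, 128, 64, 10])
history = {"loss": [], "acc": []}

# Train one epoch at a time so metrics can be recorded after each epoch
for epoch in range(50):
    nn.train(X_train, y_train, epochs=1, learning_rate=0.1, batch_size=64)
    history["loss"].append(nn.cross_entropy_loss(X_train, y_train))
    history["acc"].append(nn.accuracy(X_test, y_test))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history["loss"])
ax1.set_title("Training loss")
ax1.set_xlabel("Epoch")
ax2.plot(history["acc"])
ax2.set_title("Test accuracy")
ax2.set_xlabel("Epoch")
plt.show()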

Up Next

In the next lesson, we will learn about Convolutional Neural Networks (CNNs) - a powerful architecture for computer vision.


References