🔗 Neural Networks from A to Z
In this lesson, we will build a Neural Network from scratch to understand exactly how it works.
Neuron - The Basic Unit
Biological vs Artificial Neuron
```mermaid
graph LR
    subgraph "Biological Neuron"
        D1[Dendrites] --> S1[Soma]
        S1 --> A1[Axon]
    end
    subgraph "Artificial Neuron"
        X1[x₁] --> |w₁| N[Σ + f]
        X2[x₂] --> |w₂| N
        X3[x₃] --> |w₃| N
        B[bias] --> N
        N --> Y[output]
    end
```

Perceptron
The Perceptron is the simplest neural network:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:
- $x_i$: inputs
- $w_i$: weights
- $b$: bias
- $f$: activation function
```python
import numpy as np

class Perceptron:
    def __init__(self, n_inputs):
        # Initialize random weights
        self.weights = np.random.randn(n_inputs)
        self.bias = np.random.randn()

    def forward(self, x):
        # Weighted sum + bias
        z = np.dot(x, self.weights) + self.bias
        # Step activation
        return 1 if z > 0 else 0

    def train(self, X, y, learning_rate=0.1, epochs=100):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.forward(xi)
                error = yi - pred
                # Update weights using the perceptron learning rule
                self.weights += learning_rate * error * xi
                self.bias += learning_rate * error
```
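As a quick sanity check, here is a minimal usage sketch (not from the lesson) that trains the Perceptron above on the AND gate, which is linearly separable:

```python
import numpy as np

# Truth table for AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

p = Perceptron(n_inputs=2)
p.train(X, y, learning_rate=0.1, epochs=100)

# After training, the predictions should match the AND truth table
print([p.forward(xi) for xi in X])   # expected: [0, 0, 0, 1]
```

A single perceptron cannot learn XOR, which is one motivation for the multi-layer networks later in this lesson.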
Activation Functions

Activation functions add non-linearity to the network:
1. Sigmoid
```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)
```

Pros: Output in [0, 1], smooth gradient
Cons: Vanishing gradient, not zero-centered
2. ReLU (Rectified Linear Unit)
```python
def relu(z):
    return np.maximum(0, z)

def relu_derivative(z):
    return (z > 0).astype(float)
```

Pros: Fast, no vanishing gradient for positive inputs
Cons: Dead neurons (negative values → 0)
3. Tanh
```python
def tanh(z):
    return np.tanh(z)

def tanh_derivative(z):
    return 1 - np.tanh(z)**2
```

Pros: Zero-centered, stronger gradients than sigmoid
Cons: Still has vanishing gradient
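To make the Pros/Cons above concrete, here is a small sketch (assuming the sigmoid, relu, and tanh functions defined above) that evaluates each activation on the same inputs:

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])

print(sigmoid(z))  # [0.119 0.5   0.881] -> always positive, not zero-centered
print(tanh(z))     # [-0.964 0.    0.964] -> zero-centered, saturates at ±1
print(relu(z))     # [0. 0. 2.]           -> negative inputs are clipped to 0
```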
4. Softmax (for classification)
```python
def softmax(z):
    exp_z = np.exp(z - np.max(z))  # Subtract max for numerical stability
    return exp_z / exp_z.sum()
```
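A brief check (assuming the softmax above) showing that the output is a valid probability distribution and that subtracting the max prevents overflow:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)         # ≈ [0.659 0.242 0.099]
print(probs.sum())   # sums to 1 (up to floating-point error)

# Without the max-subtraction trick, np.exp(1000) would overflow to inf
print(softmax(np.array([1000.0, 1001.0, 1002.0])))  # still finite, still sums to 1
```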
Multi-Layer Perceptron (MLP)

Architecture
```mermaid
graph LR
    subgraph Input
        I1((x₁))
        I2((x₂))
        I3((x₃))
    end
    subgraph Hidden1
        H1((h₁))
        H2((h₂))
        H3((h₃))
        H4((h₄))
    end
    subgraph Hidden2
        H5((h₅))
        H6((h₆))
    end
    subgraph Output
        O1((y₁))
        O2((y₂))
    end
    I1 & I2 & I3 --> H1 & H2 & H3 & H4
    H1 & H2 & H3 & H4 --> H5 & H6
    H5 & H6 --> O1 & O2
```

Implementation from scratch
```python
import numpy as np

class NeuralNetwork:
    def __init__(self, layer_sizes):
        """
        layer_sizes: list of layer sizes [input, hidden1, hidden2, ..., output]
        Example: [784, 128, 64, 10] for MNIST
        """
        self.layers = len(layer_sizes)
        self.weights = []
        self.biases = []

        # Initialize weights with He initialization (suited to ReLU layers)
        for i in range(self.layers - 1):
            w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * np.sqrt(2 / layer_sizes[i])
            b = np.zeros((1, layer_sizes[i+1]))
            self.weights.append(w)
            self.biases.append(b)

    def forward(self, X):
        """Forward propagation"""
        self.activations = [X]
        self.z_values = []

        A = X
        for i in range(self.layers - 1):
            Z = np.dot(A, self.weights[i]) + self.biases[i]
            self.z_values.append(Z)

            # ReLU for hidden layers, Softmax for the output layer
            if i < self.layers - 2:
                A = np.maximum(0, Z)   # ReLU
            else:
                A = self.softmax(Z)    # Softmax

            self.activations.append(A)

        return A

    def softmax(self, z):
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def backward(self, X, y, learning_rate=0.01):
        """Backpropagation"""
        m = X.shape[0]

        # Output layer gradient (softmax + cross-entropy simplifies to pred - y)
        dA = self.activations[-1] - y

        for i in range(self.layers - 2, -1, -1):
            # Gradients for weights and biases
            dW = np.dot(self.activations[i].T, dA) / m
            db = np.sum(dA, axis=0, keepdims=True) / m

            if i > 0:
                # Gradient flowing into the previous layer
                dA = np.dot(dA, self.weights[i].T)
                dA *= (self.z_values[i-1] > 0)  # ReLU derivative

            # Update weights
            self.weights[i] -= learning_rate * dW
            self.biases[i] -= learning_rate * db

    def train(self, X, y, epochs=100, learning_rate=0.01, batch_size=32):
        """Training loop"""
        for epoch in range(epochs):
            # Mini-batch training on a shuffled ordering of the data
            indices = np.random.permutation(X.shape[0])

            for i in range(0, X.shape[0], batch_size):
                batch_idx = indices[i:i+batch_size]
                X_batch = X[batch_idx]
                y_batch = y[batch_idx]

                # Forward + backward pass
                self.forward(X_batch)
                self.backward(X_batch, y_batch, learning_rate)

            # Log progress
            if epoch % 10 == 0:
                loss = self.cross_entropy_loss(X, y)
                acc = self.accuracy(X, y)
                print(f"Epoch {epoch}: Loss={loss:.4f}, Accuracy={acc:.2%}")

    def cross_entropy_loss(self, X, y):
        pred = self.forward(X)
        return -np.mean(np.sum(y * np.log(pred + 1e-8), axis=1))

    def accuracy(self, X, y):
        pred = self.forward(X)
        return np.mean(np.argmax(pred, axis=1) == np.argmax(y, axis=1))

    def predict(self, X):
        return np.argmax(self.forward(X), axis=1)
```
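Before moving to MNIST, a quick sanity check on a toy problem helps confirm the implementation actually learns. This is a sketch under simple assumptions (two Gaussian blobs as classes, a small [2, 8, 2] network); the exact numbers will vary with the random seed:

```python
import numpy as np

np.random.seed(42)

# Two Gaussian blobs, one per class
X0 = np.random.randn(100, 2) + np.array([-2, -2])
X1 = np.random.randn(100, 2) + np.array([2, 2])
X_toy = np.vstack([X0, X1])

# One-hot labels: first 100 samples are class 0, the rest class 1
y_toy = np.zeros((200, 2))
y_toy[:100, 0] = 1
y_toy[100:, 1] = 1

nn_toy = NeuralNetwork([2, 8, 2])
nn_toy.train(X_toy, y_toy, epochs=100, learning_rate=0.1, batch_size=16)
print(nn_toy.accuracy(X_toy, y_toy))  # should be close to 1.0 on this easy problem
```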
Using it with MNIST

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import numpy as np

# Load MNIST
mnist = fetch_openml('mnist_784', version=1)
X = mnist.data.values / 255.0         # Normalize pixel values to [0, 1]
y = mnist.target.values.astype(int)

# One-hot encode the labels
y_onehot = np.eye(10)[y]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.2, random_state=42
)

# Create and train the network
nn = NeuralNetwork([784, 128, 64, 10])
nn.train(X_train, y_train, epochs=50, learning_rate=0.1, batch_size=64)

# Evaluate
print(f"Test Accuracy: {nn.accuracy(X_test, y_test):.2%}")
```
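Beyond a single accuracy number, a confusion matrix shows which digits the network mixes up. A small sketch (assuming nn, X_test, and y_test from the code above):

```python
import numpy as np

y_pred = nn.predict(X_test)
y_true = np.argmax(y_test, axis=1)

# Rows = true digit, columns = predicted digit
conf_matrix = np.zeros((10, 10), dtype=int)
for t, p in zip(y_true, y_pred):
    conf_matrix[t, p] += 1

print(conf_matrix)  # large off-diagonal entries reveal commonly confused digits
```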
Loss Functions

1. Mean Squared Error (Regression)
```python
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
```

2. Binary Cross-Entropy (Binary Classification)
```python
def binary_cross_entropy(y_true, y_pred):
    return -np.mean(y_true * np.log(y_pred + 1e-8) +
                    (1 - y_true) * np.log(1 - y_pred + 1e-8))
```

3. Categorical Cross-Entropy (Multi-class)
```python
def categorical_cross_entropy(y_true, y_pred):
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-8), axis=1))
```
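A small sketch (assuming categorical_cross_entropy above) illustrating why cross-entropy is a useful training signal: a confidently wrong prediction is penalized far more than a confidently correct one:

```python
import numpy as np

y_true = np.array([[0, 1, 0]])                       # true class is index 1

confident_right = np.array([[0.05, 0.90, 0.05]])
confident_wrong = np.array([[0.90, 0.05, 0.05]])

print(categorical_cross_entropy(y_true, confident_right))  # ≈ 0.105
print(categorical_cross_entropy(y_true, confident_wrong))  # ≈ 3.0
```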
Regularization Techniques

1. L2 Regularization (Weight Decay)
```python
def l2_regularization(weights, lambda_=0.01):
    return lambda_ * sum(np.sum(w**2) for w in weights)
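The penalty only matters once it influences the weight update. A minimal sketch (not part of the NeuralNetwork class above; lambda_ is an assumed hyperparameter) showing how the L2 term adds a "weight decay" component to the gradient:

```python
import numpy as np

def l2_weight_update(w, dW, learning_rate=0.01, lambda_=0.01):
    # d/dw of lambda_ * sum(w**2) is 2 * lambda_ * w, so L2 regularization
    # adds a term that shrinks the weights toward zero at every step.
    return w - learning_rate * (dW + 2 * lambda_ * w)

w = np.array([1.0, -2.0, 3.0])
dW = np.zeros_like(w)                # even with a zero data gradient...
print(l2_weight_update(w, dW))       # ...the weights shrink slightly
```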
2. Dropout

Randomly "drop" neurons during training:
```python
def dropout(X, dropout_rate=0.5, training=True):
    if not training:
        return X
    mask = np.random.binomial(1, 1 - dropout_rate, X.shape)
    return X * mask / (1 - dropout_rate)  # Scale up (inverted dropout)
```
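A quick demonstration (assuming the dropout function above) of the inverted-dropout scaling: in training mode roughly half the activations are zeroed, yet the average value stays close to the original:

```python
import numpy as np

np.random.seed(0)
X = np.ones((4, 8))

X_train_mode = dropout(X, dropout_rate=0.5, training=True)
print(X_train_mode)          # zeros mixed with values scaled up to 2.0
print(X_train_mode.mean())   # ≈ 1.0 on average, thanks to the 1/(1-rate) scaling

print(dropout(X, training=False))  # inference mode: input passes through unchanged
```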
3. Batch Normalization

Normalize activations within each mini-batch:
```python
def batch_norm(X, gamma=1, beta=0, epsilon=1e-8):
    mean = np.mean(X, axis=0)
    var = np.var(X, axis=0)
    X_norm = (X - mean) / np.sqrt(var + epsilon)
    return gamma * X_norm + beta
```
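A quick check (assuming batch_norm above): after normalization, each feature in the batch should have roughly zero mean and unit variance:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(32, 4) * 5 + 10   # a batch of 32 samples, 4 features

X_bn = batch_norm(X)
print(X_bn.mean(axis=0))              # ≈ 0 for every feature
print(X_bn.std(axis=0))               # ≈ 1 for every feature
```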
Practice Exercise

Build a Neural Network from scratch:

- Implement the NeuralNetwork class above
- Train it on the MNIST dataset
- Try changing:
  - The number of layers and neurons
  - The learning rate
  - The activation functions
- Compare the results and plot learning curves (see the plotting sketch after the target below)
Target: Achieve > 95% accuracy on MNIST
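For the learning-curve part, one simple approach (a sketch, assuming matplotlib is installed and reusing X_train, X_test, y_train, y_test from the MNIST section) is to train one epoch at a time and record the loss after each pass:

```python
import matplotlib.pyplot as plt

nn = NeuralNetwork([784, 128, 64, 10])
train_losses, test_losses = [], []

for epoch in range(50):
    # epochs=1 so the loss can be measured after every full pass over the data
    nn.train(X_train, y_train, epochs=1, learning_rate=0.1, batch_size=64)
    train_losses.append(nn.cross_entropy_loss(X_train, y_train))
    test_losses.append(nn.cross_entropy_loss(X_test, y_test))

plt.plot(train_losses, label="train")
plt.plot(test_losses, label="test")
plt.xlabel("Epoch")
plt.ylabel("Cross-entropy loss")
plt.legend()
plt.show()
```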
Next
In the next lesson, we will learn about Convolutional Neural Networks (CNN) - a powerful architecture for computer vision.
