⚙️ Hyperparameter Tuning
Hyperparameter tuning is the process of searching for the best set of hyperparameters for a model. This lesson covers approaches ranging from Grid Search to Bayesian Optimization.
Hyperparameters vs Parameters
How they differ
- Parameters: learned from the data during training (weights, biases)
- Hyperparameters: set before training starts, for example:
  - Learning rate
  - Number of layers
  - Regularization strength
  - Batch size
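To make the distinction concrete, here is a minimal sketch in plain Python. The toy data and the function name `fit_line` are illustrative assumptions, not part of any library: the learning rate and number of epochs are chosen up front (hyperparameters), while `w` and `b` are learned from the data (parameters).

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters: updated from the data
    n = len(xs)
    for _ in range(epochs):  # epochs and learning_rate: hyperparameters, fixed beforehand
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example data)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = fit_line(xs, ys)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

Changing `learning_rate` or `epochs` changes how well (or whether) the fit converges, which is exactly what tuning explores.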
Common Hyperparameters
1. Model Architecture
| Model | Hyperparameters |
|---|---|
| Decision Tree | max_depth, min_samples_split, criterion |
| Random Forest | n_estimators, max_depth, max_features |
| SVM | C, kernel, gamma |
| Neural Network | layers, neurons, activation |
| XGBoost | n_estimators, max_depth, learning_rate |
2. Training Process
- Learning rate: Step size for gradient descent
- Batch size: Samples per gradient update
- Epochs: Number of full passes over the training data
- Early stopping patience: Number of epochs without validation improvement to wait before stopping
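Early stopping patience is the least self-explanatory item in this list, so here is a small plain-Python sketch of the rule. The helper `early_stopping_epoch` and the loss values are hypothetical; real frameworks implement this as a callback.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: ran all epochs


# Hypothetical validation losses, one per epoch
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
stop = early_stopping_epoch(losses, patience=2)
print(f"stopped at epoch {stop}")
```

With `patience=2` this run stops at epoch 5 and never reaches the better loss at epoch 6, while `patience=3` would let it continue. Patience is itself a hyperparameter that trades training time against the risk of stopping too early.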
Search Strategies
1. Grid Search
Exhaustively search over every combination in the grid:
```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Define parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'max_features': ['sqrt', 'log2']
}

# Grid Search (X_train and y_train are assumed to be defined already)
rf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(
    rf,
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2
)

grid_search.fit(X_train, y_train)

# Best parameters
print(f"Best params: {grid_search.best_params_}")
print(f"Best score: {grid_search.best_score_:.4f}")
```
- Pros: Simple, thorough
- Cons: Cost grows exponentially with the number of hyperparameters, slow
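To see why grid search gets expensive, the sketch below counts the work implied by the grid above using only the standard library. The extra `criterion` entry at the end is a hypothetical addition to show how one more hyperparameter multiplies the cost.

```python
from itertools import product

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'max_features': ['sqrt', 'log2'],
}
cv_folds = 5

# Every combination of values is one candidate configuration
combinations = list(product(*param_grid.values()))
n_fits = len(combinations) * cv_folds
print(f"{len(combinations)} combinations x {cv_folds} folds = {n_fits} model fits")

# Adding one more hyperparameter with 2 values doubles the work
bigger_grid = {**param_grid, 'criterion': ['gini', 'entropy']}
n_bigger = len(list(product(*bigger_grid.values())))
print(f"with one extra hyperparameter: {n_bigger} combinations")
```

The count is the product of the list lengths, so each added hyperparameter multiplies the total rather than adding to it.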
2. Random Search
Sample random combinations from specified distributions instead of trying every one:
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
from xgboost import XGBClassifier

# Define distributions
# Note: scipy's uniform(loc, scale) samples from [loc, loc + scale],
# and randint(low, high) excludes high.
param_distributions = {
    'n_estimators': randint(100, 500),
    'max_depth': randint(3, 20),
    'min_child_weight': randint(1, 10),   # XGBoost has no min_samples_split
    'learning_rate': uniform(0.01, 0.29),  # range [0.01, 0.30]
    'subsample': uniform(0.6, 0.4)         # range [0.6, 1.0]
}

# Random Search (X_train and y_train are assumed to be defined already)
xgb = XGBClassifier(random_state=42)

random_search = RandomizedSearchCV(
    xgb,
    param_distributions,
    n_iter=100,  # Number of random samples
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

random_search.fit(X_train, y_train)
print(f"Best params: {random_search.best_params_}")
```
- Pros: More efficient than grid search, better coverage of each individual hyperparameter's range
- Cons: May miss the optimal values
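Conceptually, random search just draws independent configurations from the given ranges and evaluates each one. The standard-library sketch below shows only the sampling step; the function `sample_config` and the ranges are assumptions chosen to mirror the example above, not how `RandomizedSearchCV` is implemented internally.

```python
import random


def sample_config(rng):
    """Draw one random hyperparameter configuration."""
    return {
        'n_estimators': rng.randrange(100, 500),   # integers in [100, 500)
        'max_depth': rng.randrange(3, 20),         # integers in [3, 20)
        'learning_rate': rng.uniform(0.01, 0.3),   # floats in [0.01, 0.3]
        'subsample': rng.uniform(0.6, 1.0),        # floats in [0.6, 1.0]
    }


rng = random.Random(42)  # fixed seed so the draw is reproducible
configs = [sample_config(rng) for _ in range(100)]

# With 100 draws, each continuous hyperparameter is tried at 100 distinct
# values, whereas a grid of similar cost would reuse only a few values per axis.
distinct_lr = len({c['learning_rate'] for c in configs})
print(f"{len(configs)} configurations, {distinct_lr} distinct learning rates")
```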
3. Bayesian Optimization
Use a probabilistic model of the objective to guide the search toward promising regions:
```python
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical
from xgboost import XGBClassifier

# Define search space
search_spaces = {
    'n_estimators': Integer(100, 500),
    'max_depth': Integer(3, 20),
    'learning_rate': Real(0.01, 0.3, prior='log-uniform'),
    'subsample': Real(0.6, 1.0),
    'colsample_bytree': Real(0.6, 1.0)
}

# Bayesian Optimization (X_train and y_train are assumed to be defined already)
bayes_search = BayesSearchCV(
    XGBClassifier(random_state=42),
    search_spaces,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

bayes_search.fit(X_train, y_train)
print(f"Best params: {bayes_search.best_params_}")
```
4. Optuna (Modern Approach)
A powerful framework for hyperparameter optimization:
```python
import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True)
    }

    # X_train and y_train are assumed to be defined already
    model = XGBClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')

    return scores.mean()

# Create study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, show_progress_bar=True)

# Best parameters
print(f"Best params: {study.best_params}")
print(f"Best score: {study.best_value:.4f}")

# Visualize (these return Plotly figures; call .show() outside a notebook)
optuna.visualization.plot_optimization_history(study)
optuna.visualization.plot_param_importances(study)
```
Neural Network Hyperparameter Tuning
With Keras Tuner
```python
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    model = keras.Sequential()

    # Tune number of layers
    for i in range(hp.Int('num_layers', 1, 4)):
        model.add(keras.layers.Dense(
            units=hp.Int(f'units_{i}', min_value=32, max_value=512, step=32),
            activation=hp.Choice('activation', ['relu', 'tanh', 'selu'])
        ))

    # Tune dropout
    model.add(keras.layers.Dropout(
        hp.Float('dropout', 0.0, 0.5, step=0.1)
    ))

    model.add(keras.layers.Dense(1, activation='sigmoid'))

    # Tune learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
        ),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )

    return model

# Create tuner
tuner = kt.Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=50,
    factor=3,
    directory='tuning',
    project_name='my_model'
)

# Search (X_train, y_train, X_val and y_val are assumed to be defined already)
tuner.search(
    X_train, y_train,
    epochs=50,
    validation_data=(X_val, y_val),
    callbacks=[keras.callbacks.EarlyStopping(patience=5)]
)

# Best model
best_model = tuner.get_best_models(1)[0]
best_hp = tuner.get_best_hyperparameters(1)[0]
```
Cross-Validation Strategies
K-Fold
```python
from sklearn.model_selection import KFold, StratifiedKFold, TimeSeriesSplit

# Standard K-Fold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Stratified K-Fold (for classification)
stratified = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Time Series Split
tscv = TimeSeriesSplit(n_splits=5)
```
Nested Cross-Validation
Get an unbiased performance estimate by tuning in an inner loop and scoring in an outer loop:
```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# Small XGBoost grid for illustration (these values are assumptions; adjust as needed)
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [3, 6],
    'learning_rate': [0.05, 0.1]
}

# Outer CV for model evaluation
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Inner CV for hyperparameter tuning
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Grid search with inner CV
grid_search = GridSearchCV(
    XGBClassifier(),
    param_grid,
    cv=inner_cv,
    scoring='accuracy'
)

# Nested CV scores (X and y are the full feature matrix and labels)
nested_scores = cross_val_score(
    grid_search, X, y,
    cv=outer_cv,
    scoring='accuracy'
)

print(f"Nested CV Score: {nested_scores.mean():.4f} (+/- {nested_scores.std():.4f})")
```
Best Practices
Tips for Effective Tuning
- Start with Random Search before Grid Search
- Use a log scale for learning rate and regularization strength
- Use early stopping to save time
- Use nested CV for an unbiased evaluation
- Monitor for overfitting while tuning
- Document experiments for reproducibility
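The log-scale tip is easiest to appreciate with numbers. The sketch below, using only the standard library, samples a learning rate between `1e-4` and `1e-1` (a range assumed for illustration) both linearly and log-uniformly, then checks how many samples fall below `1e-2`.

```python
import math
import random

rng = random.Random(0)
low, high = 1e-4, 1e-1
n_samples = 10_000

# Linear sampling: uniform over the raw value
linear = [rng.uniform(low, high) for _ in range(n_samples)]

# Log-uniform sampling: uniform over log(value), then exponentiate
log_uniform = [
    math.exp(rng.uniform(math.log(low), math.log(high)))
    for _ in range(n_samples)
]


def share_below(samples, threshold):
    """Fraction of samples strictly below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


linear_share = share_below(linear, 1e-2)
log_share = share_below(log_uniform, 1e-2)
print(f"below 1e-2: linear={linear_share:.2f}, log-uniform={log_share:.2f}")
```

Linear sampling spends only about 10% of its trials below `1e-2`, even though that region spans two of the three orders of magnitude; log-uniform sampling spends about two thirds there. This is what `log=True` in Optuna and `prior='log-uniform'` in scikit-optimize provide.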
Practical Example
Complete Tuning Pipeline
```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import optuna
from xgboost import XGBClassifier

# Load data
data = pd.read_csv('data.csv')
X = data.drop('target', axis=1)
y = data['target']

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale (optional for tree-based models such as XGBoost)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Optuna objective
def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'gamma': trial.suggest_float('gamma', 0, 5),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10, log=True)
    }

    # use_label_encoder is dropped: it is deprecated in recent XGBoost releases
    model = XGBClassifier(**params, random_state=42, eval_metric='logloss')

    # Use cross-validation on the training split only
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring='accuracy')

    return scores.mean()

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, show_progress_bar=True)

# Train final model with best params
best_model = XGBClassifier(**study.best_params, random_state=42)
best_model.fit(X_train_scaled, y_train)

# Evaluate on the held-out test set
y_pred = best_model.predict(X_test_scaled)
print(f"Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred))
```
Practice Exercise
Hands-on Exercise
Hyperparameter Tuning Challenge:
- Load the Titanic dataset
- Implement tuning with:
  - Grid Search for Random Forest
  - Random Search for XGBoost
  - Optuna for LightGBM
- Compare results and run time
- Use nested CV for the final evaluation
Target: Find the best hyperparameters and compare the efficiency of each method
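For the time comparison, one possible starting point is a small timing wrapper like the sketch below. The helper `timed` and the stand-in workload `dummy_search` are hypothetical; in the exercise you would pass a real call such as a search object's `fit` method instead.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and record how long it took."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return {'method': label, 'seconds': elapsed, 'result': result}


def dummy_search(n):
    """Stand-in workload so the sketch runs without any dataset."""
    return max(i * i % 7 for i in range(n))


record = timed('dummy', dummy_search, 100_000)
print(f"{record['method']}: {record['seconds']:.4f}s")
```

Collecting one such record per method (for example `timed('grid', grid_search.fit, X_train, y_train)`, assuming those objects exist) gives a list that can be turned into a comparison table of score versus run time.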
