Hyperparameter Tuning

Optimizing hyperparameters for ML models

Hyperparameter tuning is the process of searching for the optimal set of hyperparameters for a model. This lesson covers techniques ranging from Grid Search to Bayesian Optimization.

Hyperparameters vs Parameters

Key differences (see the sketch below)
  • Parameters: learned from the data during training (weights, biases)
  • Hyperparameters: set before training, for example:
    • Learning rate
    • Number of layers
    • Regularization strength
    • Batch size
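
A minimal sketch of the distinction, using scikit-learn's LogisticRegression on toy data (the dataset here is just a stand-in):

Python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data standing in for a real dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Hyperparameters: chosen BEFORE training
model = LogisticRegression(C=0.5, max_iter=1000)  # C = inverse regularization strength

# Parameters: learned FROM the data by fit()
model.fit(X, y)
print(model.coef_)       # learned weights
print(model.intercept_)  # learned bias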

Common Hyperparameters

1. Model Architecture

Model          | Hyperparameters
---------------|-----------------------------------------
Decision Tree  | max_depth, min_samples_split, criterion
Random Forest  | n_estimators, max_depth, max_features
SVM            | C, kernel, gamma
Neural Network | layers, neurons, activation
XGBoost        | n_estimators, max_depth, learning_rate
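
These are constructor arguments in most libraries; for instance, an SVM in scikit-learn (the values here are illustrative, not recommendations):

Python
from sklearn.svm import SVC

# Architecture-level hyperparameters are fixed when the model is constructed
svm = SVC(
    C=1.0,          # regularization strength (larger C = weaker regularization)
    kernel='rbf',   # kernel function
    gamma='scale'   # RBF kernel coefficient
)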

2. Training Process

  • Learning rate: step size for gradient descent
  • Batch size: samples per gradient update
  • Epochs: number of full passes over the training data
  • Early stopping patience: how many epochs without improvement to tolerate before stopping (all four appear in the sketch below)
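
These typically appear together in a single training call; a minimal sketch with Keras, using random data as a placeholder:

Python
import numpy as np
from tensorflow import keras

# Placeholder data
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

# Learning rate: step size for gradient descent
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.fit(
    X, y,
    batch_size=32,   # samples per gradient update
    epochs=100,      # upper bound on passes over the data
    validation_split=0.2,
    # patience: stop after 5 epochs without val_loss improvement
    callbacks=[keras.callbacks.EarlyStopping(patience=5)]
)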

Search Strategies

1. Grid Search

Exhaustively evaluates every combination in the grid:

Python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Define parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [5, 10, 15, None],
    'min_samples_split': [2, 5, 10],
    'max_features': ['sqrt', 'log2']
}

# Grid Search
rf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(
    rf,
    param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=2
)

grid_search.fit(X_train, y_train)

# Best parameters
print(f"Best params: {grid_search.best_params_}")
print(f"Best score: {grid_search.best_score_:.4f}")

Pros: simple, thorough.
Cons: exponential complexity, slow.
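
The exponential cost is easy to quantify: sklearn's ParameterGrid enumerates the grid defined above, giving 3 × 4 × 3 × 2 = 72 combinations, each fit 5 times under cv=5:

Python
from sklearn.model_selection import ParameterGrid

# param_grid as defined in the Grid Search example above
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)      # 72 combinations
print(n_candidates * 5)  # 360 model fits with cv=5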

2. Random Search

Samples a fixed number of random combinations from the given distributions:

Python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
from xgboost import XGBClassifier

# Define distributions
# Note: scipy's uniform(loc, scale) samples from [loc, loc + scale]
param_distributions = {
    'n_estimators': randint(100, 500),
    'max_depth': randint(3, 20),
    'min_child_weight': randint(1, 10),
    'learning_rate': uniform(0.01, 0.3),  # [0.01, 0.31]
    'subsample': uniform(0.6, 0.4)        # [0.6, 1.0]
}

# Random Search
xgb = XGBClassifier(random_state=42)

random_search = RandomizedSearchCV(
    xgb,
    param_distributions,
    n_iter=100,  # number of random samples
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

random_search.fit(X_train, y_train)
print(f"Best params: {random_search.best_params_}")

Pros: more efficient than Grid Search, better coverage of continuous ranges.
Cons: may miss the optimal values by chance.

3. Bayesian Optimization

Uses a probabilistic surrogate model to guide the search: each evaluated configuration refines the surrogate, and an acquisition function picks the next candidate by balancing exploration against exploitation:

Python
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical
from xgboost import XGBClassifier

# Define search space
search_spaces = {
    'n_estimators': Integer(100, 500),
    'max_depth': Integer(3, 20),
    'learning_rate': Real(0.01, 0.3, prior='log-uniform'),
    'subsample': Real(0.6, 1.0),
    'colsample_bytree': Real(0.6, 1.0)
}

# Bayesian Optimization
bayes_search = BayesSearchCV(
    XGBClassifier(random_state=42),
    search_spaces,
    n_iter=50,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    random_state=42
)

bayes_search.fit(X_train, y_train)
print(f"Best params: {bayes_search.best_params_}")

4. Optuna (Modern Approach)

A powerful framework for hyperparameter optimization; by default it searches with the Tree-structured Parzen Estimator (TPE) sampler:

Python
import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def objective(trial):
    # Suggest hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True)
    }

    model = XGBClassifier(**params, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')

    return scores.mean()

# Create study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, show_progress_bar=True)

# Best parameters
print(f"Best params: {study.best_params}")
print(f"Best score: {study.best_value:.4f}")

# Visualize (the plot_* functions return figures; call .show() in a script)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()

Neural Network Hyperparameter Tuning

With Keras Tuner

Python
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    model = keras.Sequential()

    # Tune number of layers
    for i in range(hp.Int('num_layers', 1, 4)):
        model.add(keras.layers.Dense(
            units=hp.Int(f'units_{i}', min_value=32, max_value=512, step=32),
            # 'activation' is shared across layers; use f'activation_{i}' for per-layer choices
            activation=hp.Choice('activation', ['relu', 'tanh', 'selu'])
        ))

    # Tune dropout
    model.add(keras.layers.Dropout(
        hp.Float('dropout', 0.0, 0.5, step=0.1)
    ))

    model.add(keras.layers.Dense(1, activation='sigmoid'))

    # Tune learning rate
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
        ),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )

    return model

# Create tuner
tuner = kt.Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=50,
    factor=3,
    directory='tuning',
    project_name='my_model'
)

# Search
tuner.search(
    X_train, y_train,
    epochs=50,
    validation_data=(X_val, y_val),
    callbacks=[keras.callbacks.EarlyStopping(patience=5)]
)

# Best model
best_model = tuner.get_best_models(1)[0]
best_hp = tuner.get_best_hyperparameters(1)[0]

Cross-Validation Strategies

K-Fold

Python
from sklearn.model_selection import KFold, StratifiedKFold, TimeSeriesSplit

# Standard K-Fold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Stratified K-Fold (for classification; preserves class proportions per fold)
stratified = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Time Series Split (no shuffling; training folds always precede validation folds)
tscv = TimeSeriesSplit(n_splits=5)

Nested Cross-Validation

Gives an unbiased performance estimate by separating tuning from evaluation: the inner loop selects hyperparameters, and the outer loop scores the resulting model on data never used for tuning:

Python
from sklearn.model_selection import cross_val_score, GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# A small grid suited to XGBoost (the earlier grid targeted Random Forest)
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [3, 6],
    'learning_rate': [0.05, 0.1]
}

# Outer CV for model evaluation
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Inner CV for hyperparameter tuning
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Grid search with inner CV
grid_search = GridSearchCV(
    XGBClassifier(),
    param_grid,
    cv=inner_cv,
    scoring='accuracy'
)

# Nested CV scores
nested_scores = cross_val_score(
    grid_search, X, y,
    cv=outer_cv,
    scoring='accuracy'
)

print(f"Nested CV Score: {nested_scores.mean():.4f} (+/- {nested_scores.std():.4f})")

Best Practices

Tips for Effective Tuning
  1. Start with Random Search before committing to Grid Search
  2. Use a log scale for learning rate and regularization strength
  3. Use early stopping to save compute (see the pruning sketch after this list)
  4. Use nested CV for unbiased evaluation
  5. Monitor for overfitting while you tune
  6. Document experiments for reproducibility
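
On tip 3: beyond Keras-style early stopping, Optuna can prune unpromising trials mid-training. A minimal sketch with a median pruner and scikit-learn's SGDClassifier on toy data (the model and ranges are illustrative):

Python
import optuna
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=42)

def objective(trial):
    alpha = trial.suggest_float('alpha', 1e-5, 1e-1, log=True)
    model = SGDClassifier(alpha=alpha, random_state=42)

    score = 0.0
    for epoch in range(20):
        model.partial_fit(X_tr, y_tr, classes=[0, 1])
        score = model.score(X_val, y_val)

        # Report the intermediate score; the pruner compares it against
        # the median of previous trials at the same step
        trial.report(score, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()  # abandon this trial early

    return score

study = optuna.create_study(direction='maximize',
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=30)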

Practical Example

Complete Tuning Pipeline

Python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import optuna
from xgboost import XGBClassifier

# Load data
data = pd.read_csv('data.csv')
X = data.drop('target', axis=1)
y = data['target']

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale (not required for tree-based models like XGBoost, but harmless
# and useful if you later swap in a scale-sensitive estimator)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Optuna objective
def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'gamma': trial.suggest_float('gamma', 0, 5),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10, log=True)
    }

    model = XGBClassifier(**params, random_state=42, eval_metric='logloss')

    # Use cross-validation
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring='accuracy')

    return scores.mean()

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100, show_progress_bar=True)

# Train final model with best params
best_model = XGBClassifier(**study.best_params, random_state=42)
best_model.fit(X_train_scaled, y_train)

# Evaluate
y_pred = best_model.predict(X_test_scaled)
print(f"Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred))

Practice

Hands-on Exercise

Hyperparameter Tuning Challenge:

  1. Load the Titanic dataset
  2. Implement tuning with:
    • Grid Search for Random Forest
    • Random Search for XGBoost
    • Optuna for LightGBM
  3. Compare the results and the runtime of each method
  4. Use nested CV for the final evaluation

Goal: find the best hyperparameters and compare the efficiency of each search strategy

