
Advanced Ensemble Methods

Stacking, Blending, Weighted Ensembles, và Bayesian Model Averaging

Ensemble methods combine multiple models into a single stronger one. This lesson goes beyond basic Random Forest and XGBoost into the advanced combination techniques.

🎯 Objectives

  • Understand and implement Stacking
  • Blending and when to use it
  • Optimizing Weighted Ensembles
  • Bayesian Model Averaging

1. Ensemble Basics Recap

1.1 Ensemble Taxonomy

| Method   | Idea                                    | Example            |
|----------|-----------------------------------------|--------------------|
| Bagging  | Train many models on random subsets     | Random Forest      |
| Boosting | Train sequentially, each fixing errors  | XGBoost, LightGBM  |
| Stacking | Train a meta-model on predictions       | StackingClassifier |
| Blending | Simplified stacking (holdout set)       | Manual blending    |

1.2 Why Do Ensembles Work?

Example

Diversity + Accuracy = Better Ensemble

Model A: 80% accuracy (good at pattern X)
Model B: 78% accuracy (good at pattern Y)
Model C: 79% accuracy (good at pattern Z)

Ensemble: 85% accuracy (good at X + Y + Z)

Requirement: the models must be diverse, i.e. make different errors. Three identical models add nothing. The simulation below illustrates the effect.
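
A minimal sketch of why independent errors help, using made-up accuracies (80% each) and simulated labels rather than anything from this lesson:

Python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
y = rng.integers(0, 2, size=n)  # simulated ground-truth labels

def noisy_predictions(y, accuracy):
    """Flip each label with probability (1 - accuracy)."""
    flip = rng.random(len(y)) > accuracy
    return np.where(flip, 1 - y, y)

# Three "models", each 80% accurate, with independent errors
preds = np.stack([noisy_predictions(y, 0.80) for _ in range(3)])
majority = (preds.sum(axis=0) >= 2).astype(int)

print("Individual accuracies:", [(p == y).mean() for p in preds])
print("Majority-vote accuracy:", (majority == y).mean())
# With independent errors: p^3 + 3*p^2*(1-p) = 0.896 > 0.80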


2. Stacking (Stacked Generalization)

2.1 Concept

Stacking Architecture: each base model's predictions become input features for a meta-model, which learns how best to combine them.

📊 Input Data
🌲 Random Forest → pred_rf
🚀 XGBoost → pred_xgb
📐 SVM → pred_svm
🧠 Neural Net → pred_nn
🎯 Logistic Regression (Meta-Model)
Final Prediction

2.2 Implementation with Scikit-learn

Python
from sklearn.ensemble import (
    StackingClassifier,
    RandomForestClassifier,
    GradientBoostingClassifier
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=200, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=200, random_state=42)),
    ('svc', SVC(probability=True, random_state=42))
]

# Meta-model
meta_model = LogisticRegression()

# Stacking
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,                          # 5-fold CV for the base predictions
    stack_method='predict_proba',  # use probabilities as meta-features
    n_jobs=-1
)

stacking.fit(X_train, y_train)
score = stacking.score(X_test, y_test)
print(f"Stacking accuracy: {score:.4f}")

2.3 Multi-Layer Stacking

Python
from sklearn.neural_network import MLPClassifier
from lightgbm import LGBMClassifier

# Layer 0: diverse base models
layer_0 = [
    ('rf', RandomForestClassifier(n_estimators=300)),
    ('lgbm', LGBMClassifier(n_estimators=300)),
    ('svc', SVC(probability=True))
]

# Layer 1: stacking of layer 0
layer_1 = StackingClassifier(
    estimators=layer_0,
    final_estimator=LogisticRegression(),
    cv=5
)

# Layer 2: another base model + the layer-1 stack
final_stack = StackingClassifier(
    estimators=[
        ('layer1', layer_1),
        ('mlp', MLPClassifier(hidden_layer_sizes=(100, 50)))
    ],
    final_estimator=LogisticRegression(),
    cv=3
)
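
The nested stack trains like any scikit-learn estimator, but each extra layer multiplies the number of CV fits; a usage sketch assuming the earlier train/test split:

Python
final_stack.fit(X_train, y_train)
print(f"Multi-layer stack accuracy: {final_stack.score(X_test, y_test):.4f}")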

3. Blending

3.1 Blending vs Stacking

|                | Stacking        | Blending             |
|----------------|-----------------|----------------------|
| Validation     | K-fold CV       | Single holdout set   |
| Data leak risk | Lower (CV)      | Higher (fixed split) |
| Speed          | Slower (K fits) | Faster (1 fit)       |
| Complexity     | Higher          | Simpler              |

3.2 Manual Blending

Python
import numpy as np
from sklearn.model_selection import train_test_split

# Split: train / blend / test
X_train_base, X_blend, y_train_base, y_blend = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)

# Train base models on the base-train set
rf = RandomForestClassifier(n_estimators=200).fit(X_train_base, y_train_base)
lgbm = LGBMClassifier(n_estimators=200).fit(X_train_base, y_train_base)
svc = SVC(probability=True).fit(X_train_base, y_train_base)

# Generate blend features (predictions on the holdout blend set)
blend_features = np.column_stack([
    rf.predict_proba(X_blend)[:, 1],
    lgbm.predict_proba(X_blend)[:, 1],
    svc.predict_proba(X_blend)[:, 1]
])

# Train meta-model on the blend features
meta = LogisticRegression()
meta.fit(blend_features, y_blend)

# Predict on the test set
test_features = np.column_stack([
    rf.predict_proba(X_test)[:, 1],
    lgbm.predict_proba(X_test)[:, 1],
    svc.predict_proba(X_test)[:, 1]
])
final_pred = meta.predict(test_features)
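
A short evaluation sketch, assuming a y_test that pairs with the X_test used above:

Python
from sklearn.metrics import accuracy_score

# Score the blended predictions against the held-out test labels
print(f"Blending accuracy: {accuracy_score(y_test, final_pred):.4f}")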

4. Weighted Ensembles

4.1 Simple Averaging

Python
# Equal-weight averaging of positive-class probabilities
pred_rf = rf.predict_proba(X_test)[:, 1]
pred_lgbm = lgbm.predict_proba(X_test)[:, 1]
pred_svc = svc.predict_proba(X_test)[:, 1]

avg_pred = (pred_rf + pred_lgbm + pred_svc) / 3

4.2 Optimized Weights

Python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import log_loss

# pred_*_val are the models' positive-class probabilities on a
# validation set (computed as in 4.1, but on X_val); y_val its labels.
def objective(weights):
    """Log loss of the weighted-average prediction."""
    w = np.asarray(weights)
    w = w / w.sum()  # normalize so the weights sum to 1
    pred = w[0]*pred_rf_val + w[1]*pred_lgbm_val + w[2]*pred_svc_val
    return log_loss(y_val, pred)

# Optimize (Nelder-Mead supports bounds since SciPy 1.7)
result = minimize(
    objective,
    x0=[1/3, 1/3, 1/3],      # start from equal weights
    method='Nelder-Mead',
    bounds=[(0, 1)] * 3
)

optimal_weights = result.x / result.x.sum()
print(f"Optimal weights: RF={optimal_weights[0]:.3f}, "
      f"LGBM={optimal_weights[1]:.3f}, SVC={optimal_weights[2]:.3f}")

4.3 Performance-Based Weights

Python
from sklearn.metrics import accuracy_score

# Weight each model by its validation accuracy
scores = {
    'rf': accuracy_score(y_val, rf.predict(X_val)),
    'lgbm': accuracy_score(y_val, lgbm.predict(X_val)),
    'svc': accuracy_score(y_val, svc.predict(X_val))
}

total = sum(scores.values())
weights = {k: v / total for k, v in scores.items()}
print(f"Weights: {weights}")

5. Bayesian Model Averaging

5.1 Concept

Instead of picking the single best model, BMA takes a weighted average of all models, weighting each by its posterior probability:

Example

P(prediction | data) = Σ_m P(prediction | model_m) · P(model_m | data)

5.2 Simple BMA Implementation

Python
import numpy as np
from sklearn.metrics import log_loss

def bayesian_model_average(models, X_val, y_val, X_test):
    """BMA using a BIC approximation for the model weights."""
    bic_scores = []
    predictions = []

    for model in models:
        pred = model.predict_proba(X_val)
        n = len(y_val)
        # Rough parameter count: use the number of input features
        k = getattr(model, 'n_features_in_', 10)
        ll = -log_loss(y_val, pred) * n   # total log-likelihood (approx.)
        bic = -2 * ll + k * np.log(n)
        bic_scores.append(bic)
        predictions.append(model.predict_proba(X_test))

    # Convert BIC differences to weights: w ∝ exp(-ΔBIC / 2)
    bic_arr = np.array(bic_scores)
    delta_bic = bic_arr - bic_arr.min()
    weights = np.exp(-0.5 * delta_bic)
    weights /= weights.sum()

    # Weighted average of the test predictions
    final_pred = sum(w * p for w, p in zip(weights, predictions))
    return final_pred, weights

# Usage
bma_pred, bma_weights = bayesian_model_average(
    [rf, lgbm, svc], X_val, y_val, X_test
)

6. Best Practices

6.1 When to Use Ensembles

| Scenario                       | Recommendation                  |
|--------------------------------|---------------------------------|
| Kaggle competition             | Stacking + weighted ensemble    |
| Production (speed critical)    | Single model (XGBoost/LightGBM) |
| Production (accuracy critical) | Simple ensemble (2-3 models)    |
| Small data                     | Bagging (Random Forest)         |
| Large data                     | Boosting (LightGBM)             |
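
For the "Production (accuracy critical)" row, scikit-learn's VotingClassifier is often enough; a minimal sketch reusing the base_models list from section 2.2:

Python
from sklearn.ensemble import VotingClassifier

# Soft voting averages the predicted probabilities of the base models
voting = VotingClassifier(estimators=base_models, voting='soft', n_jobs=-1)
voting.fit(X_train, y_train)
print(f"Voting accuracy: {voting.score(X_test, y_test):.4f}")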

6.2 Diversity Matters

Python
import pandas as pd

# Check the correlation between model predictions
pred_df = pd.DataFrame({
    'RF': pred_rf, 'LGBM': pred_lgbm, 'SVC': pred_svc
})
print(pred_df.corr())
# Low correlation = high diversity = better ensemble

📝 Quiz

  1. What is the main difference between Stacking and Blending?

    • Stacking uses K-fold CV, Blending uses a single holdout set
    • Stacking is faster
    • Blending is always more accurate
    • There is no difference
  2. What do the base models in an ensemble need to be?

    • The same type of algorithm
    • Diverse (differing in their predictions)
    • The same hyperparameters
    • Trained on the same data
  3. What does BMA base its weights on?

    • Random values
    • Training accuracy
    • Model probability (BIC/evidence)
    • Model complexity

🎯 Key Takeaways

  1. Stacking: a meta-model learns from base-model predictions (powerful)
  2. Blending: simpler stacking with a single holdout set
  3. Weighted Ensemble: optimize the weights against a validation metric
  4. BMA: Bayesian approach; weights come from model evidence
  5. Diversity: the key to ensemble success; check prediction correlation

🚀 Next lesson

Transfer Learning & Fine-tuning: leverage pre-trained models for new domains!