Advanced Ensemble Methods
Ensemble methods combine multiple models to build a stronger one. This lesson goes deep into advanced techniques that move beyond basic Random Forest and XGBoost.
🎯 Objectives
- Understand and implement Stacking
- Blending and when to use it
- Optimized Weighted Ensembles
- Bayesian Model Averaging
1. Ensemble Basics Refresher
1.1 Ensemble Taxonomy
| Method | Idea | Example |
|---|---|---|
| Bagging | Train many models on random subsets | Random Forest |
| Boosting | Train sequentially, fix errors | XGBoost, LightGBM |
| Stacking | Train a meta-model on predictions | StackingClassifier |
| Blending | Stacking simplified (holdout set) | Manual blending |
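As a quick refresher, the first three rows map directly onto ready-made scikit-learn estimators (blending has no built-in class; see section 3). A minimal sketch:

```python
from sklearn.ensemble import (
    BaggingClassifier,           # bagging: many models on random subsets
    GradientBoostingClassifier,  # boosting: sequential error-fixing
    StackingClassifier           # stacking: meta-model on predictions
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
boosting = GradientBoostingClassifier(n_estimators=100)
stacking = StackingClassifier(
    estimators=[('bag', bagging), ('boost', boosting)],
    final_estimator=LogisticRegression()
)
```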
1.2 Why Do Ensembles Work?
Example

```text
Diversity + Accuracy = Better Ensemble

Model A: 80% accuracy (good at pattern X)
Model B: 78% accuracy (good at pattern Y)
Model C: 79% accuracy (good at pattern Z)

Ensemble: 85% accuracy (good at X + Y + Z)
```

Requirement: the models must be diverse (different from one another). Three identical models add nothing.
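A quick sanity check on the intuition above: if the three models' errors were fully independent (an idealized assumption, since real models are usually correlated), a simple majority vote would already beat every individual model. A minimal sketch, reusing the accuracies from the example:

```python
from itertools import product

# Accuracies of three hypothetical, fully independent classifiers
accs = [0.80, 0.78, 0.79]

# Probability that a majority (>= 2 of 3) is correct:
# sum over all correct/incorrect combinations with at least 2 correct
p_majority = 0.0
for outcome in product([True, False], repeat=3):
    if sum(outcome) >= 2:
        p = 1.0
        for acc, correct in zip(accs, outcome):
            p *= acc if correct else (1 - acc)
        p_majority += p

print(f"Majority-vote accuracy: {p_majority:.3f}")  # ~0.886, above every single model
```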
2. Stacking (Stacked Generalization)
2.1 Concept
Stacking Architecture
📊 Input Data
↓
🌲 Random Forest → pred_rf
🚀 XGBoost → pred_xgb
📐 SVM → pred_svm
🧠 Neural Net → pred_nn
↓
🎯 Logistic Regression (Meta-Model)
↓
✅ Final Prediction
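It helps to see the mechanic the diagram implies: the meta-model must be trained on out-of-fold base predictions, otherwise it learns from leaked labels. A minimal manual sketch, assuming the same `X_train`/`y_train` used elsewhere in this lesson (the `StackingClassifier` in 2.2 below handles this automatically):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

base_models = [
    RandomForestClassifier(n_estimators=200, random_state=42),
    SVC(probability=True, random_state=42)
]

# Out-of-fold probabilities: each row is predicted by a model
# that never saw that row during training (prevents leakage)
meta_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method='predict_proba')[:, 1]
    for m in base_models
])

# The meta-model learns how to combine the base predictions
meta_model = LogisticRegression().fit(meta_features, y_train)
```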
2.2 Implementation with Scikit-learn
Python

```python
from sklearn.ensemble import (
    StackingClassifier,
    RandomForestClassifier,
    GradientBoostingClassifier
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=200, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=200, random_state=42)),
    ('svc', SVC(probability=True, random_state=42))
]

# Meta-model
meta_model = LogisticRegression()

# Stacking
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model,
    cv=5,                          # 5-fold CV for the base predictions
    stack_method='predict_proba',  # use probabilities as meta-features
    n_jobs=-1
)

stacking.fit(X_train, y_train)
score = stacking.score(X_test, y_test)
print(f"Stacking accuracy: {score:.4f}")
```

2.3 Multi-Layer Stacking
Python

```python
from sklearn.neural_network import MLPClassifier
from lightgbm import LGBMClassifier

# Layer 0: diverse base models
layer_0 = [
    ('rf', RandomForestClassifier(n_estimators=300)),
    ('lgbm', LGBMClassifier(n_estimators=300)),
    ('svc', SVC(probability=True))
]

# Layer 1: stacking of layer 0
layer_1 = StackingClassifier(
    estimators=layer_0,
    final_estimator=LogisticRegression(),
    cv=5
)

# Layer 2: another base model + the layer-1 stack
final_stack = StackingClassifier(
    estimators=[
        ('layer1', layer_1),
        ('mlp', MLPClassifier(hidden_layer_sizes=(100, 50)))
    ],
    final_estimator=LogisticRegression(),
    cv=3
)

final_stack.fit(X_train, y_train)
```

3. Blending
3.1 Blending vs Stacking
| Aspect | Stacking | Blending |
|---|---|---|
| Validation | K-fold CV | Single holdout set |
| Data leak risk | Lower (CV) | Higher (fixed split) |
| Speed | Slower (K fits) | Faster (1 fit) |
| Complexity | Higher | Simpler |
3.2 Manual Blending
Python

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Split the training data into base-train and blend (holdout) sets
X_train_base, X_blend, y_train_base, y_blend = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)

# Train base models on the base-train set
rf = RandomForestClassifier(n_estimators=200).fit(X_train_base, y_train_base)
lgbm = LGBMClassifier(n_estimators=200).fit(X_train_base, y_train_base)
svc = SVC(probability=True).fit(X_train_base, y_train_base)

# Generate blend features (predictions on the holdout set)
blend_features = np.column_stack([
    rf.predict_proba(X_blend)[:, 1],
    lgbm.predict_proba(X_blend)[:, 1],
    svc.predict_proba(X_blend)[:, 1]
])

# Train the meta-model on the blend features
meta = LogisticRegression()
meta.fit(blend_features, y_blend)

# Predict on the test set
test_features = np.column_stack([
    rf.predict_proba(X_test)[:, 1],
    lgbm.predict_proba(X_test)[:, 1],
    svc.predict_proba(X_test)[:, 1]
])
final_pred = meta.predict(test_features)
```

4. Weighted Ensembles
4.1 Simple Averaging
Python

```python
# Equal-weight averaging of predicted probabilities
pred_rf = rf.predict_proba(X_test)[:, 1]
pred_lgbm = lgbm.predict_proba(X_test)[:, 1]
pred_svc = svc.predict_proba(X_test)[:, 1]

avg_pred = (pred_rf + pred_lgbm + pred_svc) / 3
```

4.2 Optimized Weights
Python

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import log_loss

# pred_rf_val, pred_lgbm_val, pred_svc_val: each model's predicted
# probabilities on a validation set (X_val, y_val)

def objective(weights):
    """Minimize log loss of the weighted average."""
    w = weights / weights.sum()  # normalize so the weights sum to 1
    pred = w[0]*pred_rf_val + w[1]*pred_lgbm_val + w[2]*pred_svc_val
    return log_loss(y_val, pred)

# Optimize
result = minimize(
    objective,
    x0=[1/3, 1/3, 1/3],    # start from equal weights
    method='Nelder-Mead',
    bounds=[(0, 1)] * 3
)

optimal_weights = result.x / result.x.sum()
print(f"Optimal weights: RF={optimal_weights[0]:.3f}, "
      f"LGBM={optimal_weights[1]:.3f}, SVC={optimal_weights[2]:.3f}")
```

4.3 Performance-Based Weights
Python

```python
from sklearn.metrics import accuracy_score

# Weight each model by its validation accuracy
scores = {
    'rf': accuracy_score(y_val, rf.predict(X_val)),
    'lgbm': accuracy_score(y_val, lgbm.predict(X_val)),
    'svc': accuracy_score(y_val, svc.predict(X_val))
}

total = sum(scores.values())
weights = {k: v / total for k, v in scores.items()}
print(f"Weights: {weights}")

# Apply the weights to the test-set probabilities from 4.1
weighted_pred = (weights['rf'] * pred_rf
                 + weights['lgbm'] * pred_lgbm
                 + weights['svc'] * pred_svc)
```

5. Bayesian Model Averaging
5.1 Concept
Instead of picking the single best model, BMA computes a weighted average of all models, weighted by each model's posterior probability:
Example

```text
P(prediction | data) = Σ over models of P(prediction | model) × P(model | data)
```

The model posterior P(model | data) is intractable in general, so in practice it is approximated, e.g. from BIC: weight ∝ exp(-ΔBIC / 2), which is what the implementation below does.

5.2 Simple BMA Implementation
Python

```python
import numpy as np
from sklearn.metrics import log_loss

def bayesian_model_average(models, X_val, y_val, X_test):
    """BMA using a BIC approximation for the model weights."""
    bic_scores = []
    predictions = []

    for model in models:
        pred = model.predict_proba(X_val)
        n = len(y_val)
        k = getattr(model, 'n_features_in_', 10)  # rough proxy for parameter count
        ll = -log_loss(y_val, pred) * n           # approximate log-likelihood
        bic = -2 * ll + k * np.log(n)
        bic_scores.append(bic)
        predictions.append(model.predict_proba(X_test))

    # Convert BIC scores to weights: lower BIC -> higher weight
    bic_arr = np.array(bic_scores)
    delta_bic = bic_arr - bic_arr.min()
    weights = np.exp(-0.5 * delta_bic)
    weights /= weights.sum()

    # Weighted average of the test predictions
    final_pred = sum(w * p for w, p in zip(weights, predictions))
    return final_pred, weights

# Usage
bma_pred, bma_weights = bayesian_model_average(
    [rf, lgbm, svc], X_val, y_val, X_test
)
```

6. Best Practices
6.1 When to Use Ensembles
| Scenario | Recommendation |
|---|---|
| Kaggle competition | Stacking + weighted ensemble |
| Production (speed critical) | Single model (XGBoost/LightGBM) |
| Production (accuracy critical) | Simple ensemble (2-3 models) |
| Small data | Bagging (Random Forest) |
| Large data | Boosting (LightGBM) |
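For the "simple ensemble (2-3 models)" recommendation above, scikit-learn's `VotingClassifier` with soft voting is usually enough. A minimal sketch, assuming the same `X_train`/`X_test` splits used throughout this lesson:

```python
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from lightgbm import LGBMClassifier

# Soft voting averages the predicted probabilities of 2-3 strong models,
# capturing most of the ensemble benefit at a fraction of stacking's complexity
voting = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=200, random_state=42)),
        ('lgbm', LGBMClassifier(n_estimators=200, random_state=42)),
        ('svc', SVC(probability=True, random_state=42))
    ],
    voting='soft',
    n_jobs=-1
)

voting.fit(X_train, y_train)
print(f"Voting accuracy: {voting.score(X_test, y_test):.4f}")
```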
6.2 Diversity Matters
Python

```python
import pandas as pd

# Check the correlation between model predictions
pred_df = pd.DataFrame({
    'RF': pred_rf, 'LGBM': pred_lgbm, 'SVC': pred_svc
})
print(pred_df.corr())
# Low correlation = high diversity = better ensemble
```

📝 Quiz
1. What is the main difference between Stacking and Blending?
- Stacking uses K-fold CV, Blending uses a single holdout set
- Stacking is faster
- Blending is always more accurate
- There is no difference
2. What should the base models in an ensemble be?
- The same type of algorithm
- Diverse (differing in their predictions)
- The same hyperparameters
- Trained on the same data
3. What does BMA base its weights on?
- Randomness
- Training accuracy
- Model probability (BIC/evidence)
- Model complexity
🎯 Key Takeaways
- Stacking — Meta-model learns from base model predictions (powerful)
- Blending — Simpler stacking with holdout set
- Weighted Ensemble — Optimize weights by metric
- BMA — Bayesian approach, weights by model evidence
- Diversity — Key to ensemble success, check prediction correlation
🚀 Next Lesson
Transfer Learning & Fine-tuning — Leverage pre-trained models for new domains!
