Comprehensive Quiz

🎯 Quiz Objectives

Test your knowledge with multiple-choice questions and hands-on exercises!

✅ 18 multiple-choice questions (6 parts)

✅ 3 hands-on exercises

✅ Prerequisite: complete lessons 1-12

Time: 30 minutes | Difficulty: Comprehensive | Passing score: ≥ 13/18 correct answers

Part 1: Hyperparameter Tuning

Part 2: Advanced Ensemble Methods

Part 3: Recommendation Systems

Part 4: MLOps & Pipelines

Part 5: Feature Store & Model Monitoring

Part 6: Responsible AI & AutoML

📝 Hands-on Exercises

Exercise 1: Hyperparameter Tuning with Optuna

Scenario: You need to tune the hyperparameters of an XGBoost classifier for a classification problem. Write code that uses Optuna to find the best values for max_depth, learning_rate, n_estimators, and subsample.

Solution

Python

import optuna
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic binary classification data for the exercise
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

def objective(trial):
    """Return the mean 5-fold F1 score for one sampled configuration."""
    params = {
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        # log=True searches the learning rate on a log scale
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000, step=100),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
        'eval_metric': 'logloss',
    }
    model = XGBClassifier(**params, random_state=42)
    return cross_val_score(model, X, y, cv=5, scoring='f1').mean()

# Maximize F1 over 50 trials
study = optuna.create_study(direction='maximize', study_name='xgb_tuning')
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best F1 Score: {study.best_value:.4f}")
print(f"Best Params: {study.best_params}")

# Visualization (requires plotly); .show() renders the figures outside a notebook
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
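
The solution searches learning_rate with log=True, which asks Optuna to treat the range 0.01 to 0.3 on a log scale. The sketch below is only an illustration of that idea with the standard library, not Optuna's own sampling code: the helper sample_log_uniform and the 0.05 cutoff are made up for the demonstration. It compares log-uniform draws with plain uniform draws over the same range.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
n = 20_000
low, high, cutoff = 0.01, 0.3, 0.05  # same range as learning_rate above

log_draws = [sample_log_uniform(low, high, rng) for _ in range(n)]
lin_draws = [rng.uniform(low, high) for _ in range(n)]

# Fraction of draws that land among the small learning rates
frac_log = sum(x < cutoff for x in log_draws) / n
frac_lin = sum(x < cutoff for x in lin_draws) / n

print(f"log-uniform: {frac_log:.1%} of draws below {cutoff}")
print(f"uniform:     {frac_lin:.1%} of draws below {cutoff}")
```

In theory the log-uniform fraction is ln(5)/ln(30) ≈ 0.47, while the uniform fraction is 0.04/0.29 ≈ 0.14, so the log scale spends far more of the search budget on small learning rates. This is usually what you want for parameters that span orders of magnitude, such as reg_alpha and reg_lambda.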

Exercise 2: Stacking Ensemble

Scenario: Build a Stacking Classifier that uses three base models (Random Forest, XGBoost, LightGBM) with Logistic Regression as the meta-learner. Compare its performance with each individual model.

Solution

Python

from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Synthetic binary classification data for the exercise
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)),
    ('xgb', XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1,
                          eval_metric='logloss', random_state=42)),
    ('lgbm', LGBMClassifier(n_estimators=200, max_depth=6, learning_rate=0.1,
                            random_state=42, verbose=-1)),
]

# Meta-learner trained on the base models' predictions
meta_learner = LogisticRegression(max_iter=1000)

# Stacking: 5-fold out-of-fold class probabilities become the meta-features
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_learner,
    cv=5,
    stack_method='predict_proba',
    n_jobs=-1
)

# Compare performance
models = {
    'Random Forest': base_models[0][1],
    'XGBoost': base_models[1][1],
    'LightGBM': base_models[2][1],
    'Stacking': stacking_clf,
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='f1')
    print(f"{name:20s} | F1: {scores.mean():.4f} ± {scores.std():.4f}")
# Stacking often, though not always, scores a higher F1 than each single model
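
To make the mechanics behind StackingClassifier more concrete, here is a minimal hand-rolled sketch of the same idea using only scikit-learn. The choice of base models (a small Random Forest and k-nearest neighbors), the dataset size, and the variable names are assumptions for illustration. It is a simplified picture of the technique, not scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative base models (not the ones from the exercise)
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]

# Step 1: out-of-fold positive-class probabilities, one column per base model.
# Each training row is predicted by a model that never saw it during fitting.
oof_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Step 2: the meta-learner is trained on those out-of-fold predictions
meta_learner = LogisticRegression(max_iter=1000).fit(oof_features, y_train)

# Step 3: refit each base model on the full training set, then build the
# meta-features for unseen data from their predictions
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

stack_accuracy = meta_learner.score(test_features, y_test)
print(f"Meta-feature matrix shape: {oof_features.shape}")
print(f"Stacked accuracy on the held-out set: {stack_accuracy:.3f}")
```

Using out-of-fold predictions in step 1 matters: if the meta-learner were trained on predictions the base models made for rows they had already seen, it would learn to trust overfitted outputs.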

Exercise 3: Model Monitoring (Detecting Data Drift)

Scenario: You have two datasets: df_train (the original training data) and df_prod (production data from the past week). Write code that detects data drift using PSI (Population Stability Index) and the KS test.

Solution

Python

import numpy as np
import pandas as pd
from scipy import stats

def calculate_psi(expected, actual, bins=10):
    """Compute the Population Stability Index (PSI) between two distributions."""
    # Equal-width bin edges spanning the combined range of both samples
    breakpoints = np.linspace(
        min(expected.min(), actual.min()),
        max(expected.max(), actual.max()),
        bins + 1
    )
    # Share of observations in each bin (proportions, despite the variable names)
    expected_counts = np.histogram(expected, bins=breakpoints)[0] / len(expected)
    actual_counts = np.histogram(actual, bins=breakpoints)[0] / len(actual)

    # Avoid division by zero and log(0) for empty bins
    expected_counts = np.clip(expected_counts, 1e-6, None)
    actual_counts = np.clip(actual_counts, 1e-6, None)

    psi = np.sum((actual_counts - expected_counts) * np.log(actual_counts / expected_counts))
    return psi

# Simulated data
np.random.seed(42)
df_train = pd.DataFrame({
    'age': np.random.normal(35, 10, 5000),
    'income': np.random.lognormal(10, 1, 5000),
    'credit_score': np.random.normal(650, 80, 5000),
})
df_prod = pd.DataFrame({
    'age': np.random.normal(32, 12, 1000),       # mild drift
    'income': np.random.lognormal(10.5, 1.2, 1000),  # clear drift
    'credit_score': np.random.normal(645, 85, 1000),  # stable
})

# Check drift for each feature
print("=" * 60)
print(f"{'Feature':15s} | {'PSI':>8s} | {'KS Stat':>8s} | {'p-value':>8s} | Drift?")
print("=" * 60)

for col in df_train.columns:
    psi = calculate_psi(df_train[col], df_prod[col])
    ks_stat, p_value = stats.ks_2samp(df_train[col], df_prod[col])
    # Flag drift when either check fires (PSI threshold matches the guide below)
    drift = "⚠️ YES" if psi >= 0.2 or p_value < 0.05 else "✅ NO"
    print(f"{col:15s} | {psi:8.4f} | {ks_stat:8.4f} | {p_value:8.4f} | {drift}")

# PSI < 0.1: no drift
# 0.1 ≤ PSI < 0.2: mild drift, keep monitoring
# PSI ≥ 0.2: significant drift, the model needs retraining
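
A small worked example may help make the PSI formula and the three thresholds above easier to check by hand. The sketch below uses only the standard library; the helper names psi_from_proportions and interpret_psi, and the four-bin proportions, are hypothetical values chosen for illustration.

```python
import math


def psi_from_proportions(expected, actual, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def interpret_psi(psi):
    """Map a PSI value to the three bands used in the solution above."""
    if psi < 0.1:
        return "no drift"
    if psi < 0.2:
        return "mild drift, keep monitoring"
    return "significant drift, consider retraining"


# Hypothetical bin proportions: training data is spread evenly over 4 bins,
# while production data has shifted toward the upper bins.
expected = [0.25, 0.25, 0.25, 0.25]
shifted = [0.10, 0.20, 0.30, 0.40]

psi_same = psi_from_proportions(expected, expected)
psi_shifted = psi_from_proportions(expected, shifted)

print(f"identical distributions: PSI = {psi_same:.4f} ({interpret_psi(psi_same)})")
print(f"shifted distribution:    PSI = {psi_shifted:.4f} ({interpret_psi(psi_shifted)})")
```

For the shifted case, the four bin terms are roughly 0.1374, 0.0112, 0.0091, and 0.0705, which sum to about 0.228 and land in the "significant drift" band. Every term is non-negative because (actual - expected) and ln(actual / expected) always share the same sign.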

📊 Results Evaluation

Correct answers	Rating
16-18	🌟 Excellent! You have a firm grasp of Advanced ML & MLOps
13-15	👍 Good! Review a few topics
9-12	📚 More study needed; revisit the lessons
< 9	🔄 Consider retaking the course from the beginning

🎓 Course Complete!

🎉 Congratulations! You have completed the entire Advanced Machine Learning & MLOps course!

Next: apply these advanced ML and MLOps techniques to a real project!

Skills you have mastered:

🎯 Hyperparameter Tuning with Bayesian Optimization, Optuna, HyperOpt
🏗️ Advanced Ensemble: Stacking, Blending, Voting
🎬 Recommendation Systems: Collaborative Filtering, Content-Based, Matrix Factorization, Hybrid
⚙️ MLOps: ML Pipeline, CI/CD, Model Versioning, MLflow, W&B
🗄️ Feature Store: Online/Offline Store, Feast
📡 Model Monitoring: Data Drift, Concept Drift, Model Degradation
🤝 Responsible AI: Fairness, SHAP, LIME, Bias Detection
🤖 AutoML: end-to-end automation of the ML workflow

Next steps:

Build a complete ML pipeline for a real project
Deploy model monitoring and alerting in production
Apply Responsible AI practices to every ML project
Enter Kaggle competitions to sharpen your skills
