Predictive Analytics Basics
1. Introduction
Predictive Analytics for Analysts
Predictive analytics uses historical data to forecast future outcomes. As a Data Analyst, you need to understand the fundamentals so you can collaborate with Data Scientists and interpret model results.
1.1 Types of Predictions
```text
┌─────────────────────────────────────────────────────────┐
│               Predictive Analytics Types                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌─────────────────────────────────────────────────┐   │
│   │  📈 Regression (Continuous)                     │   │
│   │  • Sales forecasting                            │   │
│   │  • Customer lifetime value                      │   │
│   │  • Demand prediction                            │   │
│   │  Output: Number (e.g., $1,234.56)               │   │
│   └─────────────────────────────────────────────────┘   │
│                                                         │
│   ┌─────────────────────────────────────────────────┐   │
│   │  🏷️ Classification (Categorical)                │   │
│   │  • Churn prediction (yes/no)                    │   │
│   │  • Fraud detection                              │   │
│   │  • Customer segmentation                        │   │
│   │  Output: Category (e.g., "High Risk")           │   │
│   └─────────────────────────────────────────────────┘   │
│                                                         │
│   ┌─────────────────────────────────────────────────┐   │
│   │  ⏰ Time Series Forecasting                     │   │
│   │  • Next month's revenue                         │   │
│   │  • Inventory needs                              │   │
│   │  • Trend extrapolation                          │   │
│   │  Output: Future values over time                │   │
│   └─────────────────────────────────────────────────┘   │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
2. Simple Linear Regression
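With a single feature, the least-squares line has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below illustrates this with NumPy. The helper name `fit_line` and the toy numbers are assumptions for illustration only, not part of the lesson's dataset.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through (mean_x, mean_y)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical toy data lying exactly on y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=1.00
```

scikit-learn's `LinearRegression` solves the same least-squares problem, so on one feature its `coef_[0]` and `intercept_` should match these values up to floating-point error.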
2.1 Concept and Implementation
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Sample data: Marketing spend vs Sales
np.random.seed(42)
n = 100

marketing_spend = np.random.uniform(1000, 10000, n)
noise = np.random.normal(0, 500, n)
sales = 500 + 3.5 * marketing_spend + noise  # True relationship

df = pd.DataFrame({
    'marketing_spend': marketing_spend,
    'sales': sales
})

# Visualize
plt.figure(figsize=(10, 5))
plt.scatter(df['marketing_spend'], df['sales'], alpha=0.5)
plt.xlabel('Marketing Spend ($)')
plt.ylabel('Sales ($)')
plt.title('Marketing Spend vs Sales')
plt.show()

print(df.describe())
```
2.2 Build and Evaluate Model
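The evaluation step in this section reports RMSE and R². As a rough illustration of what those two numbers measure, the sketch below computes both by hand with NumPy. The helper names `rmse` and `r_squared` and the sample arrays are hypothetical. The formulas are the standard definitions: RMSE is the square root of the mean squared error, and R² is one minus the ratio of residual to total sum of squares.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of an error, in the target's units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Share of the target's variance that the predictions account for."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1 - ss_res / ss_tot

# Hypothetical actuals and predictions: every prediction is off by 10
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 390]

print(f"RMSE: {rmse(y_true, y_pred):.2f}")     # RMSE: 10.00
print(f"R²: {r_squared(y_true, y_pred):.3f}")  # R²: 0.992
```

RMSE is read in the same units as the target (dollars of sales here), while R² is unitless, which is why the two are usually reported together.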
```python
# Prepare data
X = df[['marketing_spend']]  # Features (must be 2D)
y = df['sales']              # Target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Coefficients
print(f"Intercept: ${model.intercept_:.2f}")
print(f"Coefficient: ${model.coef_[0]:.2f} per $1 marketing spend")

# Predictions
y_pred = model.predict(X_test)

# Evaluation
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print("\nModel Performance:")
print(f"RMSE: ${rmse:.2f}")
print(f"R² Score: {r2:.4f}")
print(f"  → Model explains {r2*100:.1f}% of variance")
```
2.3 Visualize Results
```python
# Plot actual vs predicted
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Regression line
axes[0].scatter(X['marketing_spend'], y, alpha=0.5, label='Actual')
# Keep the column name so predict() sees the same feature name used in fit()
X_line = pd.DataFrame({
    'marketing_spend': np.linspace(X['marketing_spend'].min(), X['marketing_spend'].max(), 100)
})
y_line = model.predict(X_line)
axes[0].plot(X_line['marketing_spend'], y_line, 'r-', linewidth=2, label='Prediction')
axes[0].set_xlabel('Marketing Spend ($)')
axes[0].set_ylabel('Sales ($)')
axes[0].set_title('Linear Regression Fit')
axes[0].legend()

# Actual vs Predicted (points on the dashed diagonal are perfect predictions)
axes[1].scatter(y_test, y_pred, alpha=0.5)
axes[1].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
axes[1].set_xlabel('Actual Sales ($)')
axes[1].set_ylabel('Predicted Sales ($)')
axes[1].set_title('Actual vs Predicted')

plt.tight_layout()
plt.show()
```
2.4 Make Predictions
```python
# Predict for new values
new_spend = pd.DataFrame({'marketing_spend': [5000, 7500, 10000]})
predictions = model.predict(new_spend)

print("Sales Predictions:")
for spend, pred in zip(new_spend['marketing_spend'], predictions):
    print(f"  ${spend:,.0f} spend → ${pred:,.2f} sales (expected)")
```
3. Multiple Regression
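Multiple regression extends the same idea to several features: the prediction is an intercept plus one coefficient per feature, and the coefficients are chosen to minimize the squared error. The sketch below is a minimal illustration using `np.linalg.lstsq` on synthetic, noise-free data. The "true" coefficients (5, 2, -3) and all variable names are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0, 10, size=(n, 2))

# Assumed relationship for illustration: y = 5 + 2*x1 - 3*x2 (no noise)
y = 5 + 2 * X[:, 0] - 3 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ coefs ≈ y
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, b1, b2 = coefs

print(f"intercept={intercept:.2f}, b1={b1:.2f}, b2={b2:.2f}")
```

Each coefficient is read as the expected change in the target for a one-unit change in that feature while the other features are held fixed.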
3.1 Multiple Features
```python
# Generate multi-feature data
np.random.seed(42)
n = 200

df = pd.DataFrame({
    'marketing_spend': np.random.uniform(1000, 10000, n),
    'store_traffic': np.random.uniform(500, 5000, n),
    'avg_price': np.random.uniform(20, 100, n),
    'num_promotions': np.random.randint(0, 10, n),
    'is_weekend': np.random.choice([0, 1], n)
})

# Generate sales with all features
df['sales'] = (
    500 +
    2.5 * df['marketing_spend'] +
    1.2 * df['store_traffic'] -
    3.0 * df['avg_price'] +
    200 * df['num_promotions'] +
    1500 * df['is_weekend'] +
    np.random.normal(0, 500, n)
)

print("Features correlation with sales:")
print(df.corr()['sales'].sort_values(ascending=False))
```
3.2 Train Multi-Feature Model
```python
# Prepare data
features = ['marketing_spend', 'store_traffic', 'avg_price', 'num_promotions', 'is_weekend']
X = df[features]
y = df['sales']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train
model = LinearRegression()
model.fit(X_train, y_train)

# Coefficients
print("Feature Coefficients:")
for feature, coef in zip(features, model.coef_):
    direction = "↑" if coef > 0 else "↓"
    print(f"  {feature}: {coef:.2f} {direction}")

# Evaluate
y_pred = model.predict(X_test)
print(f"\nR² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: ${np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")
```
3.3 Feature Importance
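Raw coefficients are expressed per unit of each feature, so a feature measured in large units (dollars of spend) gets a small coefficient even when it moves the target a lot. Standardizing puts features on a common scale. For ordinary least squares, the coefficient on a standardized feature equals the raw coefficient multiplied by that feature's standard deviation. The sketch below checks this on synthetic data; the helper `ols` and every number in it are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
spend = rng.uniform(1000, 10000, n)             # large-unit feature
promos = rng.integers(0, 10, n).astype(float)   # small-unit feature
y = 500 + 2.5 * spend + 200 * promos + rng.normal(0, 100, n)

def ols(X, y):
    """Least-squares coefficients, intercept first."""
    X_design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coefs

X = np.column_stack([spend, promos])
raw = ols(X, y)[1:]  # drop the intercept

# Standardize: subtract the mean, divide by the population standard deviation
stds = X.std(axis=0)
X_scaled = (X - X.mean(axis=0)) / stds
standardized = ols(X_scaled, y)[1:]

print("raw coefficients:         ", raw.round(2))
print("standardized coefficients:", standardized.round(1))
print("raw * std:                ", (raw * stds).round(1))
```

In this synthetic setup the raw coefficient for `promos` is far larger than the one for `spend`, yet after standardizing the ranking flips, because `spend` varies over a much wider range.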
```python
# Standardized coefficients for comparison
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

model_scaled = LinearRegression()
model_scaled.fit(X_scaled, y_train)

# Plot feature importance
importance = pd.DataFrame({
    'feature': features,
    'importance': np.abs(model_scaled.coef_)
}).sort_values('importance', ascending=True)

plt.figure(figsize=(10, 5))
plt.barh(importance['feature'], importance['importance'])
plt.xlabel('Absolute Standardized Coefficient')
plt.title('Feature Importance')
plt.show()
```
4. Classification Basics
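Logistic regression, the classifier used in this section, turns a linear score into a probability with the sigmoid function and then applies a cut-off to get a class label. The sketch below shows that mapping in isolation. The scores are made-up values and the 0.5 cut-off is the conventional choice assumed here, not something tuned for a real dataset.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores (intercept + weighted sum of features)
scores = np.array([-2.0, 0.0, 2.0])
probs = sigmoid(scores)

# Conventional cut-off: label 1 (churn) when the probability exceeds 0.5
labels = (probs > 0.5).astype(int)

for s, p, label in zip(scores, probs, labels):
    print(f"score={s:+.1f} -> probability={p:.3f} -> label={label}")
```

A score of 0 maps to exactly 0.5, negative scores map below it and positive scores above it, so the sign of the linear score decides the predicted class.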
4.1 Binary Classification (Churn Prediction)
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Generate churn data
np.random.seed(42)
n = 1000

customers = pd.DataFrame({
    'tenure_months': np.random.randint(1, 72, n),
    'monthly_spend': np.random.uniform(20, 200, n),
    'support_tickets': np.random.poisson(2, n),
    'last_login_days': np.random.exponential(30, n),
    'contract_type': np.random.choice(['monthly', 'annual'], n)
})

# Calculate churn probability
churn_prob = (
    0.1 +
    -0.01 * customers['tenure_months'] +
    -0.002 * customers['monthly_spend'] +
    0.1 * customers['support_tickets'] +
    0.01 * customers['last_login_days'] +
    0.2 * (customers['contract_type'] == 'monthly')
)
churn_prob = 1 / (1 + np.exp(-churn_prob))  # Sigmoid
customers['churned'] = (np.random.random(n) < churn_prob).astype(int)

print(f"Churn Rate: {customers['churned'].mean()*100:.1f}%")
print(customers.head())
```
4.2 Train Classification Model
```python
# Prepare features
customers['is_monthly'] = (customers['contract_type'] == 'monthly').astype(int)
features = ['tenure_months', 'monthly_spend', 'support_tickets', 'last_login_days', 'is_monthly']

X = customers[features]
y = customers['churned']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train logistic regression
# (max_iter raised from the default because the features are not scaled)
clf = LogisticRegression(max_iter=1000, random_state=42)
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # Probability of churn

# Evaluation
print("Classification Report:")
print(classification_report(y_test, y_pred))

print(f"\nAccuracy: {accuracy_score(y_test, y_pred)*100:.1f}%")
```
4.3 Confusion Matrix
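The four cells of a confusion matrix are enough to derive accuracy, precision and recall. The sketch below does the arithmetic directly. The function name `rates_from_counts` and the counts passed to it are hypothetical, chosen only to make the numbers easy to follow.

```python
def rates_from_counts(tn, fp, fn, tp):
    """Derive common classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        # Share of all predictions that were correct
        'accuracy': (tp + tn) / total,
        # Of the customers flagged as Churn, how many really churned
        'precision': tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the customers who really churned, how many were flagged
        'recall': tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical counts for 150 customers
metrics = rates_from_counts(tn=80, fp=10, fn=20, tp=40)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.800
# precision: 0.800
# recall: 0.667
```

Precision and recall can diverge even when accuracy looks fine: here 20 of the 60 actual churners are missed, which accuracy alone does not reveal.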
```python
from sklearn.metrics import ConfusionMatrixDisplay

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels=['Stay', 'Churn']).plot(ax=axes[0])
axes[0].set_title('Confusion Matrix')

# Probability distribution
axes[1].hist(y_prob[y_test == 0], bins=20, alpha=0.5, label='Stayed')
axes[1].hist(y_prob[y_test == 1], bins=20, alpha=0.5, label='Churned')
axes[1].set_xlabel('Predicted Churn Probability')
axes[1].set_ylabel('Count')
axes[1].set_title('Probability Distribution')
axes[1].legend()

plt.tight_layout()
plt.show()

# Interpretation
tn, fp, fn, tp = cm.ravel()
print("\nMatrix Interpretation:")
print(f"  True Negatives (correctly predicted Stay): {tn}")
print(f"  False Positives (wrongly predicted Churn): {fp}")
print(f"  False Negatives (missed Churns): {fn}")
print(f"  True Positives (correctly predicted Churn): {tp}")
```
4.4 Risk Scoring
```python
# Create risk scores
test_customers = X_test.copy()
test_customers['actual_churn'] = y_test.values
test_customers['churn_probability'] = y_prob
test_customers['risk_score'] = pd.cut(
    y_prob,
    bins=[0, 0.3, 0.6, 1.0],
    labels=['Low', 'Medium', 'High'],
    include_lowest=True  # so a probability of exactly 0 still lands in 'Low'
)

print("Risk Distribution:")
print(test_customers['risk_score'].value_counts())

# Churn rate by risk segment
# (observed=False keeps empty risk segments in the output)
risk_analysis = test_customers.groupby('risk_score', observed=False).agg({
    'actual_churn': ['count', 'sum', 'mean']
}).round(3)
risk_analysis.columns = ['total', 'churned', 'churn_rate']
print("\nChurn Rate by Risk Segment:")
print(risk_analysis)
```
5. Decision Trees
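A decision tree picks each split so that the resulting groups are as pure as possible. scikit-learn's `DecisionTreeClassifier` measures this with Gini impurity by default. The sketch below computes Gini impurity for one candidate split by hand. The helper names `gini` and `split_gini` and the label lists are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure group, 0.5 for a 50/50 binary mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Impurity after a split: size-weighted average of the two children."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical node with 10 customers, split on some feature threshold
parent = ['Stay'] * 6 + ['Churn'] * 4
left = ['Stay'] * 5 + ['Churn'] * 1    # e.g. long-tenure customers
right = ['Stay'] * 1 + ['Churn'] * 3   # e.g. short-tenure customers

print(f"Parent impurity: {gini(parent):.3f}")            # 0.480
print(f"After split:     {split_gini(left, right):.3f}")  # 0.317
```

The tree compares candidate splits by how much they lower the impurity (here from 0.480 to 0.317) and keeps the one with the largest reduction.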
5.1 Simple Decision Tree
```python
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Train decision tree
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Visualize
plt.figure(figsize=(20, 10))
plot_tree(
    tree,
    feature_names=features,
    class_names=['Stay', 'Churn'],
    filled=True,
    rounded=True,
    fontsize=10
)
plt.title('Decision Tree for Churn Prediction')
plt.tight_layout()
plt.savefig('decision_tree.png', dpi=150)
plt.show()

# Evaluate
y_pred_tree = tree.predict(X_test)
print(f"Decision Tree Accuracy: {accuracy_score(y_test, y_pred_tree)*100:.1f}%")
```
5.2 Feature Importance from Trees
```python
# Feature importance
importance = pd.DataFrame({
    'feature': features,
    'importance': tree.feature_importances_
}).sort_values('importance', ascending=False)

print("Feature Importance (Decision Tree):")
print(importance)

# Visualize
plt.figure(figsize=(10, 5))
plt.barh(importance['feature'], importance['importance'])
plt.xlabel('Importance')
plt.title('Feature Importance from Decision Tree')
plt.gca().invert_yaxis()
plt.show()
```
6. Simple Forecasting
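A forecast should be judged on data that comes after the period it was built from, so the hold-out split is by time rather than random. The sketch below holds out the last 30 days of a synthetic series, forecasts them with the mean of the previous 30 days, and reports the mean absolute error (MAE). The series, the 30-day horizon and all variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales around 1000 (hypothetical data)
rng = np.random.default_rng(42)
dates = pd.date_range('2023-01-01', periods=120, freq='D')
sales = pd.Series(1000 + rng.normal(0, 50, len(dates)), index=dates)

# Time-ordered split: the test period is strictly after the training period
horizon = 30
train, test = sales.iloc[:-horizon], sales.iloc[-horizon:]

# Naive forecast: repeat the mean of the last 30 training days
forecast_value = train.iloc[-30:].mean()
forecast = pd.Series(forecast_value, index=test.index)

# Mean absolute error, in the same units as sales
mae = (test - forecast).abs().mean()
print(f"Forecast value: {forecast_value:.2f}")
print(f"MAE over {horizon} days: {mae:.2f}")
```

A flat moving-average forecast like this is a useful baseline: a more elaborate method is only worth adopting if it beats this error on the held-out period.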
6.1 Moving Average Forecast
```python
# Generate time series
dates = pd.date_range('2022-01-01', periods=365*2, freq='D')
np.random.seed(42)

trend = np.linspace(1000, 1500, len(dates))
seasonality = 200 * np.sin(2 * np.pi * np.arange(len(dates)) / 365)
noise = np.random.normal(0, 50, len(dates))

sales = trend + seasonality + noise

ts = pd.DataFrame({
    'date': dates,
    'sales': sales
}).set_index('date')

# Simple moving average forecast
ts['MA_7'] = ts['sales'].rolling(7).mean()
ts['MA_30'] = ts['sales'].rolling(30).mean()

# Forecast using last moving average
last_ma30 = ts['MA_30'].iloc[-1]
print(f"30-day MA Forecast: ${last_ma30:.2f}")

# Plot
plt.figure(figsize=(14, 5))
plt.plot(ts.index[-90:], ts['sales'][-90:], alpha=0.5, label='Actual')
plt.plot(ts.index[-90:], ts['MA_30'][-90:], label='30-day MA')
plt.axhline(last_ma30, color='r', linestyle='--', label=f'Forecast: ${last_ma30:.0f}')
plt.legend()
plt.title('Sales Forecast using Moving Average')
plt.show()
```
6.2 Trend Extrapolation
```python
from scipy import stats

# Fit linear trend
x = np.arange(len(ts))
y = ts['sales'].values

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Extrapolate 30 days
future_x = np.arange(len(ts), len(ts) + 30)
forecast = intercept + slope * future_x

print(f"Trend: ${slope:.2f} per day")
print(f"30-day Forecast Range: ${forecast[0]:.2f} to ${forecast[-1]:.2f}")

# Plot
plt.figure(figsize=(14, 5))
plt.plot(ts.index, ts['sales'], alpha=0.5, label='Historical')
future_dates = pd.date_range(ts.index[-1] + pd.Timedelta(days=1), periods=30)
plt.plot(future_dates, forecast, 'r--', linewidth=2, label='Forecast')
plt.legend()
plt.title('Sales Forecast using Trend Extrapolation')
plt.show()
```
6.3 Seasonal Adjustment
```python
def seasonal_forecast(ts, periods=30):
    """Simple seasonal forecast: linear trend scaled by a monthly factor."""
    # Calculate monthly averages
    # (grouping on the index avoids adding a 'month' column to the caller's ts)
    monthly_avg = ts['sales'].groupby(ts.index.month).mean()
    overall_avg = ts['sales'].mean()

    # Overall trend
    recent = ts.iloc[-365:]  # Last year
    x = np.arange(len(recent))
    y = recent['sales'].values
    slope, intercept, _, _, _ = stats.linregress(x, y)

    # Generate forecast
    forecast_dates = pd.date_range(ts.index[-1] + pd.Timedelta(days=1), periods=periods)
    forecasts = []

    for i, date in enumerate(forecast_dates):
        # Base from trend
        base = intercept + slope * (len(recent) + i)

        # Seasonal adjustment: how this month compares with the overall average
        seasonal_factor = monthly_avg.loc[date.month] / overall_avg

        forecasts.append({
            'date': date,
            'forecast': base * seasonal_factor
        })

    return pd.DataFrame(forecasts)

forecast_df = seasonal_forecast(ts, 30)
print("Seasonal Forecast:")
print(forecast_df.head(10))
```
7. Model Evaluation Best Practices
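A single train/test split gives one score that depends on which rows happened to land in the test set. K-fold cross-validation repeats the exercise k times, each time holding out a different slice, and reports the spread. The sketch below spells out the loop with NumPy only, using consecutive unshuffled folds. The data, the helper `r2` and the choice of k = 5 are assumptions for illustration.

```python
import numpy as np

# Synthetic data with an assumed relationship y ≈ 3x
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 3 * x + rng.normal(0, 1, n)

def r2(y_true, y_pred):
    """R² score: 1 minus residual over total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

k = 5
indices = np.arange(n)
folds = np.array_split(indices, k)  # k consecutive blocks of row indices

scores = []
for test_idx in folds:
    # Train on every row that is not in the current test fold
    train_idx = np.setdiff1d(indices, test_idx)
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)
    y_pred = intercept + slope * x[test_idx]
    scores.append(r2(y[test_idx], y_pred))

scores = np.array(scores)
print(f"Fold scores: {scores.round(4)}")
print(f"Mean: {scores.mean():.4f}  Std: {scores.std():.4f}")
```

Every row is used for testing exactly once, so the mean is a steadier estimate than one split, and a large standard deviation across folds is a warning that the model's performance depends heavily on the sample.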
7.1 Cross-Validation
```python
from sklearn.model_selection import cross_val_score

# Regression cross-validation
# 'features' was reassigned to the churn columns in section 4.2,
# so the sales features are listed explicitly here
sales_features = ['marketing_spend', 'store_traffic', 'avg_price', 'num_promotions', 'is_weekend']
X = df[sales_features]
y = df['sales']

model = LinearRegression()
scores = cross_val_score(model, X, y, cv=5, scoring='r2')

print("Cross-Validation Results (R²):")
print(f"  Scores: {scores.round(4)}")
print(f"  Mean: {scores.mean():.4f}")
print(f"  Std: {scores.std():.4f}")

# Classification cross-validation
churn_features = ['tenure_months', 'monthly_spend', 'support_tickets', 'last_login_days', 'is_monthly']
clf = LogisticRegression(max_iter=1000, random_state=42)
X_clf = customers[churn_features]
y_clf = customers['churned']

scores = cross_val_score(clf, X_clf, y_clf, cv=5, scoring='accuracy')
print(f"\nClassification Accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```
7.2 Understanding Metrics
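Two of the metrics in this section are easier to trust once computed by hand. F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two. AUC-ROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below implements both definitions in plain Python; the function names and the four-row example are hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def auc_by_pairs(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Fine for small examples; quadratic in size.
    """
    pos = [p for label, p in zip(y_true, y_prob) if label == 1]
    neg = [p for label, p in zip(y_true, y_prob) if label == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted churn probabilities
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(f"F1 (precision=0.8, recall=0.6): {f1_from(0.8, 0.6):.3f}")  # 0.686
print(f"AUC: {auc_by_pairs(y_true, y_prob):.2f}")                   # 0.75
```

In the example, three of the four positive/negative pairs are ordered correctly (the churner scored 0.35 ranks below the non-churner scored 0.4), which gives an AUC of 0.75.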
```python
def explain_metrics(y_true, y_pred, y_prob=None):
    """Explain classification metrics"""
    from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

    print("=" * 50)
    print("CLASSIFICATION METRICS EXPLAINED")
    print("=" * 50)

    accuracy = accuracy_score(y_true, y_pred)
    print(f"\n📊 Accuracy: {accuracy:.2%}")
    print(f"   What it means: {accuracy:.0%} of all predictions are correct")

    precision = precision_score(y_true, y_pred)
    print(f"\n🎯 Precision: {precision:.2%}")
    print(f"   What it means: When we predict Churn, we're right {precision:.0%} of the time")

    recall = recall_score(y_true, y_pred)
    print(f"\n🔍 Recall: {recall:.2%}")
    print(f"   What it means: We catch {recall:.0%} of actual Churns")

    f1 = f1_score(y_true, y_pred)
    print(f"\n⚖️ F1 Score: {f1:.2%}")
    print("   What it means: Balanced measure of precision and recall")

    if y_prob is not None:
        auc = roc_auc_score(y_true, y_prob)
        print(f"\n📈 AUC-ROC: {auc:.2%}")
        print("   What it means: Model's ability to distinguish classes (0.5=random, 1.0=perfect)")

    print("\n" + "=" * 50)

explain_metrics(y_test, y_pred, y_prob)
```
8. Practice
Predictive Analytics Project
Exercise: Build Predictive Model
```python
# Build a churn prediction model:
# 1. Prepare and explore data
# 2. Train multiple models
# 3. Compare performance
# 4. Create actionable insights

# YOUR CODE HERE
```
💡 View solution
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score, roc_curve

# 1. Generate realistic customer data
np.random.seed(42)
n = 2000

customers = pd.DataFrame({
    'customer_id': range(1, n+1),
    'tenure_months': np.random.randint(1, 60, n),
    'monthly_charges': np.random.uniform(20, 150, n),
    'total_charges': np.random.uniform(100, 5000, n),
    'contract_type': np.random.choice(['Month-to-month', 'One year', 'Two year'], n, p=[0.5, 0.3, 0.2]),
    'payment_method': np.random.choice(['Credit card', 'Bank transfer', 'Electronic check'], n),
    'tech_support': np.random.choice([0, 1], n, p=[0.6, 0.4]),
    'online_security': np.random.choice([0, 1], n, p=[0.55, 0.45]),
    'num_services': np.random.randint(1, 6, n)
})

# Create churn
churn_score = (
    -0.03 * customers['tenure_months'] +
    0.01 * customers['monthly_charges'] -
    0.001 * customers['total_charges'] +
    0.8 * (customers['contract_type'] == 'Month-to-month') -
    0.3 * customers['tech_support'] -
    0.2 * customers['online_security'] -
    0.1 * customers['num_services']
)
customers['churned'] = (np.random.random(n) < 1/(1+np.exp(-churn_score))).astype(int)

print("="*50)
print("DATA OVERVIEW")
print("="*50)
print(f"Total customers: {len(customers)}")
print(f"Churn rate: {customers['churned'].mean()*100:.1f}%")

# 2. Feature engineering
customers['is_monthly'] = (customers['contract_type'] == 'Month-to-month').astype(int)
customers['avg_monthly'] = customers['total_charges'] / np.maximum(customers['tenure_months'], 1)

features = ['tenure_months', 'monthly_charges', 'total_charges',
            'is_monthly', 'tech_support', 'online_security', 'num_services', 'avg_monthly']

X = customers[features]
y = customers['churned']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train multiple models
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'Decision Tree': DecisionTreeClassifier(max_depth=5, random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
}

print("\n" + "="*50)
print("MODEL COMPARISON")
print("="*50)

results = []
for name, model in models.items():
    # Train
    model.fit(X_train, y_train)

    # Predict
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    # Metrics
    acc = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, y_prob)
    cv_scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')

    results.append({
        'Model': name,
        'Accuracy': acc,
        'AUC-ROC': auc,
        'CV AUC (mean)': cv_scores.mean(),
        'CV AUC (std)': cv_scores.std()
    })

    print(f"\n{name}:")
    print(f"  Accuracy: {acc:.2%}")
    print(f"  AUC-ROC: {auc:.4f}")
    print(f"  CV AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std():.4f})")

results_df = pd.DataFrame(results)

# 4. Best model analysis
best_model = models['Random Forest']
y_prob = best_model.predict_proba(X_test)[:, 1]

# Feature importance
importance = pd.DataFrame({
    'feature': features,
    'importance': best_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\n" + "="*50)
print("FEATURE IMPORTANCE (Random Forest)")
print("="*50)
for _, row in importance.iterrows():
    bar = "█" * int(row['importance'] * 50)
    print(f"{row['feature']:20s} {bar} {row['importance']:.3f}")

# 5. Actionable insights
print("\n" + "="*50)
print("ACTIONABLE INSIGHTS")
print("="*50)

# Risk segments
test_df = X_test.copy()
test_df['churn_prob'] = y_prob
test_df['actual'] = y_test.values
test_df['risk'] = pd.cut(y_prob, bins=[0, 0.3, 0.6, 1.0],
                         labels=['Low', 'Medium', 'High'], include_lowest=True)

# observed=False keeps empty risk segments in the summary
risk_summary = test_df.groupby('risk', observed=False).agg({
    'actual': ['count', 'sum', 'mean'],
    'churn_prob': 'mean'
}).round(3)
risk_summary.columns = ['Count', 'Churned', 'Actual_Rate', 'Avg_Prob']

print("\nRisk Segments:")
print(risk_summary)

print("\n💡 RECOMMENDATIONS:")
print("1. Focus retention on HIGH risk segment (highest ROI)")
print("2. Promote annual contracts to monthly customers")
print("3. Encourage tech support adoption")
print("4. Bundle more services to increase stickiness")

# 6. Visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# ROC Curves
for name, model in models.items():
    y_prob = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_prob)
    auc = roc_auc_score(y_test, y_prob)
    axes[0, 0].plot(fpr, tpr, label=f'{name} (AUC={auc:.3f})')
axes[0, 0].plot([0, 1], [0, 1], 'k--')
axes[0, 0].set_xlabel('False Positive Rate')
axes[0, 0].set_ylabel('True Positive Rate')
axes[0, 0].set_title('ROC Curves')
axes[0, 0].legend()

# Feature importance
axes[0, 1].barh(importance['feature'], importance['importance'])
axes[0, 1].set_xlabel('Importance')
axes[0, 1].set_title('Feature Importance')
axes[0, 1].invert_yaxis()

# Risk distribution
# sort_index() orders the bars Low, Medium, High so the colors match the labels
risk_counts = test_df['risk'].value_counts().sort_index()
axes[1, 0].bar(risk_counts.index.astype(str), risk_counts.values, color=['green', 'orange', 'red'])
axes[1, 0].set_xlabel('Risk Level')
axes[1, 0].set_ylabel('Customers')
axes[1, 0].set_title('Risk Distribution')

# Churn by tenure
tenure_churn = customers.groupby(
    pd.cut(customers['tenure_months'], bins=5), observed=False
)['churned'].mean()
axes[1, 1].bar(range(len(tenure_churn)), tenure_churn.values)
# Fix the tick positions before labelling them, one tick per bar
axes[1, 1].set_xticks(range(len(tenure_churn)))
axes[1, 1].set_xticklabels([str(x) for x in tenure_churn.index], rotation=45, ha='right')
axes[1, 1].set_xlabel('Tenure (months)')
axes[1, 1].set_ylabel('Churn Rate')
axes[1, 1].set_title('Churn Rate by Tenure')

plt.tight_layout()
plt.savefig('churn_analysis.png', dpi=150)
plt.show()

print("\n✅ Analysis complete!")
```
9. Summary
| Topic | Key Concepts |
|---|---|
| Regression | Predict continuous values, coefficients, R² |
| Classification | Predict categories, probability, precision/recall |
| Decision Trees | Visual rules, feature importance |
| Forecasting | Moving average, trend extrapolation |
| Evaluation | Cross-validation, confusion matrix, AUC |
Next lesson: Data Storytelling
