📊 T-tests and Chi-square Tests

Learning Objectives

After this lesson, you will be able to:

Perform one-sample and two-sample T-tests
Distinguish paired from independent samples
Apply the Chi-square test to categorical data
Choose the right test for each situation

1. T-test Overview

1.1 When to use a T-test?

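Condition	Z-test	T-test
σ known	✓	
σ unknown		✓
n < 30		✓
Population normal	✓	✓

When σ is unknown and must be estimated from the sample, the t-test is the appropriate choice at any n. The n < 30 row reflects that the difference from the z-test matters most for small samples.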
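A quick way to see why small samples call for the t-test is to compare two-tailed critical values at α = 0.05. The sketch below uses `scipy.stats.norm.ppf` and `scipy.stats.t.ppf`; the list of df values is an arbitrary choice for illustration.

```python
from scipy import stats

# Two-tailed critical values at alpha = 0.05 (upper 97.5% quantile)
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)
print(f"z critical: {z_crit:.4f}")

# t critical values for a few degrees of freedom (arbitrary choices)
t_crits = {}
for df in [5, 9, 29, 100, 1000]:
    t_crits[df] = stats.t.ppf(1 - alpha / 2, df)
    print(f"df={df:>4}: t critical = {t_crits[df]:.4f}")
```

For small df the t critical value is noticeably larger than 1.96 (the t-distribution has heavier tails), and it shrinks toward the z value as df grows.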

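1.2 Types of T-test

This lesson covers three types of T-test: One-sample (section 2), Independent two-sample (section 3), and Paired (section 4).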

2. One-sample T-test

2.1 Purpose

Compare a sample mean with a specific value (a hypothesized population mean μ₀).

2.2 Test Statistic

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Degrees of freedom: df = n - 1

2.3 Example

Problem: Claim: caffeine content = 200 mg. A sample of 10 cans has mean = 195 mg and standard deviation ≈ 4.24 mg. Test at α = 0.05.

```python
from scipy import stats
import numpy as np

# Sample data (caffeine content in mg, 10 cans)
sample = [192, 198, 190, 195, 202, 188, 197, 194, 199, 195]

# H₀: μ = 200, H₁: μ ≠ 200
mu_0 = 200
alpha = 0.05

# One-sample t-test (two-tailed by default)
t_stat, p_value = stats.ttest_1samp(sample, mu_0)

print(f"Sample mean: {np.mean(sample):.2f}")
print(f"Sample std: {np.std(sample, ddof=1):.2f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"df: {len(sample) - 1}")

if p_value < alpha:
    print("\n→ Reject H₀: the mean differs from 200 mg")
else:
    print("\n→ Fail to reject H₀: not enough evidence")
```

2.4 Manual calculation

```python
import numpy as np
from scipy import stats

sample = [192, 198, 190, 195, 202, 188, 197, 194, 199, 195]
mu_0 = 200

n = len(sample)
x_bar = np.mean(sample)
s = np.std(sample, ddof=1)
df = n - 1

# T-statistic
t = (x_bar - mu_0) / (s / np.sqrt(n))
print(f"t = {t:.4f}")

# P-value (two-tailed)
p_value = 2 * (1 - stats.t.cdf(abs(t), df))
print(f"P-value = {p_value:.4f}")

# Critical value
t_critical = stats.t.ppf(0.975, df)
print(f"Critical value (α=0.05): ±{t_critical:.4f}")
```

3. Independent Two-sample T-test

3.1 Purpose

Compare the means of two independent groups.

3.2 Assumptions

✅ The two samples are independent
✅ Data are approximately normal in each group
✅ Equal variances (if violated, use Welch's t-test instead)

3.3 Test Statistic

Equal variances (pooled):

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$

Degrees of freedom: df = n₁ + n₂ - 2
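The pooled formulas can be checked by hand. This sketch reuses the class scores from the example in 3.4, computes s_p and t directly with NumPy, takes the two-tailed p-value from `scipy.stats.t.sf` with df = n₁ + n₂ - 2, and compares the result with `stats.ttest_ind`, which pools the variances by default.

```python
import numpy as np
from scipy import stats

# Same class scores as the example in section 3.4
class_A = np.array([85, 90, 78, 92, 88, 76, 95, 89, 82, 91])
class_B = np.array([72, 85, 80, 78, 88, 70, 82, 75, 79, 84])

n1, n2 = len(class_A), len(class_B)
s1_sq = np.var(class_A, ddof=1)  # sample variance of group 1
s2_sq = np.var(class_B, ddof=1)  # sample variance of group 2

# Pooled standard deviation s_p
s_p = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

# Pooled t-statistic and two-tailed p-value with df = n1 + n2 - 2
t_manual = (class_A.mean() - class_B.mean()) / (s_p * np.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(class_A, class_B)  # equal_var=True by default
print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}, df={df}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```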

3.4 Example

Problem: Compare exam scores between two classes.

```python
from scipy import stats
import numpy as np

# Data
class_A = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_B = [72, 85, 80, 78, 88, 70, 82, 75, 79, 84]

alpha = 0.05

# H₀: μ_A = μ_B, H₁: μ_A ≠ μ_B

# Descriptive stats
print("Class A:", f"mean={np.mean(class_A):.2f}, std={np.std(class_A, ddof=1):.2f}")
print("Class B:", f"mean={np.mean(class_B):.2f}, std={np.std(class_B, ddof=1):.2f}")

# Independent t-test (equal variances assumed, pooled)
t_stat, p_value = stats.ttest_ind(class_A, class_B)
print(f"\nt-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Welch's t-test (unequal variances)
t_stat_welch, p_value_welch = stats.ttest_ind(class_A, class_B, equal_var=False)
print("\nWelch's t-test:")
print(f"t-statistic: {t_stat_welch:.4f}")
print(f"P-value: {p_value_welch:.4f}")

# Decision uses the pooled test; use p_value_welch if variances differ (see 3.5)
if p_value < alpha:
    print("\n→ Reject H₀: the two classes have different mean scores")
else:
    print("\n→ Fail to reject H₀: not enough evidence")
```

3.5 Checking Equal Variances (Levene's Test)

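```python
from scipy import stats

# Same data as in 3.4
class_A = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
class_B = [72, 85, 80, 78, 88, 70, 82, 75, 79, 84]

# Levene's test (scipy's default center='median' is the Brown-Forsythe variant)
# H₀: σ₁² = σ₂²
stat, p_value = stats.levene(class_A, class_B)
print(f"Levene's test: stat={stat:.4f}, p-value={p_value:.4f}")

if p_value > 0.05:
    print("→ No evidence of unequal variances; the standard (pooled) t-test is reasonable")
else:
    print("→ Variances appear unequal; use Welch's t-test")
```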
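The lesson does not give the formula behind `equal_var=False`. As a sketch (the formula below is the standard Welch-Satterthwaite approximation, not something stated in this lesson), Welch's test keeps each group's variance separate and estimates the degrees of freedom from the data:

```python
import numpy as np
from scipy import stats

class_A = np.array([85, 90, 78, 92, 88, 76, 95, 89, 82, 91])
class_B = np.array([72, 85, 80, 78, 88, 70, 82, 75, 79, 84])

# Per-group variance of the mean: s_i^2 / n_i
v1 = np.var(class_A, ddof=1) / len(class_A)
v2 = np.var(class_B, ddof=1) / len(class_B)

# Welch's t-statistic: no pooling, each group keeps its own variance
t_welch = (class_A.mean() - class_B.mean()) / np.sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (len(class_A) - 1) + v2 ** 2 / (len(class_B) - 1))
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

t_ref, p_ref = stats.ttest_ind(class_A, class_B, equal_var=False)
print(f"manual Welch: t={t_welch:.4f}, df={df_welch:.2f}, p={p_welch:.4f}")
print(f"scipy Welch:  t={t_ref:.4f}, p={p_ref:.4f}")
```

The Welch df is generally not an integer and lies between min(n₁, n₂) - 1 and n₁ + n₂ - 2.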

4. Paired T-test

4.1 Purpose

Compare the means of the same group measured at two time points or under two conditions.

4.2 Test Statistic

$t = \frac{\bar{d}}{s_d / \sqrt{n}}$

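where d is the difference within each pair, d̄ is the mean of the differences, s_d is their sample standard deviation, and n is the number of pairs.

Degrees of freedom: df = n - 1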
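Because the statistic uses only the differences d, a paired t-test is equivalent to a one-sample t-test of d against 0. A minimal check with the weight data from 4.3, comparing the formula, `stats.ttest_rel`, and `stats.ttest_1samp` on the differences:

```python
import numpy as np
from scipy import stats

# Same before/after weights as the example in section 4.3
before = np.array([85, 90, 78, 92, 88, 76, 95, 89, 82, 91])
after = np.array([82, 86, 75, 88, 84, 74, 90, 85, 79, 87])

d = before - after   # one difference per subject
n = len(d)
d_bar = d.mean()
s_d = d.std(ddof=1)

# t = d_bar / (s_d / sqrt(n)), df = n - 1
t_manual = d_bar / (s_d / np.sqrt(n))

t_paired, p_paired = stats.ttest_rel(before, after)
t_one, p_one = stats.ttest_1samp(d, 0)  # one-sample t-test on the differences

print(f"manual: {t_manual:.4f}, ttest_rel: {t_paired:.4f}, ttest_1samp on d: {t_one:.4f}")
```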

4.3 Example

Problem: Compare body weight before and after a weight-loss program.

```python
from scipy import stats
import numpy as np

# Data (same subjects measured twice)
before = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
after = [82, 86, 75, 88, 84, 74, 90, 85, 79, 87]

alpha = 0.05

# H₀: μ_diff = 0 (no change)
# H₁: μ_diff ≠ 0 (weight changed)

# Calculate differences
differences = np.array(before) - np.array(after)
print(f"Differences: {differences}")
print(f"Mean difference: {np.mean(differences):.2f}")

# Paired t-test
t_stat, p_value = stats.ttest_rel(before, after)
print(f"\nt-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < alpha:
    print("\n→ Reject H₀: mean weight changed after the program")
else:
    print("\n→ Fail to reject H₀: not enough evidence")
```

5. Choosing the Right T-test

Decision flow:

❓ Comparing means? How many groups?
1️⃣ 1 group vs a target value → One-sample t-test
2 groups: same subjects?
✅ Yes → Paired t-test
❌ No → Independent t-test

Situation	Test
Mean vs target value	One-sample
Before vs After (same people)	Paired
Group A vs Group B (different people)	Independent

6. Chi-square Test

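6.1 Purpose

Test hypotheses about categorical variables: whether observed counts fit an expected distribution, or whether two categorical variables are independent.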

6.2 Types

Test	Purpose
Goodness of Fit	Do the observed counts fit the expected distribution?
Independence	Are two categorical variables independent?

7. Chi-square Goodness of Fit

7.1 Test Statistic

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

where:

O_i = observed frequency in category i
E_i = expected frequency in category i under H₀

Degrees of freedom: df = k - 1 for k categories

7.2 Example: Is the die fair?

```python
from scipy import stats

# Roll a die 600 times
observed = [95, 105, 98, 110, 92, 100]  # Count for each face
expected = [100] * 6  # Expected counts if the die is fair

# Chi-square goodness-of-fit test
chi2, p_value = stats.chisquare(observed, expected)

print(f"Observed: {observed}")
print(f"Expected: {expected}")
print(f"\nChi-square: {chi2:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"df: {len(observed) - 1}")

if p_value < 0.05:
    print("\n→ Reject H₀: the die is not fair")
else:
    print("\n→ Fail to reject H₀: no evidence the die is unfair")
```
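The statistic from 7.1 can be reproduced by hand for the same counts. A sketch using NumPy for Σ(O - E)²/E and `scipy.stats.chi2.sf` for the upper-tail p-value with df = k - 1:

```python
import numpy as np
from scipy import stats

# Same dice counts as above (600 rolls)
observed = np.array([95, 105, 98, 110, 92, 100])
expected = np.full(6, observed.sum() / 6)  # fair die: 100 per face

# chi2 = sum over categories of (O - E)^2 / E
chi2_manual = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 1
p_manual = stats.chi2.sf(chi2_manual, df)  # upper-tail probability

chi2_scipy, p_scipy = stats.chisquare(observed, expected)
print(f"manual: chi2={chi2_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  chi2={chi2_scipy:.4f}, p={p_scipy:.4f}")
```

Here the squared deviations are 25, 25, 4, 100, 64, 0, so χ² = 218 / 100 = 2.18 with df = 5.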

8. Chi-square Test of Independence

8.1 Contingency Table

	Prefer A	Prefer B	Total
Male	30	45	75
Female	40	35	75
Total	70	80	150

8.2 Expected Frequency

$E_{ij} = \frac{\text{Row}_i \times \text{Col}_j}{\text{Total}}$
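Applied to the table in 8.1, the formula gives, for example, E = 75 × 70 / 150 = 35 for Male / Prefer A. A NumPy sketch that computes every cell at once with an outer product of row and column totals, cross-checked against `scipy.stats.contingency.expected_freq`:

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Contingency table from 8.1: rows = Male, Female; columns = Prefer A, Prefer B
observed = np.array([[30, 45],
                     [40, 35]])

row_totals = observed.sum(axis=1)   # [75, 75]
col_totals = observed.sum(axis=0)   # [70, 80]
total = observed.sum()              # 150

# E_ij = Row_i * Col_j / Total for every cell at once
expected = np.outer(row_totals, col_totals) / total
print(expected)
# [[35. 40.]
#  [35. 40.]]

# Cross-check against scipy's helper
print(np.allclose(expected, expected_freq(observed)))  # True
```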

8.3 Example

Problem: Is gender associated with preference?

```python
from scipy import stats
import numpy as np
import pandas as pd

# Contingency table
observed = np.array([[30, 45],   # Male: A, B
                     [40, 35]])  # Female: A, B

# Chi-square test of independence
# (for a 2x2 table, scipy applies Yates' continuity correction by default)
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print("=== Observed ===")
print(pd.DataFrame(observed,
                   index=['Male', 'Female'],
                   columns=['Prefer A', 'Prefer B']))

print("\n=== Expected ===")
print(pd.DataFrame(expected.round(2),
                   index=['Male', 'Female'],
                   columns=['Prefer A', 'Prefer B']))

print(f"\nChi-square: {chi2:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

if p_value < 0.05:
    print("\n→ Reject H₀: gender and preference are associated")
else:
    print("\n→ Fail to reject H₀: no evidence of an association")
```
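`chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom (as in this 2x2 example), so its statistic is smaller than the plain Σ(O - E)²/E from 7.1. A sketch comparing the two through the `correction` parameter:

```python
import numpy as np
from scipy import stats

observed = np.array([[30, 45],
                     [40, 35]])

# Default for a 2x2 table: Yates' continuity correction is applied
chi2_yates, p_yates, _, _ = stats.chi2_contingency(observed)
# Without the correction: plain sum of (O - E)^2 / E
chi2_plain, p_plain, _, _ = stats.chi2_contingency(observed, correction=False)

print(f"With Yates:    chi2={chi2_yates:.4f}, p={p_yates:.4f}")
print(f"Without Yates: chi2={chi2_plain:.4f}, p={p_plain:.4f}")
```

With these counts both p-values stay above 0.05, so the conclusion does not change; the correction matters most when expected counts are small.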

8.4 Cramér's V (Effect Size)

```python
import numpy as np
from scipy import stats

# Contingency table from 8.3
observed = np.array([[30, 45],
                     [40, 35]])

def cramers_v(contingency_table):
    # Cramér's V is based on the uncorrected chi-square (no Yates correction)
    chi2, p, dof, expected = stats.chi2_contingency(contingency_table, correction=False)
    n = contingency_table.sum()
    min_dim = min(contingency_table.shape) - 1
    return np.sqrt(chi2 / (n * min_dim))

v = cramers_v(observed)
print(f"Cramér's V: {v:.4f}")

# Interpretation (rule-of-thumb thresholds; they depend on min_dim)
if v < 0.1:
    print("→ Negligible association")
elif v < 0.3:
    print("→ Small association")
elif v < 0.5:
    print("→ Medium association")
else:
    print("→ Large association")
```
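Recent SciPy versions (1.7 and later; this is an assumption about your environment) also provide `scipy.stats.contingency.association`, which computes Cramér's V directly from the table without a continuity correction. A minimal sketch:

```python
import numpy as np
from scipy.stats.contingency import association

observed = np.array([[30, 45],
                     [40, 35]])

# Cramér's V = sqrt(chi2 / (n * min_dim)), using the uncorrected chi-square
v = association(observed, method="cramer")
print(f"Cramér's V: {v:.4f}")
```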

9. Assumptions and Conditions

9.1 T-test Assumptions

Assumption	How to check
Normality	Shapiro-Wilk test, Q-Q plot
Equal variances	Levene's test
Independence	Study design

9.2 Chi-square Assumptions

Expected frequency ≥ 5 in every cell (rule of thumb)
Observations are independent
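The expected-count rule can be checked directly, since `chi2_contingency` returns the expected table as its fourth value. A sketch for the 8.3 table, using the ≥ 5 threshold stated above:

```python
import numpy as np
from scipy import stats

observed = np.array([[30, 45],
                     [40, 35]])

# chi2_contingency also returns the expected counts under H0
_, _, _, expected = stats.chi2_contingency(observed)

min_expected = expected.min()
print(f"Smallest expected count: {min_expected:.2f}")
if (expected >= 5).all():
    print("→ Every cell has E >= 5: the chi-square approximation is reasonable")
else:
    print("→ Some cells have E < 5: the chi-square approximation may be unreliable")
```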

```python
from scipy import stats

# Check normality (a t-test assumption, see 9.1)
data = [85, 90, 78, 92, 88, 76, 95, 89, 82, 91]
stat, p_value = stats.shapiro(data)
print(f"Shapiro-Wilk: stat={stat:.4f}, p={p_value:.4f}")
if p_value > 0.05:
    print("→ No evidence against normality")
else:
    print("→ Data may not be normal; inspect a Q-Q plot")
```

10. Summary Table

Test	When to use	scipy function
One-sample t	Mean vs value	`ttest_1samp`
Independent t	2 group means	`ttest_ind`
Paired t	Before/After	`ttest_rel`
Chi-square GoF	Fit to an expected distribution	`chisquare`
Chi-square independence	Association between 2 categorical variables	`chi2_contingency`

11. Practice Exercises

Exercise 1: One-sample t-test

Claim: Battery life = 10 hours. Sample (n = 15): mean = 9.5, std = 1.2. Test at α = 0.05.
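Exercise 1 gives only summary statistics, so `ttest_1samp` (which needs the raw observations) cannot be called directly. One way to start is to plug the numbers into the one-sample formula from 2.2; this sketch stops at t and the p-value and leaves the decision to you.

```python
import numpy as np
from scipy import stats

# Summary statistics from Exercise 1 (no raw data available)
mu_0, x_bar, s, n = 10.0, 9.5, 1.2, 15
alpha = 0.05

# One-sample t-statistic from summary values
t = (x_bar - mu_0) / (s / np.sqrt(n))
df = n - 1
p_value = 2 * stats.t.sf(abs(t), df)  # two-tailed, H1: mu != 10
print(f"t = {t:.4f}, df = {df}, p = {p_value:.4f}")
```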

Exercise 2: Independent t-test

Drug A	Drug B
23, 25, 28, 22, 26	30, 32, 29, 31, 28

Do the two drugs differ in effectiveness?

Exercise 3: Chi-square

	Success	Failure
Method A	60	40
Method B	45	55

Is there an association between method and outcome?

Summary

Test	H₀	Statistic
One-sample t	μ = μ₀	t = (x̄ - μ₀)/(s/√n)
Independent t	μ₁ = μ₂	t = (x̄₁ - x̄₂)/SE
Paired t	μ_d = 0	t = d̄/(s_d/√n)
Chi-square	Variables independent (or counts fit E)	χ² = Σ(O-E)²/E

Key Takeaways

One-sample t: sample mean vs known value
Independent t: 2 different groups
Paired t: same subjects, 2 conditions
Chi-square: categorical variables
Always check assumptions before running a test