NumPy Practice

🎯 Lesson Objectives

After this practice session, you will be able to:

✅ Create and manipulate ndarrays confidently in a range of situations

✅ Use Boolean indexing and fancy indexing with confidence

✅ Apply broadcasting to practical computations

✅ Solve statistics and data-processing problems with NumPy

Time: 2 hours | Difficulty: Beginner → Hard | Prerequisite: Lesson 4 (NumPy) completed

🟢 Part 1: Creating and Manipulating Arrays (Easy)

Theory review (Lesson 4):

np.array() builds an array from a list, np.arange() works like range(), np.linspace() produces evenly spaced values
np.zeros(), np.ones(), np.eye() create special arrays
reshape() changes the shape; -1 lets NumPy infer that dimension
Attributes: shape, ndim, size, dtype

Exercise 1.1: Basic initialization

Python

import numpy as np

# Exercise: create the following arrays
a = np.arange(1, 21)           # [1, 2, ..., 20]
b = np.linspace(0, 1, 11)      # [0, 0.1, 0.2, ..., 1.0]
c = np.zeros((3, 5))           # 3x5 matrix of zeros
d = np.eye(4)                  # 4x4 identity matrix

# Check
print(f"a: shape={a.shape}, dtype={a.dtype}")
print(f"b: {b}")
print(f"c: shape={c.shape}")
print(f"d:\n{d}")

Exercise 1.2: Reshape Challenge

Task: Create an array holding 1 to 24 and reshape it into several different shapes

Python

arr = np.arange(1, 25)

# (a) Reshape to 4x6
a = arr.reshape(4, 6)
print(a)

# (b) Reshape to 2x3x4 (3D)
b = arr.reshape(2, 3, 4)
print(b.shape)  # (2, 3, 4)

# (c) Reshape to 6x4, then transpose to 4x6
c = arr.reshape(6, 4).T
print(c.shape)  # (4, 6)

# (d) Let -1 infer the remaining dimension
d = arr.reshape(3, -1)  # → (3, 8)
print(d.shape)
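A detail worth checking in part (c): reshaping to 6x4 and then transposing gives the shape (4, 6), but the elements are arranged differently from a direct reshape(4, 6). The sketch below compares the two; the variable names `direct` and `transposed` are only illustrative.

```python
import numpy as np

arr = np.arange(1, 25)

direct = arr.reshape(4, 6)        # rows are filled left to right: 1..6, 7..12, ...
transposed = arr.reshape(6, 4).T  # rows here are the columns of the 6x4 array

print(direct[0])      # first row of the direct reshape
print(transposed[0])  # first row after transposing

same_shape = direct.shape == transposed.shape
same_values = np.array_equal(direct, transposed)
print(same_shape, same_values)
```

Both arrays have the same shape, yet they hold the values in a different order, so the two operations are not interchangeable.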

Exercise 1.3: Array Attributes

Task: Create a random 5x5 array and print all of its attributes

Python

np.random.seed(42)
arr = np.random.randint(1, 100, (5, 5))

print(f"Array:\n{arr}")
print(f"Shape: {arr.shape}")
print(f"Ndim:  {arr.ndim}")
print(f"Size:  {arr.size}")
print(f"Dtype: {arr.dtype}")
print(f"Bytes: {arr.nbytes}")
# argmin()/argmax() return the index into the flattened array
print(f"Min:   {arr.min()} at flat index {arr.argmin()}")
print(f"Max:   {arr.max()} at flat index {arr.argmax()}")
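Because argmin() and argmax() return a position in the flattened array, it can help to convert that position back into a (row, column) pair. The sketch below does this with np.unravel_index; the seed and the array contents are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily for this sketch
arr = rng.integers(1, 100, size=(5, 5))

flat_idx = arr.argmax()                            # position in the flattened array
row, col = np.unravel_index(flat_idx, arr.shape)   # convert to (row, column)

print(f"Max {arr.max()} at flat index {flat_idx} -> row {row}, col {col}")
```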

Checkpoint

Have you created a 3D array successfully? Make sure you understand what the shape (2, 3, 4) means!
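One way to read the shape (2, 3, 4) is "2 blocks, each with 3 rows and 4 columns". The short sketch below peels off one axis at a time to make that concrete.

```python
import numpy as np

b = np.arange(1, 25).reshape(2, 3, 4)  # 2 blocks, each with 3 rows and 4 columns

print(b[0].shape)     # one block: (3, 4)
print(b[0, 1].shape)  # one row inside a block: (4,)
print(b[1, 2, 3])     # last element of the last row of the last block: 24
```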

🟡 Part 2: Indexing and Filtering (Medium)

Theory review (Lesson 4):

Indexing: a[0], matrix[i, j], a[:, 0] (column 0)
Boolean indexing: a[a > 5] → filter by a condition
Fancy indexing: a[[1,3,5]] → select by a list of indices
np.where(): np.where(condition, x, y) → take elements from x where the condition is True, otherwise from y
NaN handling: np.isnan(), np.nanmean(), np.nanmedian()

Exercise 2.1: Extracting data

Task: Given a score table, extract data in several ways

Python

np.random.seed(42)
# 10 students × 4 subjects (Math, Physics, Chemistry, Biology)
scores = np.random.randint(40, 100, (10, 4))
subjects = ["Math", "Physics", "Chemistry", "Biology"]

# (a) Math score of the 3rd student (index 2)
print(f"Student 3 Math: {scores[2, 0]}")

# (b) All Chemistry scores
print(f"Chemistry: {scores[:, 2]}")

# (c) Students 1-5, first 2 subjects
print(f"First 5, Math+Phys:\n{scores[:5, :2]}")

# (d) Students with a Math score > 70
mask = scores[:, 0] > 70
print(f"Math > 70:\n{scores[mask]}")

# (e) Number of students who pass (≥ 50) every subject
all_pass = np.all(scores >= 50, axis=1)
print(f"Passed everything: {np.sum(all_pass)}/{len(scores)} students")
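The theory review lists fancy indexing, which the exercise above does not use. As a small supplementary sketch on a score table with the same layout (the chosen indices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
scores = rng.integers(40, 100, size=(10, 4))  # 10 students x 4 subjects

picked_rows = scores[[0, 3, 7]]   # students at index 0, 3 and 7, all subjects
picked_cols = scores[:, [0, 2]]   # all students, Math and Chemistry only
pairs = scores[[0, 3], [1, 2]]    # element-wise pairs: (0, 1) and (3, 2)

print(picked_rows.shape)  # (3, 4)
print(picked_cols.shape)  # (10, 2)
print(pairs.shape)        # (2,)
```

Note that passing two index lists pairs them element by element rather than selecting a rectangular block.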

Exercise 2.2: Replacing and Updating

Task: Handle anomalous values in an array

Python

data = np.array([23, 45, -999, 67, 89, -999, 12, -999, 56, 78])

# (a) Replace -999 with NaN (NaN requires a float array)
clean = data.astype(float)
clean[clean == -999] = np.nan
print(f"Clean: {clean}")

# (b) Count the valid values
valid_count = np.sum(~np.isnan(clean))
print(f"Valid: {valid_count}/{len(clean)}")

# (c) Compute the mean, ignoring NaN
print(f"Mean: {np.nanmean(clean):.1f}")

# (d) Replace NaN with the median
median_val = np.nanmedian(clean)
clean[np.isnan(clean)] = median_val
print(f"Filled: {clean}")
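The astype(float) step in part (a) matters because integer arrays cannot store NaN. The sketch below, using a shortened copy of the data, shows what happens without the conversion.

```python
import numpy as np

int_data = np.array([23, 45, -999, 67])

try:
    int_data[int_data == -999] = np.nan   # integer arrays have no NaN representation
    assigned = True
except ValueError as exc:
    assigned = False
    print(f"ValueError: {exc}")

float_data = int_data.astype(float)       # astype returns a new float64 copy
float_data[float_data == -999] = np.nan
print(float_data)
```

The original integer array is left unchanged, since astype() works on a copy.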

Exercise 2.3: np.where for Conditional Selection

Task: Classify students based on their scores

Python

np.random.seed(42)
scores = np.random.randint(0, 100, 20)

# Grades: A (≥90), B (≥80), C (≥70), D (≥60), F (<60)
grades = np.where(scores >= 90, "A",
         np.where(scores >= 80, "B",
         np.where(scores >= 70, "C",
         np.where(scores >= 60, "D", "F"))))

for s, g in zip(scores[:10], grades[:10]):
    print(f"  {s} → {g}")

# Count each grade
unique, counts = np.unique(grades, return_counts=True)
for grade, count in zip(unique, counts):
    print(f"  {grade}: {count} students")
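Nested np.where() calls become hard to read as the number of categories grows. A possible alternative, not part of the original exercise, is np.select(), which takes a list of conditions and a matching list of values. The scores below are made up for illustration.

```python
import numpy as np

scores = np.array([95, 83, 71, 65, 42, 90, 60])

conditions = [scores >= 90, scores >= 80, scores >= 70, scores >= 60]
labels = ["A", "B", "C", "D"]

# np.select picks the label of the first condition that is True for each element.
grades = np.select(conditions, labels, default="F")
print(grades)
```

The order of the conditions matters here in the same way as the nesting order does with np.where().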

Checkpoint

Boolean indexing is one of the MOST IMPORTANT techniques in NumPy. Have you filtered data with a mask yet?

🟡 Part 3: Broadcasting and Math (Medium)

Theory review (Lesson 4):

Broadcasting: NumPy automatically stretches compatible shapes so arrays of different shapes can be combined → no loops needed!
Rule: (3,) + (3,1) → broadcasts to (3,3)
Adding a dimension: arr[:, np.newaxis] or arr[:, None]
Applications: normalization, distance matrix, outer product
Math functions: np.mean(a, axis=0), np.sum(), np.dot(), np.sqrt()
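The rule (3,) + (3,1) → (3,3) from the review can be checked directly. A minimal sketch with made-up values:

```python
import numpy as np

row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[1], [2], [3]])       # shape (3, 1)

total = row + col                     # both operands are stretched to (3, 3)
print(total.shape)
print(total)

# np.newaxis turns a 1-D array into a column, giving the same kind of operand.
col_from_1d = np.array([1, 2, 3])[:, np.newaxis]
print(col_from_1d.shape)              # (3, 1)
```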

Exercise 3.1: Min-Max Normalization

Task: Scale the data to the range [0, 1] using broadcasting

Python

# Data: 5 students × 3 features (Age, Income, Score)
data = np.array([[22, 30000, 85],
                 [35, 75000, 92],
                 [28, 45000, 78],
                 [45, 95000, 88],
                 [19, 20000, 95]])

# Min-Max: x_norm = (x - min) / (max - min)
data_min = data.min(axis=0)   # [19, 20000, 78]
data_max = data.max(axis=0)   # [45, 95000, 95]

normalized = (data - data_min) / (data_max - data_min)  # Broadcasting!
print(f"Normalized:\n{normalized.round(3)}")
# All values now lie within [0, 1]
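The formula above divides by (max - min), which is zero for a column whose values are all the same. The sketch below wraps the same computation in a hypothetical helper, `min_max_scale`, and maps constant columns to 0; that fallback is an assumption made for this example, not something the exercise prescribes.

```python
import numpy as np

def min_max_scale(data):
    """Scale each column to [0, 1]; constant columns map to 0 (a choice made for this sketch)."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    safe_range = np.where(col_range == 0, 1.0, col_range)  # avoid dividing by zero
    return (data - col_min) / safe_range

sample = np.array([[22, 30000, 7],
                   [35, 75000, 7],
                   [19, 20000, 7]])    # third column is constant on purpose
scaled = min_max_scale(sample)
print(scaled.round(3))
```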

Exercise 3.2: Distance Matrix

Task: Compute the Euclidean distance between 2D points

Python

# 4 points in 2D
points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

# Compute pairwise distances with broadcasting
# Add axes so the shapes broadcast: (4,1,2) - (1,4,2) = (4,4,2)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))

print("Distance Matrix:")
print(distances.round(3))
# [[0.    1.    1.    1.414]
#  [1.    0.    1.414 1.   ]
#  [1.    1.414 0.    1.   ]
#  [1.414 1.    1.    0.   ]]
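A distance matrix should be symmetric with zeros on the diagonal, which makes for a quick sanity check. The sketch below also compares the manual formula against np.linalg.norm applied along the last axis.

```python
import numpy as np

points = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]   # shape (4, 4, 2)
dist_manual = np.sqrt(np.sum(diff**2, axis=2))
dist_norm = np.linalg.norm(diff, axis=2)                     # same computation via norm

print(np.allclose(dist_manual, dist_norm))       # the two approaches agree
print(np.allclose(dist_manual, dist_manual.T))   # symmetric
print(np.all(np.diag(dist_manual) == 0))         # zero diagonal
```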

Exercise 3.3: Outer Product for Analysis

Task: Use an outer product to build a delivery fee table (distance × price per km)

Python

distances = np.array([5, 10, 20, 50, 100])          # km
price_per_km = np.array([1.5, 2.0, 3.0, 5.0])       # $/km

# Outer product
fee_table = distances[:, np.newaxis] * price_per_km[np.newaxis, :]
# or: fee_table = np.outer(distances, price_per_km)

print("Delivery fee table ($):")
print(f"{'km':<6}", end="")
for p in price_per_km:
    print(f"  ${p}/km", end="")
print()
for i, d in enumerate(distances):
    print(f"{d:<6}", end="")
    for j in range(len(price_per_km)):
        print(f"  ${fee_table[i,j]:>5.0f}", end="")
    print()
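The comment in the exercise mentions np.outer() as an alternative. A short check on a smaller, made-up table shows the two approaches producing the same result:

```python
import numpy as np

distances = np.array([5, 10, 20])        # km
price_per_km = np.array([1.5, 2.0])      # $/km

by_broadcast = distances[:, np.newaxis] * price_per_km[np.newaxis, :]
by_outer = np.outer(distances, price_per_km)

print(by_broadcast.shape)                    # (3, 2)
print(np.allclose(by_broadcast, by_outer))   # both give the same table
```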

Checkpoint

Do you understand the [:, np.newaxis] trick for adding a dimension before broadcasting? It is a very useful advanced technique!

🔴 Part 4: Real-World Problems (Hard)

This is the advanced part! These exercises combine several NumPy techniques:

Monte Carlo simulation: np.random + boolean indexing
Image processing: 3D arrays + slicing + broadcasting
Time series: vectorization + the cumsum trick

If you get stuck, read the solution and then try to write the code again yourself!

Exercise 4.1: Estimating π with Monte Carlo Simulation

Task: Use random numbers to estimate Pi

Python

def estimate_pi(n_points=1_000_000):
    """
    Idea: throw random points into the square [-1,1]x[-1,1].
    The fraction of points that land inside the unit circle ≈ π/4
    """
    rng = np.random.default_rng(42)

    # Random x, y in [-1, 1]
    x = rng.uniform(-1, 1, n_points)
    y = rng.uniform(-1, 1, n_points)

    # Inside the circle when x² + y² ≤ 1
    inside = (x**2 + y**2) <= 1

    pi_estimate = 4 * np.sum(inside) / n_points
    return pi_estimate

for n in [1000, 10000, 100000, 1_000_000]:
    pi = estimate_pi(n)
    error = abs(pi - np.pi)
    print(f"  n={n:>10,}: π ≈ {pi:.6f} (error: {error:.6f})")
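The error in the loop above tends to shrink as n grows. One way to quantify this, sketched below, is the standard error of the estimate: the fraction of points inside the circle is a sample proportion p, so the estimate 4p has a standard error of about 4·sqrt(p(1-p)/n). The helper name `estimate_pi_with_error` and its `seed` parameter are hypothetical additions for this example.

```python
import numpy as np

def estimate_pi_with_error(n_points, seed=42):
    """Return (estimate, standard error) for the Monte Carlo estimate of pi."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, n_points)
    y = rng.uniform(-1, 1, n_points)
    p_hat = np.mean(x**2 + y**2 <= 1)                        # fraction inside the unit circle
    std_error = 4 * np.sqrt(p_hat * (1 - p_hat) / n_points)  # std. error of 4 * p_hat
    return 4 * p_hat, std_error

results = {}
for n in [1_000, 100_000]:
    est, se = estimate_pi_with_error(n)
    results[n] = (est, se)
    print(f"n={n:>8,}: pi ~ {est:.4f} +/- {se:.4f}")
```

Using 100 times more points shrinks the standard error by roughly a factor of 10, which matches the 1/sqrt(n) behaviour of Monte Carlo estimates.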

Exercise 4.2: Basic Image Processing

Task: Process an image with NumPy (an image is a 3D array)

Python

# Simulate a 100x100 RGB image
np.random.seed(42)
image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
print(f"Image shape: {image.shape}")  # (100, 100, 3)

# (a) Grayscale: Y = 0.299R + 0.587G + 0.114B
weights = np.array([0.299, 0.587, 0.114])
gray = np.dot(image, weights).astype(np.uint8)
print(f"Grayscale shape: {gray.shape}")  # (100, 100)

# (b) Brightness: increase by 50 (clipped at 255)
brighter = np.clip(image.astype(int) + 50, 0, 255).astype(np.uint8)

# (c) Flip: mirror horizontally
flipped = image[:, ::-1, :]

# (d) Crop: take the central 50x50 region
h, w = image.shape[:2]
crop = image[h//4:3*h//4, w//4:3*w//4, :]
print(f"Cropped shape: {crop.shape}")  # (50, 50, 3)

# (e) Per-channel statistics (mean and standard deviation)
for i, color in enumerate(["Red", "Green", "Blue"]):
    channel = image[:, :, i]
    print(f"  {color}: mean={channel.mean():.1f}, std={channel.std():.1f}")
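Step (b) converts the image to int before adding 50 because uint8 arithmetic wraps around at 256 instead of saturating. The sketch below shows the difference on three made-up pixel values.

```python
import numpy as np

pixels = np.array([100, 200, 250], dtype=np.uint8)
boost = np.full_like(pixels, 50)               # uint8 array of 50s

wrapped = pixels + boost                       # stays uint8: 300 wraps around to 44
clipped = np.clip(pixels.astype(int) + 50, 0, 255).astype(np.uint8)

print(wrapped)   # [150 250  44]
print(clipped)   # [150 250 255]
```

Without the conversion, the brightest pixels would turn dark instead of staying at the maximum value.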

Exercise 4.3: Moving Average for Time Series Data

Task: Compute the moving average of stock price data

Python

def moving_average(data, window):
    """Compute the simple moving average (SMA)."""
    # cumsum trick: each window sum is the difference of two cumulative sums
    cumsum = np.cumsum(data)
    cumsum = np.insert(cumsum, 0, 0)
    return (cumsum[window:] - cumsum[:-window]) / window

# Simulate 100 days of stock prices
np.random.seed(42)
price = 100 + np.cumsum(np.random.randn(100) * 2)

# Compute the SMAs
sma_7 = moving_average(price, 7)
sma_20 = moving_average(price, 20)

print(f"Price: {len(price)} days, last 5: {price[-5:].round(2)}")
print(f"SMA7:  {len(sma_7)} values, last 5: {sma_7[-5:].round(2)}")
print(f"SMA20: {len(sma_20)} values, last 5: {sma_20[-5:].round(2)}")

# Signal: SMA7 crossing SMA20 from below → buy signal
# (An advanced trick that uses boolean indexing)
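The crossover signal mentioned in the last comment is left unimplemented in the exercise. One possible way to detect it with boolean arrays is sketched below; the alignment step and the names `sma_7_aligned`, `above`, `buy`, and `buy_days` are assumptions made for this example, and the price series uses a different random generator than the exercise.

```python
import numpy as np

def moving_average(data, window):
    """Simple moving average via the cumsum trick."""
    cumsum = np.insert(np.cumsum(data), 0, 0)
    return (cumsum[window:] - cumsum[:-window]) / window

rng = np.random.default_rng(42)
price = 100 + np.cumsum(rng.standard_normal(100) * 2)

sma_7 = moving_average(price, 7)
sma_20 = moving_average(price, 20)

# Align both series on the same days: SMA20 starts 13 days later than SMA7.
sma_7_aligned = sma_7[-len(sma_20):]
above = sma_7_aligned > sma_20

# Buy signal: SMA7 was not above SMA20 on the previous day but is on the current day.
buy = ~above[:-1] & above[1:]
buy_days = np.flatnonzero(buy) + 1 + (20 - 1)   # convert to an index into `price`
print(f"Buy signals on days: {buy_days}")
```

Each SMA value belongs to the last day of its window, which is why the first SMA20 value corresponds to day index 19.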

Monte Carlo simulation is an important technique in finance, ML, and physics. You will meet it again in the ML course!

Checkpoint

Have you solved at least one Hard exercise? Estimating Pi with Monte Carlo is a classic example of the power of NumPy!

📝 Summary

Self-check questions

When should you use np.where() instead of plain boolean indexing?
What is Monte Carlo simulation, and why is NumPy well suited to it?
What is a moving average used for in time series analysis?
How do you convert an RGB image to grayscale with NumPy? Which formula is used?

✅ Completion checklist

🟢 Part 1 (Easy): creating arrays, reshape. Done?
🟡 Part 2 (Medium): indexing, boolean masks, np.where. Done?
🟡 Part 3 (Medium): broadcasting, normalization, distance. Done?
🔴 Part 4 (Hard): Monte Carlo, image processing, moving average. Done?

NumPy Cheat Sheet

Python

import numpy as np

# Create
np.array(), np.zeros(), np.ones(), np.arange(), np.linspace()

# Index
a[mask], np.where(cond, x, y), a[a > 0]

# Reshape
a.reshape(m, n), a.ravel(), a.T, np.vstack(), np.hstack()

# Math
np.mean(), np.std(), np.sum(), np.cumsum()
np.dot(), a @ b, np.linalg.inv()

# Random
rng = np.random.default_rng(42)
rng.random(), rng.normal(), rng.integers()

Next lesson: Pandas, the #1 library for tabular data processing in Data Science! 📊
