Lý thuyết
Bài 2/15

SQL Nâng cao

Window Functions, CTEs, Subqueries và Query Optimization

SQL Nâng cao

Advanced SQL Database

1. Window Functions

1.1 Giới thiệu Window Functions

Window Functions cho phép tính toán trên một "cửa sổ" (window) của rows liên quan đến row hiện tại.

SQL
1-- Syntax cơ bản
2SELECT
3 column1,
4 AGG_FUNCTION(column2) OVER (
5 PARTITION BY column3
6 ORDER BY column4
7 ROWS/RANGE BETWEEN ... AND ...
8 ) as window_result
9FROM table;

1.2 Các loại Window Functions

LoạiFunctionsMục đích
AggregateSUM, AVG, COUNT, MIN, MAXTính toán trên window
RankingROW_NUMBER, RANK, DENSE_RANKXếp hạng
ValueLAG, LEAD, FIRST_VALUE, LAST_VALUETruy cập rows khác

1.3 Ví dụ ROW_NUMBER vs RANK

SQL
1-- Dataset: Sales
2CREATE TABLE sales (
3 employee VARCHAR(50),
4 region VARCHAR(50),
5 revenue DECIMAL(10,2)
6);
7
8INSERT INTO sales VALUES
9('Alice', 'North', 5000),
10('Bob', 'North', 7000),
11('Charlie', 'South', 7000),
12('David', 'South', 6000);
13
14-- ROW_NUMBER: Số thứ tự liên tục
15SELECT
16 employee,
17 region,
18 revenue,
19 ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) as row_num
20FROM sales;
21
22/*
23employee | region | revenue | row_num
24---------|--------|---------|--------
25Bob | North | 7000 | 1
26Alice | North | 5000 | 2
27Charlie | South | 7000 | 1
28David | South | 6000 | 2
29*/
30
31-- RANK: Giống nhau thì cùng rank, nhảy rank
32SELECT
33 employee,
34 revenue,
35 RANK() OVER (ORDER BY revenue DESC) as rank
36FROM sales;
37
38/*
39employee | revenue | rank
40---------|---------|-----
41Bob | 7000 | 1
42Charlie | 7000 | 1 -- Cùng rank
43David | 6000 | 3 -- Nhảy qua rank 2
44Alice | 5000 | 4
45*/
46
47-- DENSE_RANK: Không nhảy rank
48SELECT
49 employee,
50 revenue,
51 DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank
52FROM sales;
53
54/*
55employee | revenue | dense_rank
56---------|---------|------------
57Bob | 7000 | 1
58Charlie | 7000 | 1
59David | 6000 | 2 -- Không nhảy
60Alice | 5000 | 3
61*/

1.4 LAG và LEAD

SQL
1-- LAG: Lấy giá trị row trước đó
2-- LEAD: Lấy giá trị row sau đó
3
4CREATE TABLE stock_prices (
5 date DATE,
6 price DECIMAL(10,2)
7);
8
9SELECT
10 date,
11 price,
12 LAG(price, 1) OVER (ORDER BY date) as prev_price,
13 LEAD(price, 1) OVER (ORDER BY date) as next_price,
14 price - LAG(price, 1) OVER (ORDER BY date) as daily_change
15FROM stock_prices;
16
17/*
18date | price | prev_price | next_price | daily_change
19-----------|-------|------------|------------|-------------
202024-01-01 | 100 | NULL | 102 | NULL
212024-01-02 | 102 | 100 | 98 | 2
222024-01-03 | 98 | 102 | 105 | -4
232024-01-04 | 105 | 98 | NULL | 7
24*/

2. Common Table Expressions (CTEs)

2.1 Cú pháp WITH

SQL
1-- CTE đơn giản
2WITH high_earners AS (
3 SELECT
4 employee_id,
5 name,
6 salary
7 FROM employees
8 WHERE salary > 100000
9)
10SELECT * FROM high_earners
11WHERE department = 'Engineering';

2.2 Multiple CTEs

SQL
1WITH
2-- CTE 1: Aggregate sales
3monthly_sales AS (
4 SELECT
5 DATE_TRUNC('month', order_date) as month,
6 SUM(amount) as total_sales
7 FROM orders
8 GROUP BY DATE_TRUNC('month', order_date)
9),
10-- CTE 2: Calculate growth
11sales_growth AS (
12 SELECT
13 month,
14 total_sales,
15 LAG(total_sales) OVER (ORDER BY month) as prev_month_sales,
16 (total_sales - LAG(total_sales) OVER (ORDER BY month)) /
17 LAG(total_sales) OVER (ORDER BY month) * 100 as growth_pct
18 FROM monthly_sales
19)
20SELECT * FROM sales_growth
21WHERE growth_pct > 10;

2.3 Recursive CTEs

SQL
1-- Ví dụ: Employee hierarchy
2WITH RECURSIVE emp_hierarchy AS (
3 -- Base case: Top manager
4 SELECT
5 employee_id,
6 name,
7 manager_id,
8 1 as level
9 FROM employees
10 WHERE manager_id IS NULL
11
12 UNION ALL
13
14 -- Recursive case
15 SELECT
16 e.employee_id,
17 e.name,
18 e.manager_id,
19 h.level + 1
20 FROM employees e
21 INNER JOIN emp_hierarchy h ON e.manager_id = h.employee_id
22)
23SELECT * FROM emp_hierarchy
24ORDER BY level, name;

3. Subqueries Nâng cao

3.1 Correlated Subquery

SQL
1-- Tìm employees có salary > average của department họ
2SELECT
3 e.name,
4 e.department,
5 e.salary
6FROM employees e
7WHERE e.salary > (
8 SELECT AVG(salary)
9 FROM employees e2
10 WHERE e2.department = e.department
11);

3.2 EXISTS vs IN

SQL
1-- EXISTS: Kiểm tra tồn tại (nhanh hơn với large datasets)
2SELECT name
3FROM customers c
4WHERE EXISTS (
5 SELECT 1
6 FROM orders o
7 WHERE o.customer_id = c.id
8 AND o.order_date > '2024-01-01'
9);
10
11-- IN: Kiểm tra trong danh sách
12SELECT name
13FROM customers
14WHERE id IN (
15 SELECT DISTINCT customer_id
16 FROM orders
17 WHERE order_date > '2024-01-01'
18);

4. Query Optimization

4.1 EXPLAIN ANALYZE

SQL
1-- Xem execution plan
2EXPLAIN ANALYZE
3SELECT *
4FROM orders o
5JOIN customers c ON o.customer_id = c.id
6WHERE o.order_date > '2024-01-01';

4.2 Indexing

SQL
1-- Tạo index
2CREATE INDEX idx_orders_date ON orders(order_date);
3CREATE INDEX idx_orders_customer ON orders(customer_id);
4
5-- Composite index
6CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);

4.3 Best Practices

SQL
1-- ❌ Bad: SELECT *
2SELECT * FROM large_table;
3
4-- ✅ Good: Chỉ lấy columns cần thiết
5SELECT id, name, created_at FROM large_table;
6
7-- ❌ Bad: Function trong WHERE
8SELECT * FROM orders WHERE YEAR(order_date) = 2024;
9
10-- ✅ Good: Sử dụng range
11SELECT * FROM orders
12WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
13
14-- ❌ Bad: OR với nhiều conditions
15SELECT * FROM products
16WHERE category = 'A' OR category = 'B' OR category = 'C';
17
18-- ✅ Good: IN
19SELECT * FROM products
20WHERE category IN ('A', 'B', 'C');

Bài tập

Bài 1: Window Functions

Tính running total sales theo tháng cho mỗi product category.

Bài 2: CTEs

Tìm top 5 customers có tổng spending cao nhất mỗi quý.

Bài 3: Recursive CTE

Xây dựng org chart từ employee table.

Bài 4: Optimization

Optimize query chậm bằng indexing và rewrite.

Next: Bài 3 - Pandas Nâng cao