MinAI - Về trang chủ
Hướng dẫn
5/133 giờ
Đang tải...

Advanced SQL: Window Functions, CTEs & Optimization

Window Functions, Recursive CTEs, và Query Optimization cho phân tích dữ liệu nâng cao

0

🎯 Mục tiêu bài học

TB5 min
Sau bài học này, bạn sẽ:
  • Sử dụng thành thạo Window Functions (ranking, value, aggregate)
  • Viết Recursive CTEs cho hierarchical data và graph traversal
  • Đọc hiểu EXPLAIN plans và tối ưu query performance
  • Áp dụng indexing strategies và query rewriting techniques
📋 Thông tin bài học
Thông tinChi tiết
⏱️ Thời lượng3 giờ
📖 Chủ đề chínhWindow Functions, CTEs, Query Optimization
💡 Kiến thức cần cóSQL cơ bản (SELECT, JOIN, GROUP BY)
🎯 OutputViết được advanced SQL queries cho business analytics
1

📖 Thuật ngữ quan trọng

TB5 min
Thuật ngữTiếng ViệtMô tả
Window FunctionHàm cửa sổTính toán trên tập rows liên quan đến current row mà không collapse kết quả
PARTITION BYPhân nhómChia data thành partitions cho window function (tương tự GROUP BY)
Frame ClauseKhung dữ liệuXác định rows nào được include trong window calculation
ROW_NUMBERSố thứ tự hàngGán unique sequential number cho mỗi row trong partition
RANK / DENSE_RANKXếp hạngXếp hạng rows — RANK có gap, DENSE_RANK không gap
LAG / LEADGiá trị trước/sauTruy cập value từ row trước (LAG) hoặc sau (LEAD)
CTEBiểu thức bảng chungCommon Table Expression — temporary named result set
Recursive CTECTE đệ quyCTE tự tham chiếu cho hierarchical/sequential data
EXPLAINGiải thích queryHiển thị execution plan của database engine
SargableTìm kiếm với indexQuery có thể tận dụng index (Search ARGument able)

Checkpoint

Window Functions cho phép tính toán across rows mà giữ nguyên all rows (khác GROUP BY collapse rows). CTEs giúp organize complex queries, recursive CTEs xử lý hierarchical data. EXPLAIN giúp đọc execution plans để optimize. Bạn đã từng gặp trường hợp cần tính rank hoặc running total trong SQL chưa? Window Functions sẽ giải quyết những bài toán đó!

2

🏆 Ranking Functions

TB5 min
Window Functions là gì?

Window Functions thực hiện calculations across rows liên quan đến current row, mà không collapse kết quả thành 1 row như GROUP BY. Đây là một trong những features mạnh nhất của SQL.

Window Function Syntax

Window Function Syntax

Cú pháp: function_name(column) OVER (PARTITION BY ... ORDER BY ... frame_clause)

Thành phầnMô tả
PARTITION BYDefine groups (like GROUP BY)
ORDER BYDefine row order within partition
frame_clauseDefine which rows to include

ROW_NUMBER()

SQL
1-- Unique sequential number for each row
2SELECT
3 employee_id, name, department, salary,
4 ROW_NUMBER() OVER (ORDER BY salary DESC) as salary_rank
5FROM employees;
6
7-- Row number within each department
8SELECT
9 employee_id, name, department, salary,
10 ROW_NUMBER() OVER (
11 PARTITION BY department
12 ORDER BY salary DESC
13 ) as dept_rank
14FROM employees;
15
16-- Use Case: Get top N per group
17WITH ranked AS (
18 SELECT
19 *,
20 ROW_NUMBER() OVER (
21 PARTITION BY department
22 ORDER BY salary DESC
23 ) as rn
24 FROM employees
25)
26SELECT * FROM ranked WHERE rn <= 3; -- Top 3 per department

RANK() và DENSE_RANK()

SQL
1-- RANK: Ties get same rank, skip numbers
2-- DENSE_RANK: Ties get same rank, no gaps
3SELECT
4 name, department, salary,
5 ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,
6 RANK() OVER (ORDER BY salary DESC) as rank,
7 DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
8FROM employees;
9
10-- Example output:
11-- | name | salary | row_num | rank | dense_rank |
12-- |---------|--------|---------|------|------------|
13-- | Alice | 100000 | 1 | 1 | 1 |
14-- | Bob | 100000 | 2 | 1 | 1 | <- tie
15-- | Charlie | 90000 | 3 | 3 | 2 | <- rank skips to 3
16-- | Diana | 80000 | 4 | 4 | 3 |

NTILE() và PERCENT_RANK()

SQL
1-- Divide rows into N buckets
2SELECT
3 name, salary,
4 NTILE(4) OVER (ORDER BY salary DESC) as salary_quartile
5FROM employees;
6
7-- PERCENT_RANK: Relative rank (0-1)
8-- CUME_DIST: Cumulative distribution
9SELECT
10 name, salary,
11 RANK() OVER (ORDER BY salary) as rank,
12 PERCENT_RANK() OVER (ORDER BY salary) as pct_rank,
13 CUME_DIST() OVER (ORDER BY salary) as cume_dist
14FROM employees;

Checkpoint

ROW_NUMBER gán số unique, RANK/DENSE_RANK xử lý ties khác nhau, NTILE chia buckets, PERCENT_RANK cho relative position. ROW_NUMBER vs RANK vs DENSE_RANK — bạn thấy sự khác biệt khi xử lý tied values không? Khi nào dùng cái nào?

3

📊 Value & Aggregate Window Functions

TB5 min

LAG() và LEAD()

SQL
1-- LAG: Access previous row | LEAD: Access next row
2SELECT
3 date, sales,
4 LAG(sales, 1) OVER (ORDER BY date) as prev_day_sales,
5 LEAD(sales, 1) OVER (ORDER BY date) as next_day_sales,
6 sales - LAG(sales, 1) OVER (ORDER BY date) as daily_change
7FROM daily_sales;
8
9-- Multiple periods
10SELECT
11 date, sales,
12 LAG(sales, 1) OVER (ORDER BY date) as prev_1,
13 LAG(sales, 7) OVER (ORDER BY date) as prev_7, -- Week ago
14 LAG(sales, 30) OVER (ORDER BY date) as prev_30 -- Month ago
15FROM daily_sales;
16
17-- With default value for NULL
18SELECT
19 date, sales,
20 LAG(sales, 1, 0) OVER (ORDER BY date) as prev_sales
21FROM daily_sales;

FIRST_VALUE() và LAST_VALUE()

SQL
1SELECT
2 employee_id, department, salary,
3 FIRST_VALUE(salary) OVER (
4 PARTITION BY department
5 ORDER BY salary DESC
6 ) as highest_in_dept,
7 LAST_VALUE(salary) OVER (
8 PARTITION BY department
9 ORDER BY salary DESC
10 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
11 ) as lowest_in_dept
12FROM employees;
13-- Note: LAST_VALUE requires frame specification to work correctly!

Running Totals & Moving Averages

SQL
1-- Cumulative sum
2SELECT
3 date, amount,
4 SUM(amount) OVER (ORDER BY date) as running_total
5FROM transactions;
6
7-- Year-to-date (reset each year)
8SELECT
9 date, amount,
10 SUM(amount) OVER (
11 PARTITION BY YEAR(date)
12 ORDER BY date
13 ) as ytd_total
14FROM transactions;
15
16-- 7-day moving average
17SELECT
18 date, sales,
19 AVG(sales) OVER (
20 ORDER BY date
21 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
22 ) as ma_7
23FROM daily_sales;
24
25-- 30-day moving average
26SELECT
27 date, sales,
28 AVG(sales) OVER (
29 ORDER BY date
30 ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
31 ) as ma_30
32FROM daily_sales;

Frame Clauses

Frame Specification

Frame clause xác định rows nào được include trong calculation. Default là RANGE UNBOUNDED PRECEDING khi có ORDER BY, cần explicit frame cho control tốt hơn.

SQL
1-- ROWS: Physical row count
2-- RANGE: Logical value range
3
4-- Frame boundary options:
5-- UNBOUNDED PRECEDING → From partition start
6-- n PRECEDING → n rows before current
7-- CURRENT ROW → Current row
8-- n FOLLOWING → n rows after current
9-- UNBOUNDED FOLLOWING → To partition end
10
11-- Named Windows: Define once, use multiple times
12SELECT
13 date, sales,
14 SUM(sales) OVER sales_window as running_total,
15 AVG(sales) OVER sales_window as running_avg,
16 COUNT(*) OVER sales_window as row_count
17FROM daily_sales
18WINDOW sales_window AS (
19 ORDER BY date
20 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
21);

Checkpoint

LAG/LEAD truy cập rows trước/sau, FIRST_VALUE/LAST_VALUE lấy biên giá trị. Running totals dùng SUM OVER, moving averages dùng AVG với ROWS frame. Bạn có thể nghĩ ra business case nào dùng LAG để so sánh month-over-month không?

4

💼 Window Functions — Ứng dụng thực tế

TB5 min

Year-over-Year Analysis

SQL
1WITH monthly_sales AS (
2 SELECT
3 DATE_TRUNC('month', date) as month,
4 SUM(amount) as sales
5 FROM transactions
6 GROUP BY DATE_TRUNC('month', date)
7)
8SELECT
9 month, sales,
10 LAG(sales, 12) OVER (ORDER BY month) as prev_year_sales,
11 ROUND(
12 (sales - LAG(sales, 12) OVER (ORDER BY month))
13 / LAG(sales, 12) OVER (ORDER BY month) * 100, 2
14 ) as yoy_growth_pct
15FROM monthly_sales
16ORDER BY month;

Session Analysis

SQL
1-- Identify user sessions (30 min gap = new session)
2WITH activity_with_gaps AS (
3 SELECT
4 user_id, timestamp,
5 CASE
6 WHEN timestamp - LAG(timestamp) OVER (
7 PARTITION BY user_id ORDER BY timestamp
8 ) > INTERVAL '30 minutes'
9 THEN 1 ELSE 0
10 END as new_session
11 FROM user_activity
12),
13sessions AS (
14 SELECT *,
15 SUM(new_session) OVER (
16 PARTITION BY user_id ORDER BY timestamp
17 ) + 1 as session_id
18 FROM activity_with_gaps
19)
20SELECT
21 user_id, session_id,
22 MIN(timestamp) as session_start,
23 MAX(timestamp) as session_end,
24 COUNT(*) as events_in_session
25FROM sessions
26GROUP BY user_id, session_id;

Gaps and Islands

SQL
1-- Identify consecutive sequences
2WITH numbered AS (
3 SELECT
4 date, status,
5 ROW_NUMBER() OVER (ORDER BY date) -
6 ROW_NUMBER() OVER (PARTITION BY status ORDER BY date) as grp
7 FROM events
8)
9SELECT
10 status,
11 MIN(date) as start_date,
12 MAX(date) as end_date,
13 COUNT(*) as consecutive_days
14FROM numbered
15GROUP BY status, grp
16ORDER BY start_date;

Percentile Analysis

SQL
1SELECT
2 department, name, salary,
3 PERCENT_RANK() OVER (ORDER BY salary) as overall_percentile,
4 PERCENT_RANK() OVER (
5 PARTITION BY department ORDER BY salary
6 ) as dept_percentile,
7 CASE
8 WHEN PERCENT_RANK() OVER (ORDER BY salary) >= 0.9 THEN 'Top 10%'
9 WHEN PERCENT_RANK() OVER (ORDER BY salary) >= 0.75 THEN 'Top 25%'
10 WHEN PERCENT_RANK() OVER (ORDER BY salary) >= 0.5 THEN 'Top 50%'
11 ELSE 'Bottom 50%'
12 END as percentile_group
13FROM employees;
Window Function Performance

Window functions có thể expensive cho large datasets. Optimize với proper indexing trên PARTITION BY và ORDER BY columns.

SQL
1-- Create index for window function optimization
2CREATE INDEX idx_sales_date ON sales(date);
3CREATE INDEX idx_sales_category_date ON sales(category, date);
4CREATE INDEX idx_transactions_user_date ON transactions(user_id, date);

Checkpoint

Window Functions giải quyết nhiều bài toán analytics thực tế: YoY growth, session analysis, gaps & islands, percentile ranking. Gaps & Islands là pattern rất powerful — bạn có thể áp dụng nó để detect consecutive active days của users không?

5

🔄 CTEs & Recursive CTEs

TB5 min
Common Table Expressions

CTEs (WITH clause) tạo temporary named result sets, giúp organize complex queries, improve readability, và enable recursive operations.

Basic CTE — Multiple CTEs

SQL
1WITH
2monthly_sales AS (
3 SELECT
4 DATE_TRUNC('month', order_date) as month,
5 SUM(amount) as sales
6 FROM orders
7 GROUP BY DATE_TRUNC('month', order_date)
8),
9monthly_targets AS (
10 SELECT month, target_amount FROM targets
11)
12SELECT
13 m.month, m.sales, t.target_amount,
14 m.sales - t.target_amount as variance
15FROM monthly_sales m
16JOIN monthly_targets t ON m.month = t.month;

Recursive CTE — Hierarchy Traversal

Recursive CTEs

Recursive CTEs reference themselves để iterate qua hierarchical hoặc sequential data. Cần base case và recursive case.

SQL
1-- Employee - Manager hierarchy
2WITH RECURSIVE org_chart AS (
3 -- Base case: Top-level managers
4 SELECT
5 id, name, manager_id,
6 1 as level,
7 name as path
8 FROM employees
9 WHERE manager_id IS NULL
10
11 UNION ALL
12
13 -- Recursive case
14 SELECT
15 e.id, e.name, e.manager_id,
16 oc.level + 1,
17 oc.path || ' > ' || e.name
18 FROM employees e
19 JOIN org_chart oc ON e.manager_id = oc.id
20)
21SELECT * FROM org_chart ORDER BY path;

Recursive CTE — Tree & BOM

SQL
1-- Categories with subcategories
2WITH RECURSIVE category_tree AS (
3 SELECT
4 id, name, parent_id, 0 as depth,
5 ARRAY[id] as path_ids,
6 name as full_path
7 FROM categories
8 WHERE parent_id IS NULL
9
10 UNION ALL
11
12 SELECT
13 c.id, c.name, c.parent_id,
14 ct.depth + 1,
15 ct.path_ids || c.id,
16 ct.full_path || ' > ' || c.name
17 FROM categories c
18 JOIN category_tree ct ON c.parent_id = ct.id
19)
20SELECT
21 id,
22 REPEAT(' ', depth) || name as indented_name,
23 full_path
24FROM category_tree
25ORDER BY path_ids;

Number/Date Series Generation

SQL
1-- Generate date range
2WITH RECURSIVE dates AS (
3 SELECT DATE '2024-01-01' as date
4 UNION ALL
5 SELECT date + INTERVAL '1 day'
6 FROM dates
7 WHERE date < DATE '2024-12-31'
8)
9SELECT date FROM dates;
10
11-- Fill missing dates in time series
12WITH RECURSIVE all_dates AS (
13 SELECT MIN(date) as date FROM sales
14 UNION ALL
15 SELECT date + INTERVAL '1 day'
16 FROM all_dates
17 WHERE date < (SELECT MAX(date) FROM sales)
18),
19daily_sales AS (
20 SELECT date, SUM(amount) as sales FROM sales GROUP BY date
21)
22SELECT
23 ad.date,
24 COALESCE(ds.sales, 0) as sales
25FROM all_dates ad
26LEFT JOIN daily_sales ds ON ad.date = ds.date
27ORDER BY ad.date;

Graph Traversal

SQL
1-- Find all paths from A to B
2WITH RECURSIVE paths AS (
3 SELECT
4 from_node, to_node, weight,
5 ARRAY[from_node, to_node] as path,
6 weight as total_weight
7 FROM edges
8 WHERE from_node = 'A'
9
10 UNION ALL
11
12 SELECT
13 p.from_node, e.to_node, e.weight,
14 p.path || e.to_node,
15 p.total_weight + e.weight
16 FROM paths p
17 JOIN edges e ON p.to_node = e.from_node
18 WHERE NOT e.to_node = ANY(p.path) -- Avoid cycles
19)
20SELECT * FROM paths
21WHERE to_node = 'Z'
22ORDER BY total_weight;

Complex CTE Patterns

SQL
1-- CLV Analysis with multiple CTEs
2WITH
3first_purchase AS (
4 SELECT customer_id, MIN(order_date) as first_order_date
5 FROM orders GROUP BY customer_id
6),
7customer_metrics AS (
8 SELECT
9 o.customer_id,
10 fp.first_order_date,
11 COUNT(DISTINCT o.order_id) as total_orders,
12 SUM(o.amount) as total_revenue,
13 COUNT(DISTINCT DATE_TRUNC('month', o.order_date)) as active_months
14 FROM orders o
15 JOIN first_purchase fp ON o.customer_id = fp.customer_id
16 GROUP BY o.customer_id, fp.first_order_date
17),
18cohort_analysis AS (
19 SELECT
20 customer_id,
21 DATE_TRUNC('month', first_order_date) as cohort_month,
22 total_orders, total_revenue,
23 total_revenue / GREATEST(active_months, 1) as monthly_value
24 FROM customer_metrics
25)
26SELECT
27 cohort_month,
28 COUNT(*) as customers,
29 ROUND(AVG(total_orders), 1) as avg_orders,
30 ROUND(AVG(total_revenue), 2) as avg_revenue,
31 ROUND(AVG(monthly_value), 2) as avg_monthly_value
32FROM cohort_analysis
33GROUP BY cohort_month
34ORDER BY cohort_month;

Checkpoint

Basic CTEs organize complex queries. Recursive CTEs xử lý hierarchical data (org charts, trees, BOM), generate series, và traverse graphs. Luôn cần base case + termination condition. Recursive CTEs rất hữu ích cho org charts — bạn có thể nghĩ ra cách dùng nó cho product categories trên e-commerce không?

6

🔍 Query Optimization — EXPLAIN & Indexing

TB5 min
Tại sao Optimization quan trọng?

Query chậm = User experience kém, server costs cao, bottlenecks trong pipelines. Query optimization là kỹ năng essential cho mọi Data professional.

Query Execution Flow

Query Execution Pipeline

📝SQL Query
🔍Parser
Optimizer (Cost-based)
▶️Executor
📊Results

EXPLAIN Command

SQL
1-- PostgreSQL
2EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
3
4/*
5Seq Scan on orders (cost=0.00..1520.00 rows=50 width=120)
6 (actual time=0.015..12.450 rows=48 loops=1)
7 Filter: (customer_id = 100)
8 Rows Removed by Filter: 49952
9Planning Time: 0.085 ms
10Execution Time: 12.480 ms
11*/
12
13-- Key metrics:
14-- cost: Estimated cost (startup..total)
15-- rows: Estimated rows returned
16-- actual time: Real execution time (ms)
17-- Rows Removed by Filter: Wasted work!

Scan Types

SQL
1-- Sequential Scan: Reads entire table (slowest)
2Seq Scan on orders
3
4-- Index Scan: Uses index to find rows (fast)
5Index Scan using idx_customer_id on orders
6
7-- Index Only Scan: All data from index (fastest)
8Index Only Scan using idx_customer_date on orders
9
10-- Bitmap Scan: For multiple conditions
11Bitmap Heap Scan on orders
12 -> Bitmap Index Scan on idx_customer_id

Indexing Strategies

SQL
1-- Single column index
2CREATE INDEX idx_customer_id ON orders(customer_id);
3
4-- Composite index (column order matters!)
5CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
6
7-- Partial index (PostgreSQL)
8CREATE INDEX idx_active_orders ON orders(order_date)
9WHERE status = 'active';
10
11-- Covering index
12CREATE INDEX idx_covering ON orders(customer_id)
13INCLUDE (order_date, amount);

Composite Index — Leftmost Prefix Rule

SQL
1CREATE INDEX idx_a_b_c ON table(a, b, c);
2
3-- This index supports:
4WHERE a = ? -- ✅ Uses index
5WHERE a = ? AND b = ? -- ✅ Uses index
6WHERE a = ? AND b = ? AND c = ? -- ✅ Uses index
7WHERE b = ? AND c = ? -- ❌ Cannot use index (missing 'a')
8WHERE a = ? AND c = ? -- ⚠️ Partial use (only 'a')
When to Create Index
Nên tạo Index ✅Không nên ❌
Primary keysSmall tables (dưới 1000 rows)
Foreign keysLow cardinality columns
WHERE/JOIN columnsFrequently updated columns
ORDER BY columnsEvery column
High-cardinality columns

Checkpoint

EXPLAIN ANALYZE cho biết execution plan thực tế. Index Scan >> Seq Scan cho large tables. Composite index tuân theo leftmost prefix rule. Covering index cho Index Only Scan. Bạn có biết tại sao column order trong composite index lại quan trọng không?

7

⚡ Query Optimization — Kỹ thuật tối ưu

TB5 min

Sargable Queries

Sargable Queries

SARG = Search ARGument. Sargable queries can use indexes. Non-sargable queries force full table scans.

SQL
1-- ❌ Non-sargable (can't use index)
2SELECT * FROM orders WHERE YEAR(order_date) = 2024;
3SELECT * FROM users WHERE UPPER(email) = 'TEST@EMAIL.COM';
4SELECT * FROM products WHERE price + 10 > 100;
5
6-- ✅ Sargable (can use index)
7SELECT * FROM orders
8WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
9SELECT * FROM users WHERE email = 'test@email.com';
10SELECT * FROM products WHERE price > 90;

EXISTS vs IN

SQL
1-- For large subqueries, EXISTS often faster
2-- ❌ IN with large result set
3SELECT * FROM orders
4WHERE customer_id IN (SELECT id FROM customers WHERE region = 'North');
5
6-- ✅ EXISTS (stops at first match)
7SELECT * FROM orders o
8WHERE EXISTS (
9 SELECT 1 FROM customers c
10 WHERE c.id = o.customer_id AND c.region = 'North'
11);

JOIN Optimization

SQL
1-- Filter early (before join)
2SELECT o.*, c.name
3FROM orders o
4JOIN customers c ON o.customer_id = c.id
5WHERE o.order_date >= '2024-01-01' -- Applied before join
6AND c.status = 'active';
7
8-- Avoid unnecessary joins
9-- ❌ If you don't need customer data
10SELECT o.* FROM orders o JOIN customers c ON o.customer_id = c.id;
11-- ✅ Skip the join
12SELECT * FROM orders;

Efficient Pagination

SQL
1-- ❌ OFFSET becomes slow for large offsets
2SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;
3
4-- ✅ Keyset pagination (use last seen value)
5SELECT * FROM orders
6WHERE id > 12345 -- Last ID from previous page
7ORDER BY id LIMIT 20;

Query Rewriting

SQL
1-- Subquery → JOIN
2-- ❌ Correlated subquery (runs for each row)
3SELECT o.*,
4 (SELECT name FROM customers WHERE id = o.customer_id) as customer_name
5FROM orders o;
6
7-- ✅ JOIN (single operation)
8SELECT o.*, c.name as customer_name
9FROM orders o
10LEFT JOIN customers c ON o.customer_id = c.id;
11
12-- UNION ALL vs UNION
13-- UNION removes duplicates (requires sort) — use UNION ALL when possible

Partitioning

SQL
1-- Range partitioning (PostgreSQL)
2CREATE TABLE orders (
3 order_id SERIAL,
4 order_date DATE,
5 amount DECIMAL
6) PARTITION BY RANGE (order_date);
7
8CREATE TABLE orders_2023 PARTITION OF orders
9 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
10CREATE TABLE orders_2024 PARTITION OF orders
11 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
12
13-- Queries automatically use only relevant partition
14SELECT * FROM orders WHERE order_date = '2024-06-15';
15-- Only scans orders_2024

Common Anti-Patterns

SQL
1-- N+1 Query Problem
2-- ❌ Loop with individual queries
3-- FOR each customer: SELECT * FROM orders WHERE customer_id = ?;
4-- ✅ Single query with JOIN
5SELECT c.*, o.* FROM customers c LEFT JOIN orders o ON c.id = o.customer_id;
6
7-- Implicit Type Conversion
8-- ❌ String to number (can't use index)
9SELECT * FROM orders WHERE order_id = '12345'; -- order_id is INT
10-- ✅ Correct type
11SELECT * FROM orders WHERE order_id = 12345;

Optimization Checklist

TechniqueImpactWhen to Use
IndexingHighFrequent WHERE/JOIN columns
EXPLAINEssentialAlways before optimization
Sargable queriesHighAll queries with functions
**Avoid SELECT ***MediumAlways
JOIN optimizationHighMulti-table queries
PartitioningHighVery large tables (>100M rows)

Checkpoint

Sargable queries tận dụng được index. EXISTS thường nhanh hơn IN cho large subqueries. Keyset pagination hiệu quả hơn OFFSET. Tránh N+1 problem, implicit type conversion. Optimization checklist: EXPLAIN → Check scan types → Verify index usage → Make sargable → Optimize JOINs. Bạn sẽ bắt đầu từ bước nào?

8

📋 Tổng kết

TB5 min

Kiến thức đã học

Chủ đềNội dung chính
Ranking FunctionsROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK
Value FunctionsLAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE
Aggregate WindowsRunning totals, Moving averages, Frame clauses, Named windows
ApplicationsYoY analysis, Session detection, Gaps & Islands, Percentiles
Basic CTEsMulti-CTE organization, CTE vs subquery, Materialization
Recursive CTEsHierarchy traversal, Tree/BOM, Date series, Graph paths
EXPLAINExecution plans, Scan types, Join types, Cost analysis
IndexingSingle/Composite/Partial/Covering indexes, Leftmost prefix rule
OptimizationSargable queries, EXISTS vs IN, Pagination, Partitioning

Best Practices

  1. ✅ PARTITION BY = GROUP BY without collapsing rows
  2. ✅ Always specify frame clause for aggregate window functions
  3. ✅ Use Named Windows khi dùng cùng window nhiều lần
  4. ✅ Recursive CTEs cần base case + termination condition
  5. ✅ Run EXPLAIN ANALYZE trước khi optimize
  6. ✅ Make queries sargable — không wrap indexed columns trong functions

Câu hỏi tự kiểm tra

  1. ROW_NUMBER, RANK, DENSE_RANK khác nhau thế nào?
  2. CTE và subquery: khi nào dùng cái nào?
  3. Recursive CTE dùng để giải quyết bài toán gì?
  4. EXPLAIN ANALYZE cho biết thông tin gì?

Bài tiếp theo: SQL + Python Integration →

🎉 Tuyệt vời! Bạn đã master Advanced SQL!

Nhớ: Window Functions + CTEs + Indexes = SQL Pro. Luôn chạy EXPLAIN trước khi deploy query lên production!