🎯 Mục tiêu bài học
- Sử dụng thành thạo Window Functions (ranking, value, aggregate)
- Viết Recursive CTEs cho hierarchical data và graph traversal
- Đọc hiểu EXPLAIN plans và tối ưu query performance
- Áp dụng indexing strategies và query rewriting techniques
| Thông tin | Chi tiết |
|---|---|
| ⏱️ Thời lượng | 3 giờ |
| 📖 Chủ đề chính | Window Functions, CTEs, Query Optimization |
| 💡 Kiến thức cần có | SQL cơ bản (SELECT, JOIN, GROUP BY) |
| 🎯 Output | Viết được advanced SQL queries cho business analytics |
📖 Thuật ngữ quan trọng
| Thuật ngữ | Tiếng Việt | Mô tả |
|---|---|---|
| Window Function | Hàm cửa sổ | Tính toán trên tập rows liên quan đến current row mà không collapse kết quả |
| PARTITION BY | Phân nhóm | Chia data thành partitions cho window function (tương tự GROUP BY) |
| Frame Clause | Khung dữ liệu | Xác định rows nào được include trong window calculation |
| ROW_NUMBER | Số thứ tự hàng | Gán unique sequential number cho mỗi row trong partition |
| RANK / DENSE_RANK | Xếp hạng | Xếp hạng rows — RANK có gap, DENSE_RANK không gap |
| LAG / LEAD | Giá trị trước/sau | Truy cập value từ row trước (LAG) hoặc sau (LEAD) |
| CTE | Biểu thức bảng chung | Common Table Expression — temporary named result set |
| Recursive CTE | CTE đệ quy | CTE tự tham chiếu cho hierarchical/sequential data |
| EXPLAIN | Giải thích query | Hiển thị execution plan của database engine |
| Sargable | Tìm kiếm với index | Query có thể tận dụng index (Search ARGument able) |
Checkpoint
Window Functions cho phép tính toán across rows mà giữ nguyên all rows (khác GROUP BY collapse rows). CTEs giúp organize complex queries, recursive CTEs xử lý hierarchical data. EXPLAIN giúp đọc execution plans để optimize. Bạn đã từng gặp trường hợp cần tính rank hoặc running total trong SQL chưa? Window Functions sẽ giải quyết những bài toán đó!
🏆 Ranking Functions
Window Functions thực hiện calculations across rows liên quan đến current row, mà không collapse kết quả thành 1 row như GROUP BY. Đây là một trong những features mạnh nhất của SQL.
Window Function Syntax
Cú pháp: function_name(column) OVER (PARTITION BY ... ORDER BY ... frame_clause)
| Thành phần | Mô tả |
|---|---|
| PARTITION BY | Define groups (like GROUP BY) |
| ORDER BY | Define row order within partition |
| frame_clause | Define which rows to include |
ROW_NUMBER()
1-- Unique sequential number for each row2SELECT 3 employee_id, name, department, salary,4 ROW_NUMBER() OVER (ORDER BY salary DESC) as salary_rank5FROM employees;67-- Row number within each department8SELECT 9 employee_id, name, department, salary,10 ROW_NUMBER() OVER (11 PARTITION BY department 12 ORDER BY salary DESC13 ) as dept_rank14FROM employees;1516-- Use Case: Get top N per group17WITH ranked AS (18 SELECT 19 *,20 ROW_NUMBER() OVER (21 PARTITION BY department 22 ORDER BY salary DESC23 ) as rn24 FROM employees25)26SELECT * FROM ranked WHERE rn <= 3; -- Top 3 per departmentRANK() và DENSE_RANK()
1-- RANK: Ties get same rank, skip numbers2-- DENSE_RANK: Ties get same rank, no gaps3SELECT 4 name, department, salary,5 ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,6 RANK() OVER (ORDER BY salary DESC) as rank,7 DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank8FROM employees;910-- Example output:11-- | name | salary | row_num | rank | dense_rank |12-- |---------|--------|---------|------|------------|13-- | Alice | 100000 | 1 | 1 | 1 |14-- | Bob | 100000 | 2 | 1 | 1 | <- tie15-- | Charlie | 90000 | 3 | 3 | 2 | <- rank skips to 316-- | Diana | 80000 | 4 | 4 | 3 |NTILE() và PERCENT_RANK()
1-- Divide rows into N buckets2SELECT 3 name, salary,4 NTILE(4) OVER (ORDER BY salary DESC) as salary_quartile5FROM employees;67-- PERCENT_RANK: Relative rank (0-1)8-- CUME_DIST: Cumulative distribution9SELECT 10 name, salary,11 RANK() OVER (ORDER BY salary) as rank,12 PERCENT_RANK() OVER (ORDER BY salary) as pct_rank,13 CUME_DIST() OVER (ORDER BY salary) as cume_dist14FROM employees;Checkpoint
ROW_NUMBER gán số unique, RANK/DENSE_RANK xử lý ties khác nhau, NTILE chia buckets, PERCENT_RANK cho relative position. ROW_NUMBER vs RANK vs DENSE_RANK — bạn thấy sự khác biệt khi xử lý tied values không? Khi nào dùng cái nào?
📊 Value & Aggregate Window Functions
LAG() và LEAD()
1-- LAG: Access previous row | LEAD: Access next row2SELECT 3 date, sales,4 LAG(sales, 1) OVER (ORDER BY date) as prev_day_sales,5 LEAD(sales, 1) OVER (ORDER BY date) as next_day_sales,6 sales - LAG(sales, 1) OVER (ORDER BY date) as daily_change7FROM daily_sales;89-- Multiple periods10SELECT 11 date, sales,12 LAG(sales, 1) OVER (ORDER BY date) as prev_1,13 LAG(sales, 7) OVER (ORDER BY date) as prev_7, -- Week ago14 LAG(sales, 30) OVER (ORDER BY date) as prev_30 -- Month ago15FROM daily_sales;1617-- With default value for NULL18SELECT 19 date, sales,20 LAG(sales, 1, 0) OVER (ORDER BY date) as prev_sales21FROM daily_sales;FIRST_VALUE() và LAST_VALUE()
1SELECT 2 employee_id, department, salary,3 FIRST_VALUE(salary) OVER (4 PARTITION BY department 5 ORDER BY salary DESC6 ) as highest_in_dept,7 LAST_VALUE(salary) OVER (8 PARTITION BY department 9 ORDER BY salary DESC10 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING11 ) as lowest_in_dept12FROM employees;13-- Note: LAST_VALUE requires frame specification to work correctly!Running Totals & Moving Averages
1-- Cumulative sum2SELECT 3 date, amount,4 SUM(amount) OVER (ORDER BY date) as running_total5FROM transactions;67-- Year-to-date (reset each year)8SELECT 9 date, amount,10 SUM(amount) OVER (11 PARTITION BY YEAR(date) 12 ORDER BY date13 ) as ytd_total14FROM transactions;1516-- 7-day moving average17SELECT 18 date, sales,19 AVG(sales) OVER (20 ORDER BY date 21 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW22 ) as ma_723FROM daily_sales;2425-- 30-day moving average26SELECT 27 date, sales,28 AVG(sales) OVER (29 ORDER BY date 30 ROWS BETWEEN 29 PRECEDING AND CURRENT ROW31 ) as ma_3032FROM daily_sales;Frame Clauses
Frame clause xác định rows nào được include trong calculation. Default là RANGE UNBOUNDED PRECEDING khi có ORDER BY, cần explicit frame cho control tốt hơn.
1-- ROWS: Physical row count2-- RANGE: Logical value range34-- Frame boundary options:5-- UNBOUNDED PRECEDING → From partition start6-- n PRECEDING → n rows before current7-- CURRENT ROW → Current row8-- n FOLLOWING → n rows after current9-- UNBOUNDED FOLLOWING → To partition end1011-- Named Windows: Define once, use multiple times12SELECT 13 date, sales,14 SUM(sales) OVER sales_window as running_total,15 AVG(sales) OVER sales_window as running_avg,16 COUNT(*) OVER sales_window as row_count17FROM daily_sales18WINDOW sales_window AS (19 ORDER BY date 20 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW21);Checkpoint
LAG/LEAD truy cập rows trước/sau, FIRST_VALUE/LAST_VALUE lấy biên giá trị. Running totals dùng SUM OVER, moving averages dùng AVG với ROWS frame. Bạn có thể nghĩ ra business case nào dùng LAG để so sánh month-over-month không?
💼 Window Functions — Ứng dụng thực tế
Year-over-Year Analysis
1WITH monthly_sales AS (2 SELECT 3 DATE_TRUNC('month', date) as month,4 SUM(amount) as sales5 FROM transactions6 GROUP BY DATE_TRUNC('month', date)7)8SELECT 9 month, sales,10 LAG(sales, 12) OVER (ORDER BY month) as prev_year_sales,11 ROUND(12 (sales - LAG(sales, 12) OVER (ORDER BY month)) 13 / LAG(sales, 12) OVER (ORDER BY month) * 100, 214 ) as yoy_growth_pct15FROM monthly_sales16ORDER BY month;Session Analysis
1-- Identify user sessions (30 min gap = new session)2WITH activity_with_gaps AS (3 SELECT 4 user_id, timestamp,5 CASE 6 WHEN timestamp - LAG(timestamp) OVER (7 PARTITION BY user_id ORDER BY timestamp8 ) > INTERVAL '30 minutes'9 THEN 1 ELSE 0 10 END as new_session11 FROM user_activity12),13sessions AS (14 SELECT *,15 SUM(new_session) OVER (16 PARTITION BY user_id ORDER BY timestamp17 ) + 1 as session_id18 FROM activity_with_gaps19)20SELECT 21 user_id, session_id,22 MIN(timestamp) as session_start,23 MAX(timestamp) as session_end,24 COUNT(*) as events_in_session25FROM sessions26GROUP BY user_id, session_id;Gaps and Islands
1-- Identify consecutive sequences2WITH numbered AS (3 SELECT 4 date, status,5 ROW_NUMBER() OVER (ORDER BY date) -6 ROW_NUMBER() OVER (PARTITION BY status ORDER BY date) as grp7 FROM events8)9SELECT 10 status,11 MIN(date) as start_date,12 MAX(date) as end_date,13 COUNT(*) as consecutive_days14FROM numbered15GROUP BY status, grp16ORDER BY start_date;Percentile Analysis
1SELECT 2 department, name, salary,3 PERCENT_RANK() OVER (ORDER BY salary) as overall_percentile,4 PERCENT_RANK() OVER (5 PARTITION BY department ORDER BY salary6 ) as dept_percentile,7 CASE 8 WHEN PERCENT_RANK() OVER (ORDER BY salary) >= 0.9 THEN 'Top 10%'9 WHEN PERCENT_RANK() OVER (ORDER BY salary) >= 0.75 THEN 'Top 25%'10 WHEN PERCENT_RANK() OVER (ORDER BY salary) >= 0.5 THEN 'Top 50%'11 ELSE 'Bottom 50%'12 END as percentile_group13FROM employees;Window functions có thể expensive cho large datasets. Optimize với proper indexing trên PARTITION BY và ORDER BY columns.
1-- Create index for window function optimization2CREATE INDEX idx_sales_date ON sales(date);3CREATE INDEX idx_sales_category_date ON sales(category, date);4CREATE INDEX idx_transactions_user_date ON transactions(user_id, date);Checkpoint
Window Functions giải quyết nhiều bài toán analytics thực tế: YoY growth, session analysis, gaps & islands, percentile ranking. Gaps & Islands là pattern rất powerful — bạn có thể áp dụng nó để detect consecutive active days của users không?
🔄 CTEs & Recursive CTEs
CTEs (WITH clause) tạo temporary named result sets, giúp organize complex queries, improve readability, và enable recursive operations.
Basic CTE — Multiple CTEs
1WITH 2monthly_sales AS (3 SELECT 4 DATE_TRUNC('month', order_date) as month,5 SUM(amount) as sales6 FROM orders7 GROUP BY DATE_TRUNC('month', order_date)8),9monthly_targets AS (10 SELECT month, target_amount FROM targets11)12SELECT 13 m.month, m.sales, t.target_amount,14 m.sales - t.target_amount as variance15FROM monthly_sales m16JOIN monthly_targets t ON m.month = t.month;Recursive CTE — Hierarchy Traversal
Recursive CTEs reference themselves để iterate qua hierarchical hoặc sequential data. Cần base case và recursive case.
1-- Employee - Manager hierarchy2WITH RECURSIVE org_chart AS (3 -- Base case: Top-level managers4 SELECT 5 id, name, manager_id,6 1 as level,7 name as path8 FROM employees9 WHERE manager_id IS NULL10 11 UNION ALL12 13 -- Recursive case14 SELECT 15 e.id, e.name, e.manager_id,16 oc.level + 1,17 oc.path || ' > ' || e.name18 FROM employees e19 JOIN org_chart oc ON e.manager_id = oc.id20)21SELECT * FROM org_chart ORDER BY path;Recursive CTE — Tree & BOM
1-- Categories with subcategories2WITH RECURSIVE category_tree AS (3 SELECT 4 id, name, parent_id, 0 as depth,5 ARRAY[id] as path_ids,6 name as full_path7 FROM categories8 WHERE parent_id IS NULL9 10 UNION ALL11 12 SELECT 13 c.id, c.name, c.parent_id,14 ct.depth + 1,15 ct.path_ids || c.id,16 ct.full_path || ' > ' || c.name17 FROM categories c18 JOIN category_tree ct ON c.parent_id = ct.id19)20SELECT 21 id,22 REPEAT(' ', depth) || name as indented_name,23 full_path24FROM category_tree25ORDER BY path_ids;Number/Date Series Generation
1-- Generate date range2WITH RECURSIVE dates AS (3 SELECT DATE '2024-01-01' as date4 UNION ALL5 SELECT date + INTERVAL '1 day'6 FROM dates7 WHERE date < DATE '2024-12-31'8)9SELECT date FROM dates;1011-- Fill missing dates in time series12WITH RECURSIVE all_dates AS (13 SELECT MIN(date) as date FROM sales14 UNION ALL15 SELECT date + INTERVAL '1 day'16 FROM all_dates17 WHERE date < (SELECT MAX(date) FROM sales)18),19daily_sales AS (20 SELECT date, SUM(amount) as sales FROM sales GROUP BY date21)22SELECT 23 ad.date,24 COALESCE(ds.sales, 0) as sales25FROM all_dates ad26LEFT JOIN daily_sales ds ON ad.date = ds.date27ORDER BY ad.date;Graph Traversal
1-- Find all paths from A to B2WITH RECURSIVE paths AS (3 SELECT 4 from_node, to_node, weight,5 ARRAY[from_node, to_node] as path,6 weight as total_weight7 FROM edges8 WHERE from_node = 'A'9 10 UNION ALL11 12 SELECT 13 p.from_node, e.to_node, e.weight,14 p.path || e.to_node,15 p.total_weight + e.weight16 FROM paths p17 JOIN edges e ON p.to_node = e.from_node18 WHERE NOT e.to_node = ANY(p.path) -- Avoid cycles19)20SELECT * FROM paths21WHERE to_node = 'Z'22ORDER BY total_weight;Complex CTE Patterns
1-- CLV Analysis with multiple CTEs2WITH 3first_purchase AS (4 SELECT customer_id, MIN(order_date) as first_order_date5 FROM orders GROUP BY customer_id6),7customer_metrics AS (8 SELECT 9 o.customer_id,10 fp.first_order_date,11 COUNT(DISTINCT o.order_id) as total_orders,12 SUM(o.amount) as total_revenue,13 COUNT(DISTINCT DATE_TRUNC('month', o.order_date)) as active_months14 FROM orders o15 JOIN first_purchase fp ON o.customer_id = fp.customer_id16 GROUP BY o.customer_id, fp.first_order_date17),18cohort_analysis AS (19 SELECT 20 customer_id,21 DATE_TRUNC('month', first_order_date) as cohort_month,22 total_orders, total_revenue,23 total_revenue / GREATEST(active_months, 1) as monthly_value24 FROM customer_metrics25)26SELECT 27 cohort_month,28 COUNT(*) as customers,29 ROUND(AVG(total_orders), 1) as avg_orders,30 ROUND(AVG(total_revenue), 2) as avg_revenue,31 ROUND(AVG(monthly_value), 2) as avg_monthly_value32FROM cohort_analysis33GROUP BY cohort_month34ORDER BY cohort_month;Checkpoint
Basic CTEs organize complex queries. Recursive CTEs xử lý hierarchical data (org charts, trees, BOM), generate series, và traverse graphs. Luôn cần base case + termination condition. Recursive CTEs rất hữu ích cho org charts — bạn có thể nghĩ ra cách dùng nó cho product categories trên e-commerce không?
🔍 Query Optimization — EXPLAIN & Indexing
Query chậm = User experience kém, server costs cao, bottlenecks trong pipelines. Query optimization là kỹ năng essential cho mọi Data professional.
Query Execution Flow
Query Execution Pipeline
EXPLAIN Command
1-- PostgreSQL2EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;34/*5Seq Scan on orders (cost=0.00..1520.00 rows=50 width=120)6 (actual time=0.015..12.450 rows=48 loops=1)7 Filter: (customer_id = 100)8 Rows Removed by Filter: 499529Planning Time: 0.085 ms10Execution Time: 12.480 ms11*/1213-- Key metrics:14-- cost: Estimated cost (startup..total)15-- rows: Estimated rows returned16-- actual time: Real execution time (ms)17-- Rows Removed by Filter: Wasted work!Scan Types
1-- Sequential Scan: Reads entire table (slowest)2Seq Scan on orders34-- Index Scan: Uses index to find rows (fast)5Index Scan using idx_customer_id on orders67-- Index Only Scan: All data from index (fastest)8Index Only Scan using idx_customer_date on orders910-- Bitmap Scan: For multiple conditions11Bitmap Heap Scan on orders12 -> Bitmap Index Scan on idx_customer_idIndexing Strategies
1-- Single column index2CREATE INDEX idx_customer_id ON orders(customer_id);34-- Composite index (column order matters!)5CREATE INDEX idx_customer_date ON orders(customer_id, order_date);67-- Partial index (PostgreSQL)8CREATE INDEX idx_active_orders ON orders(order_date)9WHERE status = 'active';1011-- Covering index12CREATE INDEX idx_covering ON orders(customer_id)13INCLUDE (order_date, amount);Composite Index — Leftmost Prefix Rule
1CREATE INDEX idx_a_b_c ON table(a, b, c);23-- This index supports:4WHERE a = ? -- ✅ Uses index5WHERE a = ? AND b = ? -- ✅ Uses index 6WHERE a = ? AND b = ? AND c = ? -- ✅ Uses index7WHERE b = ? AND c = ? -- ❌ Cannot use index (missing 'a')8WHERE a = ? AND c = ? -- ⚠️ Partial use (only 'a')| Nên tạo Index ✅ | Không nên ❌ |
|---|---|
| Primary keys | Small tables (dưới 1000 rows) |
| Foreign keys | Low cardinality columns |
| WHERE/JOIN columns | Frequently updated columns |
| ORDER BY columns | Every column |
| High-cardinality columns |
Checkpoint
EXPLAIN ANALYZE cho biết execution plan thực tế. Index Scan >> Seq Scan cho large tables. Composite index tuân theo leftmost prefix rule. Covering index cho Index Only Scan. Bạn có biết tại sao column order trong composite index lại quan trọng không?
⚡ Query Optimization — Kỹ thuật tối ưu
Sargable Queries
SARG = Search ARGument. Sargable queries can use indexes. Non-sargable queries force full table scans.
1-- ❌ Non-sargable (can't use index)2SELECT * FROM orders WHERE YEAR(order_date) = 2024;3SELECT * FROM users WHERE UPPER(email) = 'TEST@EMAIL.COM';4SELECT * FROM products WHERE price + 10 > 100;56-- ✅ Sargable (can use index)7SELECT * FROM orders 8WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';9SELECT * FROM users WHERE email = 'test@email.com';10SELECT * FROM products WHERE price > 90;EXISTS vs IN
1-- For large subqueries, EXISTS often faster2-- ❌ IN with large result set3SELECT * FROM orders 4WHERE customer_id IN (SELECT id FROM customers WHERE region = 'North');56-- ✅ EXISTS (stops at first match)7SELECT * FROM orders o8WHERE EXISTS (9 SELECT 1 FROM customers c 10 WHERE c.id = o.customer_id AND c.region = 'North'11);JOIN Optimization
1-- Filter early (before join)2SELECT o.*, c.name3FROM orders o4JOIN customers c ON o.customer_id = c.id5WHERE o.order_date >= '2024-01-01' -- Applied before join6AND c.status = 'active';78-- Avoid unnecessary joins9-- ❌ If you don't need customer data10SELECT o.* FROM orders o JOIN customers c ON o.customer_id = c.id;11-- ✅ Skip the join12SELECT * FROM orders;Efficient Pagination
1-- ❌ OFFSET becomes slow for large offsets2SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;34-- ✅ Keyset pagination (use last seen value)5SELECT * FROM orders 6WHERE id > 12345 -- Last ID from previous page7ORDER BY id LIMIT 20;Query Rewriting
1-- Subquery → JOIN2-- ❌ Correlated subquery (runs for each row)3SELECT o.*,4 (SELECT name FROM customers WHERE id = o.customer_id) as customer_name5FROM orders o;67-- ✅ JOIN (single operation)8SELECT o.*, c.name as customer_name9FROM orders o10LEFT JOIN customers c ON o.customer_id = c.id;1112-- UNION ALL vs UNION13-- UNION removes duplicates (requires sort) — use UNION ALL when possiblePartitioning
1-- Range partitioning (PostgreSQL)2CREATE TABLE orders (3 order_id SERIAL,4 order_date DATE,5 amount DECIMAL6) PARTITION BY RANGE (order_date);78CREATE TABLE orders_2023 PARTITION OF orders9 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');10CREATE TABLE orders_2024 PARTITION OF orders11 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');1213-- Queries automatically use only relevant partition14SELECT * FROM orders WHERE order_date = '2024-06-15';15-- Only scans orders_2024Common Anti-Patterns
1-- N+1 Query Problem2-- ❌ Loop with individual queries3-- FOR each customer: SELECT * FROM orders WHERE customer_id = ?;4-- ✅ Single query with JOIN5SELECT c.*, o.* FROM customers c LEFT JOIN orders o ON c.id = o.customer_id;67-- Implicit Type Conversion8-- ❌ String to number (can't use index)9SELECT * FROM orders WHERE order_id = '12345'; -- order_id is INT10-- ✅ Correct type11SELECT * FROM orders WHERE order_id = 12345;Optimization Checklist
| Technique | Impact | When to Use |
|---|---|---|
| Indexing | High | Frequent WHERE/JOIN columns |
| EXPLAIN | Essential | Always before optimization |
| Sargable queries | High | All queries with functions |
| **Avoid SELECT *** | Medium | Always |
| JOIN optimization | High | Multi-table queries |
| Partitioning | High | Very large tables (>100M rows) |
Checkpoint
Sargable queries tận dụng được index. EXISTS thường nhanh hơn IN cho large subqueries. Keyset pagination hiệu quả hơn OFFSET. Tránh N+1 problem, implicit type conversion. Optimization checklist: EXPLAIN → Check scan types → Verify index usage → Make sargable → Optimize JOINs. Bạn sẽ bắt đầu từ bước nào?
📋 Tổng kết
Kiến thức đã học
| Chủ đề | Nội dung chính |
|---|---|
| Ranking Functions | ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK |
| Value Functions | LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE |
| Aggregate Windows | Running totals, Moving averages, Frame clauses, Named windows |
| Applications | YoY analysis, Session detection, Gaps & Islands, Percentiles |
| Basic CTEs | Multi-CTE organization, CTE vs subquery, Materialization |
| Recursive CTEs | Hierarchy traversal, Tree/BOM, Date series, Graph paths |
| EXPLAIN | Execution plans, Scan types, Join types, Cost analysis |
| Indexing | Single/Composite/Partial/Covering indexes, Leftmost prefix rule |
| Optimization | Sargable queries, EXISTS vs IN, Pagination, Partitioning |
Best Practices
- ✅ PARTITION BY = GROUP BY without collapsing rows
- ✅ Always specify frame clause for aggregate window functions
- ✅ Use Named Windows khi dùng cùng window nhiều lần
- ✅ Recursive CTEs cần base case + termination condition
- ✅ Run EXPLAIN ANALYZE trước khi optimize
- ✅ Make queries sargable — không wrap indexed columns trong functions
Câu hỏi tự kiểm tra
- ROW_NUMBER, RANK, DENSE_RANK khác nhau thế nào?
- CTE và subquery: khi nào dùng cái nào?
- Recursive CTE dùng để giải quyết bài toán gì?
- EXPLAIN ANALYZE cho biết thông tin gì?
Bài tiếp theo: SQL + Python Integration →
🎉 Tuyệt vời! Bạn đã master Advanced SQL!
Nhớ: Window Functions + CTEs + Indexes = SQL Pro. Luôn chạy EXPLAIN trước khi deploy query lên production!
