SQL Nâng cao
1. Window Functions
1.1 Giới thiệu Window Functions
Window Functions cho phép tính toán trên một "cửa sổ" (window) của rows liên quan đến row hiện tại.
SQL
1-- Syntax cơ bản2SELECT 3 column1,4 AGG_FUNCTION(column2) OVER (5 PARTITION BY column36 ORDER BY column47 ROWS/RANGE BETWEEN ... AND ...8 ) as window_result9FROM table;1.2 Các loại Window Functions
| Loại | Functions | Mục đích |
|---|---|---|
| Aggregate | SUM, AVG, COUNT, MIN, MAX | Tính toán trên window |
| Ranking | ROW_NUMBER, RANK, DENSE_RANK | Xếp hạng |
| Value | LAG, LEAD, FIRST_VALUE, LAST_VALUE | Truy cập rows khác |
1.3 Ví dụ ROW_NUMBER vs RANK
SQL
1-- Dataset: Sales2CREATE TABLE sales (3 employee VARCHAR(50),4 region VARCHAR(50),5 revenue DECIMAL(10,2)6);78INSERT INTO sales VALUES9('Alice', 'North', 5000),10('Bob', 'North', 7000),11('Charlie', 'South', 7000),12('David', 'South', 6000);1314-- ROW_NUMBER: Số thứ tự liên tục15SELECT 16 employee,17 region,18 revenue,19 ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) as row_num20FROM sales;2122/*23employee | region | revenue | row_num24---------|--------|---------|--------25Bob | North | 7000 | 126Alice | North | 5000 | 227Charlie | South | 7000 | 128David | South | 6000 | 229*/3031-- RANK: Giống nhau thì cùng rank, nhảy rank32SELECT 33 employee,34 revenue,35 RANK() OVER (ORDER BY revenue DESC) as rank36FROM sales;3738/*39employee | revenue | rank40---------|---------|-----41Bob | 7000 | 142Charlie | 7000 | 1 -- Cùng rank43David | 6000 | 3 -- Nhảy qua rank 244Alice | 5000 | 445*/4647-- DENSE_RANK: Không nhảy rank48SELECT 49 employee,50 revenue,51 DENSE_RANK() OVER (ORDER BY revenue DESC) as dense_rank52FROM sales;5354/*55employee | revenue | dense_rank56---------|---------|------------57Bob | 7000 | 158Charlie | 7000 | 159David | 6000 | 2 -- Không nhảy60Alice | 5000 | 361*/1.4 LAG và LEAD
SQL
1-- LAG: Lấy giá trị row trước đó2-- LEAD: Lấy giá trị row sau đó34CREATE TABLE stock_prices (5 date DATE,6 price DECIMAL(10,2)7);89SELECT 10 date,11 price,12 LAG(price, 1) OVER (ORDER BY date) as prev_price,13 LEAD(price, 1) OVER (ORDER BY date) as next_price,14 price - LAG(price, 1) OVER (ORDER BY date) as daily_change15FROM stock_prices;1617/*18date | price | prev_price | next_price | daily_change19-----------|-------|------------|------------|-------------202024-01-01 | 100 | NULL | 102 | NULL212024-01-02 | 102 | 100 | 98 | 2222024-01-03 | 98 | 102 | 105 | -4232024-01-04 | 105 | 98 | NULL | 724*/2. Common Table Expressions (CTEs)
2.1 Cú pháp WITH
SQL
1-- CTE đơn giản2WITH high_earners AS (3 SELECT 4 employee_id,5 name,6 salary7 FROM employees8 WHERE salary > 1000009)10SELECT * FROM high_earners11WHERE department = 'Engineering';2.2 Multiple CTEs
SQL
1WITH 2-- CTE 1: Aggregate sales3monthly_sales AS (4 SELECT 5 DATE_TRUNC('month', order_date) as month,6 SUM(amount) as total_sales7 FROM orders8 GROUP BY DATE_TRUNC('month', order_date)9),10-- CTE 2: Calculate growth11sales_growth AS (12 SELECT 13 month,14 total_sales,15 LAG(total_sales) OVER (ORDER BY month) as prev_month_sales,16 (total_sales - LAG(total_sales) OVER (ORDER BY month)) / 17 LAG(total_sales) OVER (ORDER BY month) * 100 as growth_pct18 FROM monthly_sales19)20SELECT * FROM sales_growth21WHERE growth_pct > 10;2.3 Recursive CTEs
SQL
1-- Ví dụ: Employee hierarchy2WITH RECURSIVE emp_hierarchy AS (3 -- Base case: Top manager4 SELECT 5 employee_id,6 name,7 manager_id,8 1 as level9 FROM employees10 WHERE manager_id IS NULL11 12 UNION ALL13 14 -- Recursive case15 SELECT 16 e.employee_id,17 e.name,18 e.manager_id,19 h.level + 120 FROM employees e21 INNER JOIN emp_hierarchy h ON e.manager_id = h.employee_id22)23SELECT * FROM emp_hierarchy24ORDER BY level, name;3. Subqueries Nâng cao
3.1 Correlated Subquery
SQL
1-- Tìm employees có salary > average của department họ2SELECT 3 e.name,4 e.department,5 e.salary6FROM employees e7WHERE e.salary > (8 SELECT AVG(salary)9 FROM employees e210 WHERE e2.department = e.department11);3.2 EXISTS vs IN
SQL
1-- EXISTS: Kiểm tra tồn tại (nhanh hơn với large datasets)2SELECT name3FROM customers c4WHERE EXISTS (5 SELECT 1 6 FROM orders o 7 WHERE o.customer_id = c.id 8 AND o.order_date > '2024-01-01'9);1011-- IN: Kiểm tra trong danh sách12SELECT name13FROM customers14WHERE id IN (15 SELECT DISTINCT customer_id 16 FROM orders 17 WHERE order_date > '2024-01-01'18);4. Query Optimization
4.1 EXPLAIN ANALYZE
SQL
1-- Xem execution plan2EXPLAIN ANALYZE3SELECT *4FROM orders o5JOIN customers c ON o.customer_id = c.id6WHERE o.order_date > '2024-01-01';4.2 Indexing
SQL
1-- Tạo index2CREATE INDEX idx_orders_date ON orders(order_date);3CREATE INDEX idx_orders_customer ON orders(customer_id);45-- Composite index6CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);4.3 Best Practices
SQL
1-- ❌ Bad: SELECT *2SELECT * FROM large_table;34-- ✅ Good: Chỉ lấy columns cần thiết5SELECT id, name, created_at FROM large_table;67-- ❌ Bad: Function trong WHERE8SELECT * FROM orders WHERE YEAR(order_date) = 2024;910-- ✅ Good: Sử dụng range11SELECT * FROM orders 12WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';1314-- ❌ Bad: OR với nhiều conditions15SELECT * FROM products 16WHERE category = 'A' OR category = 'B' OR category = 'C';1718-- ✅ Good: IN19SELECT * FROM products 20WHERE category IN ('A', 'B', 'C');Bài tập
Bài 1: Window Functions
Tính running total sales theo tháng cho mỗi product category.
Bài 2: CTEs
Tìm top 5 customers có tổng spending cao nhất mỗi quý.
Bài 3: Recursive CTE
Xây dựng org chart từ employee table.
Bài 4: Optimization
Optimize query chậm bằng indexing và rewrite.
Next: Bài 3 - Pandas Nâng cao
