Bài 5: CTEs và Window Functions
1. Common Table Expressions (CTEs)
1.1 CTE là gì?
CTE (Common Table Expression) là named temporary result set, tồn tại trong scope của một single statement.
SQL
1WITH CTE_Name AS (2 SELECT column1, column23 FROM table14 WHERE condition5)6SELECT *7FROM CTE_Name;1.2 CTE vs Subquery
SQL
1-- ❌ Subquery: khó đọc với nested queries2SELECT *3FROM (4 SELECT *5 FROM (6 SELECT * FROM Production.Product WHERE ListPrice > 07 ) AS Inner18 WHERE ListPrice > 1009) AS Inner210WHERE ListPrice > 500;1112-- ✅ CTE: readable, step-by-step13WITH Products_WithPrice AS (14 SELECT * FROM Production.Product WHERE ListPrice > 015),16Products_Above100 AS (17 SELECT * FROM Products_WithPrice WHERE ListPrice > 10018)19SELECT * FROM Products_Above100 WHERE ListPrice > 500;1.3 Basic CTE Examples
Single CTE:
SQL
1WITH HighValueOrders AS (2 SELECT 3 SalesOrderID,4 CustomerID,5 OrderDate,6 TotalDue7 FROM Sales.SalesOrderHeader8 WHERE TotalDue > 100009)10SELECT 11 h.SalesOrderID,12 h.OrderDate,13 h.TotalDue,14 p.FirstName + ' ' + p.LastName AS CustomerName15FROM HighValueOrders h16JOIN Sales.Customer c ON h.CustomerID = c.CustomerID17JOIN Person.Person p ON c.PersonID = p.BusinessEntityID18ORDER BY h.TotalDue DESC;Multiple CTEs:
SQL
1WITH 2-- CTE 1: Monthly sales3MonthlySales AS (4 SELECT 5 YEAR(OrderDate) AS Year,6 MONTH(OrderDate) AS Month,7 SUM(TotalDue) AS Revenue8 FROM Sales.SalesOrderHeader9 GROUP BY YEAR(OrderDate), MONTH(OrderDate)10),11-- CTE 2: Previous month comparison12SalesWithPrevious AS (13 SELECT 14 Year,15 Month,16 Revenue,17 LAG(Revenue) OVER (ORDER BY Year, Month) AS PrevMonthRevenue18 FROM MonthlySales19)20SELECT 21 Year,22 Month,23 Revenue,24 PrevMonthRevenue,25 CASE 26 WHEN PrevMonthRevenue IS NULL THEN NULL27 ELSE CAST((Revenue - PrevMonthRevenue) * 100.0 / PrevMonthRevenue AS DECIMAL(10,2))28 END AS GrowthPercent29FROM SalesWithPrevious30ORDER BY Year, Month;2. Window Functions Overview
2.1 Concept
Window functions perform calculations across a "window" of rows related to the current row, WITHOUT collapsing rows like GROUP BY.
SQL
1SELECT 2 column1,3 column2,4 SUM(column3) OVER (PARTITION BY column1 ORDER BY column2) AS RunningTotal5FROM table1;2.2 OVER Clause Anatomy
SQL
1function_name() OVER (2 [PARTITION BY columns] -- Nhóm rows (như GROUP BY nhưng không collapse)3 [ORDER BY columns] -- Thứ tự tính toán4 [ROWS/RANGE frame] -- Frame specification5)2.3 Window Function Categories
| Category | Functions | Purpose |
|---|---|---|
| Ranking | ROW_NUMBER, RANK, DENSE_RANK, NTILE | Đánh số, xếp hạng |
| Aggregate | SUM, AVG, COUNT, MIN, MAX | Running totals, moving averages |
| Analytic | LAG, LEAD, FIRST_VALUE, LAST_VALUE | Row comparison |
3. Ranking Functions
3.1 ROW_NUMBER()
Gán số thứ tự duy nhất cho mỗi row.
SQL
1-- Đánh số employees theo salary trong mỗi department2SELECT 3 e.BusinessEntityID,4 p.FirstName + ' ' + p.LastName AS EmployeeName,5 d.Name AS Department,6 eph.Rate AS HourlyRate,7 ROW_NUMBER() OVER (8 PARTITION BY d.DepartmentID 9 ORDER BY eph.Rate DESC10 ) AS SalaryRank11FROM HumanResources.Employee e12JOIN Person.Person p ON e.BusinessEntityID = p.BusinessEntityID13JOIN HumanResources.EmployeeDepartmentHistory edh ON e.BusinessEntityID = edh.BusinessEntityID14JOIN HumanResources.Department d ON edh.DepartmentID = d.DepartmentID15JOIN HumanResources.EmployeePayHistory eph ON e.BusinessEntityID = eph.BusinessEntityID16WHERE edh.EndDate IS NULL17ORDER BY d.Name, SalaryRank;3.2 RANK() vs DENSE_RANK()
SQL
1-- So sánh các ranking functions2SELECT 3 Name,4 ListPrice,5 ROW_NUMBER() OVER (ORDER BY ListPrice DESC) AS RowNum,6 RANK() OVER (ORDER BY ListPrice DESC) AS Rank,7 DENSE_RANK() OVER (ORDER BY ListPrice DESC) AS DenseRank8FROM Production.Product9WHERE ListPrice > 010ORDER BY ListPrice DESC;| ListPrice | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 3578.27 | 1 | 1 | 1 |
| 3578.27 | 2 | 1 | 1 |
| 3399.99 | 3 | 3 | 2 |
| 3374.99 | 4 | 4 | 3 |
Key differences:
- ROW_NUMBER: Luôn unique (1, 2, 3, 4...)
- RANK: Skip numbers sau ties (1, 1, 3, 4...)
- DENSE_RANK: Không skip (1, 1, 2, 3...)
3.3 NTILE()
Chia rows thành N groups bằng nhau.
SQL
1-- Chia customers thành 4 quartiles theo total spending2WITH CustomerSpending AS (3 SELECT 4 c.CustomerID,5 SUM(soh.TotalDue) AS TotalSpent6 FROM Sales.SalesOrderHeader soh7 JOIN Sales.Customer c ON soh.CustomerID = c.CustomerID8 GROUP BY c.CustomerID9)10SELECT 11 CustomerID,12 TotalSpent,13 NTILE(4) OVER (ORDER BY TotalSpent DESC) AS SpendingQuartile14FROM CustomerSpending15ORDER BY SpendingQuartile, TotalSpent DESC;4. Aggregate Window Functions
4.1 Running Total
SQL
1-- Running total of sales by date2SELECT 3 OrderDate,4 TotalDue,5 SUM(TotalDue) OVER (ORDER BY OrderDate) AS RunningTotal6FROM Sales.SalesOrderHeader7WHERE YEAR(OrderDate) = 20148ORDER BY OrderDate;4.2 Running Total với PARTITION
SQL
1-- Running total per territory2SELECT 3 st.Name AS Territory,4 soh.OrderDate,5 soh.TotalDue,6 SUM(soh.TotalDue) OVER (7 PARTITION BY st.TerritoryID 8 ORDER BY soh.OrderDate9 ) AS TerritoryRunningTotal10FROM Sales.SalesOrderHeader soh11JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID12WHERE YEAR(soh.OrderDate) = 201413ORDER BY Territory, soh.OrderDate;4.3 Moving Average
SQL
1-- 3-month moving average2WITH MonthlySales AS (3 SELECT 4 YEAR(OrderDate) AS Year,5 MONTH(OrderDate) AS Month,6 SUM(TotalDue) AS MonthlyRevenue7 FROM Sales.SalesOrderHeader8 GROUP BY YEAR(OrderDate), MONTH(OrderDate)9)10SELECT 11 Year,12 Month,13 MonthlyRevenue,14 AVG(MonthlyRevenue) OVER (15 ORDER BY Year, Month 16 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW17 ) AS MovingAvg3Month18FROM MonthlySales19ORDER BY Year, Month;4.4 Percentage of Total
SQL
1-- Category sales as % of total2WITH CategorySales AS (3 SELECT 4 pc.Name AS Category,5 SUM(sod.LineTotal) AS CategoryRevenue6 FROM Sales.SalesOrderDetail sod7 JOIN Production.Product p ON sod.ProductID = p.ProductID8 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID9 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID10 GROUP BY pc.ProductCategoryID, pc.Name11)12SELECT 13 Category,14 CategoryRevenue,15 SUM(CategoryRevenue) OVER () AS TotalRevenue,16 CAST(CategoryRevenue * 100.0 / SUM(CategoryRevenue) OVER () AS DECIMAL(5,2)) AS RevenuePercent17FROM CategorySales18ORDER BY CategoryRevenue DESC;5. Analytic Functions (LAG, LEAD)
5.1 LAG() - Previous Row
SQL
1-- So sánh với tháng trước2WITH MonthlySales AS (3 SELECT 4 YEAR(OrderDate) AS Year,5 MONTH(OrderDate) AS Month,6 SUM(TotalDue) AS Revenue7 FROM Sales.SalesOrderHeader8 GROUP BY YEAR(OrderDate), MONTH(OrderDate)9)10SELECT 11 Year,12 Month,13 Revenue,14 LAG(Revenue, 1) OVER (ORDER BY Year, Month) AS PrevMonth,15 Revenue - LAG(Revenue, 1) OVER (ORDER BY Year, Month) AS MoM_Change16FROM MonthlySales17ORDER BY Year, Month;5.2 LEAD() - Next Row
SQL
1-- Ngày order tiếp theo của mỗi customer2SELECT 3 CustomerID,4 SalesOrderID,5 OrderDate,6 LEAD(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS NextOrderDate,7 DATEDIFF(DAY, 8 OrderDate, 9 LEAD(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate)10 ) AS DaysBetweenOrders11FROM Sales.SalesOrderHeader12WHERE CustomerID IN (11000, 11001, 11002)13ORDER BY CustomerID, OrderDate;5.3 Year-over-Year Comparison
SQL
1WITH YearlySales AS (2 SELECT 3 YEAR(OrderDate) AS Year,4 SUM(TotalDue) AS Revenue5 FROM Sales.SalesOrderHeader6 GROUP BY YEAR(OrderDate)7)8SELECT 9 Year,10 Revenue,11 LAG(Revenue, 1) OVER (ORDER BY Year) AS PrevYearRevenue,12 CASE 13 WHEN LAG(Revenue, 1) OVER (ORDER BY Year) IS NULL THEN NULL14 ELSE CAST((Revenue - LAG(Revenue, 1) OVER (ORDER BY Year)) * 100.0 / 15 LAG(Revenue, 1) OVER (ORDER BY Year) AS DECIMAL(10,2))16 END AS YoY_GrowthPercent17FROM YearlySales18ORDER BY Year;5.4 FIRST_VALUE / LAST_VALUE
SQL
1-- First and last order value for each customer2SELECT DISTINCT3 CustomerID,4 FIRST_VALUE(TotalDue) OVER (5 PARTITION BY CustomerID 6 ORDER BY OrderDate7 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING8 ) AS FirstOrderValue,9 LAST_VALUE(TotalDue) OVER (10 PARTITION BY CustomerID 11 ORDER BY OrderDate12 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING13 ) AS LastOrderValue14FROM Sales.SalesOrderHeader15WHERE CustomerID IN (11000, 11001, 11002);6. Frame Specification
6.1 Syntax
SQL
1ROWS|RANGE BETWEEN start AND end23-- start/end options:4-- UNBOUNDED PRECEDING - từ đầu partition5-- n PRECEDING - n rows trước6-- CURRENT ROW - row hiện tại7-- n FOLLOWING - n rows sau8-- UNBOUNDED FOLLOWING - đến cuối partition6.2 Examples
SQL
1-- Different frame specifications2SELECT 3 OrderDate,4 TotalDue,5 6 -- Running total (default)7 SUM(TotalDue) OVER (ORDER BY OrderDate) AS RunningTotal,8 9 -- 3-row moving average10 AVG(TotalDue) OVER (11 ORDER BY OrderDate 12 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING13 ) AS MovingAvg3Row,14 15 -- Total from start to end (full partition)16 SUM(TotalDue) OVER (17 ORDER BY OrderDate 18 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING19 ) AS GrandTotal2021FROM Sales.SalesOrderHeader22WHERE YEAR(OrderDate) = 2014 AND MONTH(OrderDate) = 123ORDER BY OrderDate;7. Practical Use Cases
7.1 Top N Per Group
SQL
1-- Top 3 best-selling products per category2WITH ProductSales AS (3 SELECT 4 pc.Name AS Category,5 p.Name AS ProductName,6 SUM(sod.OrderQty) AS TotalQuantity,7 ROW_NUMBER() OVER (8 PARTITION BY pc.ProductCategoryID 9 ORDER BY SUM(sod.OrderQty) DESC10 ) AS Rank11 FROM Sales.SalesOrderDetail sod12 JOIN Production.Product p ON sod.ProductID = p.ProductID13 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID14 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID15 GROUP BY pc.ProductCategoryID, pc.Name, p.ProductID, p.Name16)17SELECT Category, ProductName, TotalQuantity18FROM ProductSales19WHERE Rank <= 320ORDER BY Category, Rank;7.2 Gaps and Islands
SQL
1-- Find gaps in order dates (days without orders)2WITH OrderDates AS (3 SELECT DISTINCT CAST(OrderDate AS DATE) AS OrderDate4 FROM Sales.SalesOrderHeader5),6OrderDatesWithNext AS (7 SELECT 8 OrderDate,9 LEAD(OrderDate) OVER (ORDER BY OrderDate) AS NextOrderDate,10 DATEDIFF(DAY, OrderDate, LEAD(OrderDate) OVER (ORDER BY OrderDate)) AS DaysToNext11 FROM OrderDates12)13SELECT *14FROM OrderDatesWithNext15WHERE DaysToNext > 116ORDER BY OrderDate;7.3 Running vs Total Comparison
SQL
1-- Sales progress toward annual target2WITH DailySales AS (3 SELECT 4 CAST(OrderDate AS DATE) AS OrderDate,5 SUM(TotalDue) AS DailyRevenue6 FROM Sales.SalesOrderHeader7 WHERE YEAR(OrderDate) = 20138 GROUP BY CAST(OrderDate AS DATE)9)10SELECT 11 OrderDate,12 DailyRevenue,13 SUM(DailyRevenue) OVER (ORDER BY OrderDate) AS YTD_Revenue,14 SUM(DailyRevenue) OVER () AS AnnualTotal,15 CAST(SUM(DailyRevenue) OVER (ORDER BY OrderDate) * 100.0 / 16 SUM(DailyRevenue) OVER () AS DECIMAL(5,2)) AS YTD_Percent17FROM DailySales18ORDER BY OrderDate;8. Thực hành
Exercises
Exercise 1: ROW_NUMBER
SQL
1-- Lấy order gần nhất của mỗi customer2-- Hiển thị: CustomerID, OrderDate, TotalDue34-- YOUR CODE HERE💡 Xem đáp án
SQL
1WITH RankedOrders AS (2 SELECT 3 CustomerID,4 OrderDate,5 TotalDue,6 ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS RowNum7 FROM Sales.SalesOrderHeader8)9SELECT CustomerID, OrderDate, TotalDue10FROM RankedOrders11WHERE RowNum = 112ORDER BY CustomerID;Exercise 2: Running Total
SQL
1-- Tính running total doanh thu theo tháng năm 20132-- Hiển thị: Month, MonthlyRevenue, YTD_Revenue34-- YOUR CODE HERE💡 Xem đáp án
SQL
1WITH MonthlySales AS (2 SELECT 3 MONTH(OrderDate) AS Month,4 SUM(TotalDue) AS MonthlyRevenue5 FROM Sales.SalesOrderHeader6 WHERE YEAR(OrderDate) = 20137 GROUP BY MONTH(OrderDate)8)9SELECT 10 Month,11 MonthlyRevenue,12 SUM(MonthlyRevenue) OVER (ORDER BY Month) AS YTD_Revenue13FROM MonthlySales14ORDER BY Month;Exercise 3: LAG
SQL
1-- So sánh doanh thu mỗi territory với tháng trước2-- Hiển thị: Territory, Month, Revenue, PrevMonthRevenue, Growth%34-- YOUR CODE HERE💡 Xem đáp án
SQL
1WITH TerritoryMonthlySales AS (2 SELECT 3 st.Name AS Territory,4 YEAR(soh.OrderDate) AS Year,5 MONTH(soh.OrderDate) AS Month,6 SUM(soh.TotalDue) AS Revenue7 FROM Sales.SalesOrderHeader soh8 JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID9 GROUP BY st.TerritoryID, st.Name, YEAR(soh.OrderDate), MONTH(soh.OrderDate)10)11SELECT 12 Territory,13 Year,14 Month,15 Revenue,16 LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month) AS PrevMonthRevenue,17 CASE 18 WHEN LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month) IS NULL THEN NULL19 ELSE CAST((Revenue - LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month)) * 100.0 /20 LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month) AS DECIMAL(10,2))21 END AS GrowthPercent22FROM TerritoryMonthlySales23ORDER BY Territory, Year, Month;Exercise 4: NTILE
SQL
1-- Chia products thành 5 price tiers2-- Hiển thị: ProductName, ListPrice, PriceTier (1=cheapest, 5=most expensive)34-- YOUR CODE HERE💡 Xem đáp án
SQL
1SELECT 2 Name AS ProductName,3 ListPrice,4 NTILE(5) OVER (ORDER BY ListPrice) AS PriceTier5FROM Production.Product6WHERE ListPrice > 07ORDER BY PriceTier, ListPrice;9. Tổng kết
CTE Benefits
- ✅ Improves readability
- ✅ Allows recursive queries
- ✅ Can be referenced multiple times in same query
- ✅ Easier to debug step-by-step
Window Functions Summary
| Function | Purpose | Example |
|---|---|---|
| ROW_NUMBER() | Unique row number | Top N per group |
| RANK() | Rank with gaps | Competition ranking |
| DENSE_RANK() | Rank no gaps | Dense ranking |
| NTILE(n) | Divide into n groups | Quartiles |
| SUM() OVER | Running/cumulative sum | YTD totals |
| AVG() OVER | Moving average | Trend analysis |
| LAG() | Previous row | MoM comparison |
| LEAD() | Next row | Future prediction |
Bài tiếp theo: Data Modification (INSERT, UPDATE, DELETE)
