Lý thuyết
Bài 5/10

CTEs và Window Functions

Common Table Expressions, ROW_NUMBER, RANK, LAG/LEAD và OVER clause

Bài 5: CTEs và Window Functions

Window Analytics Functions

1. Common Table Expressions (CTEs)

1.1 CTE là gì?

CTE (Common Table Expression) là named temporary result set, tồn tại trong scope của một single statement.

SQL
1WITH CTE_Name AS (
2 SELECT column1, column2
3 FROM table1
4 WHERE condition
5)
6SELECT *
7FROM CTE_Name;

1.2 CTE vs Subquery

SQL
1-- ❌ Subquery: khó đọc với nested queries
2SELECT *
3FROM (
4 SELECT *
5 FROM (
6 SELECT * FROM Production.Product WHERE ListPrice > 0
7 ) AS Inner1
8 WHERE ListPrice > 100
9) AS Inner2
10WHERE ListPrice > 500;
11
12-- ✅ CTE: readable, step-by-step
13WITH Products_WithPrice AS (
14 SELECT * FROM Production.Product WHERE ListPrice > 0
15),
16Products_Above100 AS (
17 SELECT * FROM Products_WithPrice WHERE ListPrice > 100
18)
19SELECT * FROM Products_Above100 WHERE ListPrice > 500;

1.3 Basic CTE Examples

Single CTE:

SQL
1WITH HighValueOrders AS (
2 SELECT
3 SalesOrderID,
4 CustomerID,
5 OrderDate,
6 TotalDue
7 FROM Sales.SalesOrderHeader
8 WHERE TotalDue > 10000
9)
10SELECT
11 h.SalesOrderID,
12 h.OrderDate,
13 h.TotalDue,
14 p.FirstName + ' ' + p.LastName AS CustomerName
15FROM HighValueOrders h
16JOIN Sales.Customer c ON h.CustomerID = c.CustomerID
17JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
18ORDER BY h.TotalDue DESC;

Multiple CTEs:

SQL
1WITH
2-- CTE 1: Monthly sales
3MonthlySales AS (
4 SELECT
5 YEAR(OrderDate) AS Year,
6 MONTH(OrderDate) AS Month,
7 SUM(TotalDue) AS Revenue
8 FROM Sales.SalesOrderHeader
9 GROUP BY YEAR(OrderDate), MONTH(OrderDate)
10),
11-- CTE 2: Previous month comparison
12SalesWithPrevious AS (
13 SELECT
14 Year,
15 Month,
16 Revenue,
17 LAG(Revenue) OVER (ORDER BY Year, Month) AS PrevMonthRevenue
18 FROM MonthlySales
19)
20SELECT
21 Year,
22 Month,
23 Revenue,
24 PrevMonthRevenue,
25 CASE
26 WHEN PrevMonthRevenue IS NULL THEN NULL
27 ELSE CAST((Revenue - PrevMonthRevenue) * 100.0 / PrevMonthRevenue AS DECIMAL(10,2))
28 END AS GrowthPercent
29FROM SalesWithPrevious
30ORDER BY Year, Month;

2. Window Functions Overview

2.1 Concept

Window functions perform calculations across a "window" of rows related to the current row, WITHOUT collapsing rows like GROUP BY.

SQL
1SELECT
2 column1,
3 column2,
4 SUM(column3) OVER (PARTITION BY column1 ORDER BY column2) AS RunningTotal
5FROM table1;

2.2 OVER Clause Anatomy

SQL
1function_name() OVER (
2 [PARTITION BY columns] -- Nhóm rows (như GROUP BY nhưng không collapse)
3 [ORDER BY columns] -- Thứ tự tính toán
4 [ROWS/RANGE frame] -- Frame specification
5)

2.3 Window Function Categories

CategoryFunctionsPurpose
RankingROW_NUMBER, RANK, DENSE_RANK, NTILEĐánh số, xếp hạng
AggregateSUM, AVG, COUNT, MIN, MAXRunning totals, moving averages
AnalyticLAG, LEAD, FIRST_VALUE, LAST_VALUERow comparison

3. Ranking Functions

3.1 ROW_NUMBER()

Gán số thứ tự duy nhất cho mỗi row.

SQL
1-- Đánh số employees theo salary trong mỗi department
2SELECT
3 e.BusinessEntityID,
4 p.FirstName + ' ' + p.LastName AS EmployeeName,
5 d.Name AS Department,
6 eph.Rate AS HourlyRate,
7 ROW_NUMBER() OVER (
8 PARTITION BY d.DepartmentID
9 ORDER BY eph.Rate DESC
10 ) AS SalaryRank
11FROM HumanResources.Employee e
12JOIN Person.Person p ON e.BusinessEntityID = p.BusinessEntityID
13JOIN HumanResources.EmployeeDepartmentHistory edh ON e.BusinessEntityID = edh.BusinessEntityID
14JOIN HumanResources.Department d ON edh.DepartmentID = d.DepartmentID
15JOIN HumanResources.EmployeePayHistory eph ON e.BusinessEntityID = eph.BusinessEntityID
16WHERE edh.EndDate IS NULL
17ORDER BY d.Name, SalaryRank;

3.2 RANK() vs DENSE_RANK()

SQL
1-- So sánh các ranking functions
2SELECT
3 Name,
4 ListPrice,
5 ROW_NUMBER() OVER (ORDER BY ListPrice DESC) AS RowNum,
6 RANK() OVER (ORDER BY ListPrice DESC) AS Rank,
7 DENSE_RANK() OVER (ORDER BY ListPrice DESC) AS DenseRank
8FROM Production.Product
9WHERE ListPrice > 0
10ORDER BY ListPrice DESC;
ListPriceROW_NUMBERRANKDENSE_RANK
3578.27111
3578.27211
3399.99332
3374.99443

Key differences:

  • ROW_NUMBER: Luôn unique (1, 2, 3, 4...)
  • RANK: Skip numbers sau ties (1, 1, 3, 4...)
  • DENSE_RANK: Không skip (1, 1, 2, 3...)

3.3 NTILE()

Chia rows thành N groups bằng nhau.

SQL
1-- Chia customers thành 4 quartiles theo total spending
2WITH CustomerSpending AS (
3 SELECT
4 c.CustomerID,
5 SUM(soh.TotalDue) AS TotalSpent
6 FROM Sales.SalesOrderHeader soh
7 JOIN Sales.Customer c ON soh.CustomerID = c.CustomerID
8 GROUP BY c.CustomerID
9)
10SELECT
11 CustomerID,
12 TotalSpent,
13 NTILE(4) OVER (ORDER BY TotalSpent DESC) AS SpendingQuartile
14FROM CustomerSpending
15ORDER BY SpendingQuartile, TotalSpent DESC;

4. Aggregate Window Functions

4.1 Running Total

SQL
1-- Running total of sales by date
2SELECT
3 OrderDate,
4 TotalDue,
5 SUM(TotalDue) OVER (ORDER BY OrderDate) AS RunningTotal
6FROM Sales.SalesOrderHeader
7WHERE YEAR(OrderDate) = 2014
8ORDER BY OrderDate;

4.2 Running Total với PARTITION

SQL
1-- Running total per territory
2SELECT
3 st.Name AS Territory,
4 soh.OrderDate,
5 soh.TotalDue,
6 SUM(soh.TotalDue) OVER (
7 PARTITION BY st.TerritoryID
8 ORDER BY soh.OrderDate
9 ) AS TerritoryRunningTotal
10FROM Sales.SalesOrderHeader soh
11JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
12WHERE YEAR(soh.OrderDate) = 2014
13ORDER BY Territory, soh.OrderDate;

4.3 Moving Average

SQL
1-- 3-month moving average
2WITH MonthlySales AS (
3 SELECT
4 YEAR(OrderDate) AS Year,
5 MONTH(OrderDate) AS Month,
6 SUM(TotalDue) AS MonthlyRevenue
7 FROM Sales.SalesOrderHeader
8 GROUP BY YEAR(OrderDate), MONTH(OrderDate)
9)
10SELECT
11 Year,
12 Month,
13 MonthlyRevenue,
14 AVG(MonthlyRevenue) OVER (
15 ORDER BY Year, Month
16 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
17 ) AS MovingAvg3Month
18FROM MonthlySales
19ORDER BY Year, Month;

4.4 Percentage of Total

SQL
1-- Category sales as % of total
2WITH CategorySales AS (
3 SELECT
4 pc.Name AS Category,
5 SUM(sod.LineTotal) AS CategoryRevenue
6 FROM Sales.SalesOrderDetail sod
7 JOIN Production.Product p ON sod.ProductID = p.ProductID
8 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
9 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID
10 GROUP BY pc.ProductCategoryID, pc.Name
11)
12SELECT
13 Category,
14 CategoryRevenue,
15 SUM(CategoryRevenue) OVER () AS TotalRevenue,
16 CAST(CategoryRevenue * 100.0 / SUM(CategoryRevenue) OVER () AS DECIMAL(5,2)) AS RevenuePercent
17FROM CategorySales
18ORDER BY CategoryRevenue DESC;

5. Analytic Functions (LAG, LEAD)

5.1 LAG() - Previous Row

SQL
1-- So sánh với tháng trước
2WITH MonthlySales AS (
3 SELECT
4 YEAR(OrderDate) AS Year,
5 MONTH(OrderDate) AS Month,
6 SUM(TotalDue) AS Revenue
7 FROM Sales.SalesOrderHeader
8 GROUP BY YEAR(OrderDate), MONTH(OrderDate)
9)
10SELECT
11 Year,
12 Month,
13 Revenue,
14 LAG(Revenue, 1) OVER (ORDER BY Year, Month) AS PrevMonth,
15 Revenue - LAG(Revenue, 1) OVER (ORDER BY Year, Month) AS MoM_Change
16FROM MonthlySales
17ORDER BY Year, Month;

5.2 LEAD() - Next Row

SQL
1-- Ngày order tiếp theo của mỗi customer
2SELECT
3 CustomerID,
4 SalesOrderID,
5 OrderDate,
6 LEAD(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS NextOrderDate,
7 DATEDIFF(DAY,
8 OrderDate,
9 LEAD(OrderDate, 1) OVER (PARTITION BY CustomerID ORDER BY OrderDate)
10 ) AS DaysBetweenOrders
11FROM Sales.SalesOrderHeader
12WHERE CustomerID IN (11000, 11001, 11002)
13ORDER BY CustomerID, OrderDate;

5.3 Year-over-Year Comparison

SQL
1WITH YearlySales AS (
2 SELECT
3 YEAR(OrderDate) AS Year,
4 SUM(TotalDue) AS Revenue
5 FROM Sales.SalesOrderHeader
6 GROUP BY YEAR(OrderDate)
7)
8SELECT
9 Year,
10 Revenue,
11 LAG(Revenue, 1) OVER (ORDER BY Year) AS PrevYearRevenue,
12 CASE
13 WHEN LAG(Revenue, 1) OVER (ORDER BY Year) IS NULL THEN NULL
14 ELSE CAST((Revenue - LAG(Revenue, 1) OVER (ORDER BY Year)) * 100.0 /
15 LAG(Revenue, 1) OVER (ORDER BY Year) AS DECIMAL(10,2))
16 END AS YoY_GrowthPercent
17FROM YearlySales
18ORDER BY Year;

5.4 FIRST_VALUE / LAST_VALUE

SQL
1-- First and last order value for each customer
2SELECT DISTINCT
3 CustomerID,
4 FIRST_VALUE(TotalDue) OVER (
5 PARTITION BY CustomerID
6 ORDER BY OrderDate
7 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
8 ) AS FirstOrderValue,
9 LAST_VALUE(TotalDue) OVER (
10 PARTITION BY CustomerID
11 ORDER BY OrderDate
12 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
13 ) AS LastOrderValue
14FROM Sales.SalesOrderHeader
15WHERE CustomerID IN (11000, 11001, 11002);

6. Frame Specification

6.1 Syntax

SQL
1ROWS|RANGE BETWEEN start AND end
2
3-- start/end options:
4-- UNBOUNDED PRECEDING - từ đầu partition
5-- n PRECEDING - n rows trước
6-- CURRENT ROW - row hiện tại
7-- n FOLLOWING - n rows sau
8-- UNBOUNDED FOLLOWING - đến cuối partition

6.2 Examples

SQL
1-- Different frame specifications
2SELECT
3 OrderDate,
4 TotalDue,
5
6 -- Running total (default)
7 SUM(TotalDue) OVER (ORDER BY OrderDate) AS RunningTotal,
8
9 -- 3-row moving average
10 AVG(TotalDue) OVER (
11 ORDER BY OrderDate
12 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
13 ) AS MovingAvg3Row,
14
15 -- Total from start to end (full partition)
16 SUM(TotalDue) OVER (
17 ORDER BY OrderDate
18 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
19 ) AS GrandTotal
20
21FROM Sales.SalesOrderHeader
22WHERE YEAR(OrderDate) = 2014 AND MONTH(OrderDate) = 1
23ORDER BY OrderDate;

7. Practical Use Cases

7.1 Top N Per Group

SQL
1-- Top 3 best-selling products per category
2WITH ProductSales AS (
3 SELECT
4 pc.Name AS Category,
5 p.Name AS ProductName,
6 SUM(sod.OrderQty) AS TotalQuantity,
7 ROW_NUMBER() OVER (
8 PARTITION BY pc.ProductCategoryID
9 ORDER BY SUM(sod.OrderQty) DESC
10 ) AS Rank
11 FROM Sales.SalesOrderDetail sod
12 JOIN Production.Product p ON sod.ProductID = p.ProductID
13 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
14 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID
15 GROUP BY pc.ProductCategoryID, pc.Name, p.ProductID, p.Name
16)
17SELECT Category, ProductName, TotalQuantity
18FROM ProductSales
19WHERE Rank <= 3
20ORDER BY Category, Rank;

7.2 Gaps and Islands

SQL
1-- Find gaps in order dates (days without orders)
2WITH OrderDates AS (
3 SELECT DISTINCT CAST(OrderDate AS DATE) AS OrderDate
4 FROM Sales.SalesOrderHeader
5),
6OrderDatesWithNext AS (
7 SELECT
8 OrderDate,
9 LEAD(OrderDate) OVER (ORDER BY OrderDate) AS NextOrderDate,
10 DATEDIFF(DAY, OrderDate, LEAD(OrderDate) OVER (ORDER BY OrderDate)) AS DaysToNext
11 FROM OrderDates
12)
13SELECT *
14FROM OrderDatesWithNext
15WHERE DaysToNext > 1
16ORDER BY OrderDate;

7.3 Running vs Total Comparison

SQL
1-- Sales progress toward annual target
2WITH DailySales AS (
3 SELECT
4 CAST(OrderDate AS DATE) AS OrderDate,
5 SUM(TotalDue) AS DailyRevenue
6 FROM Sales.SalesOrderHeader
7 WHERE YEAR(OrderDate) = 2013
8 GROUP BY CAST(OrderDate AS DATE)
9)
10SELECT
11 OrderDate,
12 DailyRevenue,
13 SUM(DailyRevenue) OVER (ORDER BY OrderDate) AS YTD_Revenue,
14 SUM(DailyRevenue) OVER () AS AnnualTotal,
15 CAST(SUM(DailyRevenue) OVER (ORDER BY OrderDate) * 100.0 /
16 SUM(DailyRevenue) OVER () AS DECIMAL(5,2)) AS YTD_Percent
17FROM DailySales
18ORDER BY OrderDate;

8. Thực hành

Exercises

Exercise 1: ROW_NUMBER

SQL
1-- Lấy order gần nhất của mỗi customer
2-- Hiển thị: CustomerID, OrderDate, TotalDue
3
4-- YOUR CODE HERE
💡 Xem đáp án
SQL
1WITH RankedOrders AS (
2 SELECT
3 CustomerID,
4 OrderDate,
5 TotalDue,
6 ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS RowNum
7 FROM Sales.SalesOrderHeader
8)
9SELECT CustomerID, OrderDate, TotalDue
10FROM RankedOrders
11WHERE RowNum = 1
12ORDER BY CustomerID;

Exercise 2: Running Total

SQL
1-- Tính running total doanh thu theo tháng năm 2013
2-- Hiển thị: Month, MonthlyRevenue, YTD_Revenue
3
4-- YOUR CODE HERE
💡 Xem đáp án
SQL
1WITH MonthlySales AS (
2 SELECT
3 MONTH(OrderDate) AS Month,
4 SUM(TotalDue) AS MonthlyRevenue
5 FROM Sales.SalesOrderHeader
6 WHERE YEAR(OrderDate) = 2013
7 GROUP BY MONTH(OrderDate)
8)
9SELECT
10 Month,
11 MonthlyRevenue,
12 SUM(MonthlyRevenue) OVER (ORDER BY Month) AS YTD_Revenue
13FROM MonthlySales
14ORDER BY Month;

Exercise 3: LAG

SQL
1-- So sánh doanh thu mỗi territory với tháng trước
2-- Hiển thị: Territory, Month, Revenue, PrevMonthRevenue, Growth%
3
4-- YOUR CODE HERE
💡 Xem đáp án
SQL
1WITH TerritoryMonthlySales AS (
2 SELECT
3 st.Name AS Territory,
4 YEAR(soh.OrderDate) AS Year,
5 MONTH(soh.OrderDate) AS Month,
6 SUM(soh.TotalDue) AS Revenue
7 FROM Sales.SalesOrderHeader soh
8 JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
9 GROUP BY st.TerritoryID, st.Name, YEAR(soh.OrderDate), MONTH(soh.OrderDate)
10)
11SELECT
12 Territory,
13 Year,
14 Month,
15 Revenue,
16 LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month) AS PrevMonthRevenue,
17 CASE
18 WHEN LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month) IS NULL THEN NULL
19 ELSE CAST((Revenue - LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month)) * 100.0 /
20 LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year, Month) AS DECIMAL(10,2))
21 END AS GrowthPercent
22FROM TerritoryMonthlySales
23ORDER BY Territory, Year, Month;

Exercise 4: NTILE

SQL
1-- Chia products thành 5 price tiers
2-- Hiển thị: ProductName, ListPrice, PriceTier (1=cheapest, 5=most expensive)
3
4-- YOUR CODE HERE
💡 Xem đáp án
SQL
1SELECT
2 Name AS ProductName,
3 ListPrice,
4 NTILE(5) OVER (ORDER BY ListPrice) AS PriceTier
5FROM Production.Product
6WHERE ListPrice > 0
7ORDER BY PriceTier, ListPrice;

9. Tổng kết

CTE Benefits

  • ✅ Improves readability
  • ✅ Allows recursive queries
  • ✅ Can be referenced multiple times in same query
  • ✅ Easier to debug step-by-step

Window Functions Summary

FunctionPurposeExample
ROW_NUMBER()Unique row numberTop N per group
RANK()Rank with gapsCompetition ranking
DENSE_RANK()Rank no gapsDense ranking
NTILE(n)Divide into n groupsQuartiles
SUM() OVERRunning/cumulative sumYTD totals
AVG() OVERMoving averageTrend analysis
LAG()Previous rowMoM comparison
LEAD()Next rowFuture prediction

Bài tiếp theo: Data Modification (INSERT, UPDATE, DELETE)