Bài 9: Advanced Analytics
1. PIVOT
1.1 PIVOT Concept
Chuyển rows thành columns (cross-tabulation).
Text
1Before PIVOT: After PIVOT:2Year Quarter Sales Year Q1 Q2 Q3 Q432013 1 100 2013 100 200 150 30042013 2 200 2014 120 220 170 28052013 3 15062013 4 3007...1.2 PIVOT Syntax
SQL
1SELECT *2FROM (3 -- Source query4 SELECT Year, Quarter, Sales5 FROM SalesTable6) AS SourceTable7PIVOT (8 SUM(Sales) -- Aggregate function9 FOR Quarter IN ([1], [2], [3], [4]) -- Values to become columns10) AS PivotTable;1.3 PIVOT Examples
Sales by Category per Year:
SQL
1SELECT *2FROM (3 SELECT 4 pc.Name AS Category,5 YEAR(soh.OrderDate) AS OrderYear,6 sod.LineTotal7 FROM Sales.SalesOrderDetail sod8 JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID9 JOIN Production.Product p ON sod.ProductID = p.ProductID10 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID11 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID12) AS SourceData13PIVOT (14 SUM(LineTotal)15 FOR OrderYear IN ([2011], [2012], [2013], [2014])16) AS PivotTable17ORDER BY Category;Monthly Sales per Territory:
SQL
1SELECT *2FROM (3 SELECT 4 st.Name AS Territory,5 MONTH(soh.OrderDate) AS OrderMonth,6 soh.TotalDue7 FROM Sales.SalesOrderHeader soh8 JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID9 WHERE YEAR(soh.OrderDate) = 201310) AS SourceData11PIVOT (12 SUM(TotalDue)13 FOR OrderMonth IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])14) AS PivotTable;1.4 Dynamic PIVOT
SQL
1DECLARE @Columns NVARCHAR(MAX);2DECLARE @SQL NVARCHAR(MAX);34-- Get unique years dynamically5SELECT @Columns = STRING_AGG(QUOTENAME(OrderYear), ', ')6FROM (7 SELECT DISTINCT YEAR(OrderDate) AS OrderYear8 FROM Sales.SalesOrderHeader9) AS Years;1011-- Build dynamic SQL12SET @SQL = N'13SELECT *14FROM (15 SELECT 16 pc.Name AS Category,17 YEAR(soh.OrderDate) AS OrderYear,18 sod.LineTotal19 FROM Sales.SalesOrderDetail sod20 JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID21 JOIN Production.Product p ON sod.ProductID = p.ProductID22 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID23 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID24) AS SourceData25PIVOT (26 SUM(LineTotal)27 FOR OrderYear IN (' + @Columns + ')28) AS PivotTable29ORDER BY Category;';3031EXEC sp_executesql @SQL;2. UNPIVOT
2.1 UNPIVOT Concept
Ngược lại PIVOT: chuyển columns thành rows.
Text
1Before UNPIVOT: After UNPIVOT:2Product Q1 Q2 Product Quarter Sales3Bikes 100 200 Bikes Q1 1004Parts 150 180 Bikes Q2 2005 Parts Q1 1506 Parts Q2 1802.2 UNPIVOT Syntax
SQL
1SELECT *2FROM (3 SELECT Product, Q1, Q2, Q3, Q44 FROM QuarterlySales5) AS SourceTable6UNPIVOT (7 Sales FOR Quarter IN (Q1, Q2, Q3, Q4)8) AS UnpivotTable;2.3 UNPIVOT Example
SQL
1-- First create a pivoted temp table2WITH PivotedData AS (3 SELECT *4 FROM (5 SELECT 6 pc.Name AS Category,7 YEAR(soh.OrderDate) AS OrderYear,8 sod.LineTotal9 FROM Sales.SalesOrderDetail sod10 JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID11 JOIN Production.Product p ON sod.ProductID = p.ProductID12 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID13 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID14 ) AS SourceData15 PIVOT (16 SUM(LineTotal)17 FOR OrderYear IN ([2011], [2012], [2013], [2014])18 ) AS PivotTable19)20-- Then UNPIVOT back21SELECT Category, Year, Revenue22FROM PivotedData23UNPIVOT (24 Revenue FOR Year IN ([2011], [2012], [2013], [2014])25) AS UnpivotTable26ORDER BY Category, Year;3. GROUPING SETS
3.1 GROUPING SETS Concept
Tạo multiple GROUP BY trong single query.
SQL
1-- Traditional: 3 separate queries2SELECT Category, NULL, SUM(Sales) FROM ... GROUP BY Category3UNION ALL4SELECT NULL, Year, SUM(Sales) FROM ... GROUP BY Year5UNION ALL6SELECT NULL, NULL, SUM(Sales) FROM ... -- Grand total78-- With GROUPING SETS: single query9SELECT Category, Year, SUM(Sales)10FROM ...11GROUP BY GROUPING SETS (12 (Category),13 (Year),14 () -- Grand total15);3.2 GROUPING SETS Examples
Category and Year combinations:
SQL
1SELECT 2 pc.Name AS Category,3 YEAR(soh.OrderDate) AS Year,4 SUM(sod.LineTotal) AS Revenue5FROM Sales.SalesOrderDetail sod6JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID7JOIN Production.Product p ON sod.ProductID = p.ProductID8JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID9JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID10GROUP BY GROUPING SETS (11 (pc.Name, YEAR(soh.OrderDate)), -- Category + Year12 (pc.Name), -- Category subtotal13 (YEAR(soh.OrderDate)), -- Year subtotal14 () -- Grand total15)16ORDER BY 17 GROUPING(pc.Name),18 GROUPING(YEAR(soh.OrderDate)),19 pc.Name,20 Year;3.3 ROLLUP
Hierarchical subtotals (right-to-left rollup).
SQL
1SELECT 2 pc.Name AS Category,3 ps.Name AS Subcategory,4 SUM(sod.LineTotal) AS Revenue5FROM Sales.SalesOrderDetail sod6JOIN Production.Product p ON sod.ProductID = p.ProductID7JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID8JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID9GROUP BY ROLLUP (pc.Name, ps.Name)10ORDER BY 11 GROUPING(pc.Name),12 pc.Name,13 GROUPING(ps.Name),14 ps.Name;Output structure:
Text
1Category Subcategory Revenue2Bikes Mountain Bikes 10000003Bikes Road Bikes 8000004Bikes NULL 1800000 ← Category subtotal5Clothing Caps 500006Clothing Jerseys 1000007Clothing NULL 150000 ← Category subtotal8NULL NULL 1950000 ← Grand total3.4 CUBE
All possible combinations.
SQL
1SELECT 2 st.Name AS Territory,3 YEAR(soh.OrderDate) AS Year,4 SUM(soh.TotalDue) AS Revenue5FROM Sales.SalesOrderHeader soh6JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID7GROUP BY CUBE (st.Name, YEAR(soh.OrderDate))8ORDER BY 9 GROUPING(st.Name),10 st.Name,11 GROUPING(YEAR(soh.OrderDate)),12 Year;3.5 GROUPING Function
Identify subtotal rows.
SQL
1SELECT 2 pc.Name AS Category,3 YEAR(soh.OrderDate) AS Year,4 SUM(sod.LineTotal) AS Revenue,5 GROUPING(pc.Name) AS Is_Category_Subtotal,6 GROUPING(YEAR(soh.OrderDate)) AS Is_Year_Subtotal,7 CASE 8 WHEN GROUPING(pc.Name) = 1 AND GROUPING(YEAR(soh.OrderDate)) = 1 THEN 'Grand Total'9 WHEN GROUPING(pc.Name) = 1 THEN 'Year Total'10 WHEN GROUPING(YEAR(soh.OrderDate)) = 1 THEN 'Category Total'11 ELSE 'Detail'12 END AS RowType13FROM Sales.SalesOrderDetail sod14JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID15JOIN Production.Product p ON sod.ProductID = p.ProductID16JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID17JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID18GROUP BY ROLLUP (pc.Name, YEAR(soh.OrderDate))19ORDER BY GROUPING(pc.Name), pc.Name, GROUPING(YEAR(soh.OrderDate)), Year;4. Advanced Aggregation Patterns
4.1 Running Calculations
SQL
1-- Running average với different windows2SELECT 3 OrderDate,4 TotalDue,5 -- Running average (all previous)6 AVG(TotalDue) OVER (ORDER BY OrderDate ROWS UNBOUNDED PRECEDING) AS RunningAvg,7 -- 7-day moving average8 AVG(TotalDue) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS MovingAvg7Day9FROM Sales.SalesOrderHeader10WHERE YEAR(OrderDate) = 2013 AND MONTH(OrderDate) = 111ORDER BY OrderDate;4.2 Percent of Total
SQL
1-- Multiple levels of percentage2SELECT 3 pc.Name AS Category,4 ps.Name AS Subcategory,5 SUM(sod.LineTotal) AS Revenue,6 -- % of category7 CAST(SUM(sod.LineTotal) * 100.0 / 8 SUM(SUM(sod.LineTotal)) OVER (PARTITION BY pc.Name) AS DECIMAL(5,2)) AS PctOfCategory,9 -- % of grand total10 CAST(SUM(sod.LineTotal) * 100.0 / 11 SUM(SUM(sod.LineTotal)) OVER () AS DECIMAL(5,2)) AS PctOfTotal12FROM Sales.SalesOrderDetail sod13JOIN Production.Product p ON sod.ProductID = p.ProductID14JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID15JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID16GROUP BY pc.ProductCategoryID, pc.Name, ps.ProductSubcategoryID, ps.Name17ORDER BY pc.Name, Revenue DESC;4.3 Year-over-Year Growth
SQL
1WITH YearlySales AS (2 SELECT 3 pc.Name AS Category,4 YEAR(soh.OrderDate) AS Year,5 SUM(sod.LineTotal) AS Revenue6 FROM Sales.SalesOrderDetail sod7 JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID8 JOIN Production.Product p ON sod.ProductID = p.ProductID9 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID10 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID11 GROUP BY pc.ProductCategoryID, pc.Name, YEAR(soh.OrderDate)12)13SELECT 14 Category,15 Year,16 Revenue,17 LAG(Revenue) OVER (PARTITION BY Category ORDER BY Year) AS PrevYearRevenue,18 CASE 19 WHEN LAG(Revenue) OVER (PARTITION BY Category ORDER BY Year) IS NULL THEN NULL20 ELSE CAST((Revenue - LAG(Revenue) OVER (PARTITION BY Category ORDER BY Year)) * 100.0 /21 LAG(Revenue) OVER (PARTITION BY Category ORDER BY Year) AS DECIMAL(10,2))22 END AS YoY_Growth_Pct23FROM YearlySales24ORDER BY Category, Year;4.4 Cumulative Distribution
SQL
1-- Customer spending distribution2WITH CustomerSpending AS (3 SELECT 4 c.CustomerID,5 SUM(soh.TotalDue) AS TotalSpent6 FROM Sales.Customer c7 JOIN Sales.SalesOrderHeader soh ON c.CustomerID = soh.CustomerID8 GROUP BY c.CustomerID9)10SELECT 11 CustomerID,12 TotalSpent,13 NTILE(100) OVER (ORDER BY TotalSpent) AS Percentile,14 CUME_DIST() OVER (ORDER BY TotalSpent) AS CumulativeDistribution,15 PERCENT_RANK() OVER (ORDER BY TotalSpent) AS PercentRank16FROM CustomerSpending17ORDER BY TotalSpent DESC;5. Statistical Functions
5.1 Basic Statistics
SQL
1SELECT 2 pc.Name AS Category,3 COUNT(*) AS ProductCount,4 AVG(p.ListPrice) AS AvgPrice,5 STDEV(p.ListPrice) AS StdDevPrice,6 VAR(p.ListPrice) AS VariancePrice,7 MIN(p.ListPrice) AS MinPrice,8 MAX(p.ListPrice) AS MaxPrice9FROM Production.Product p10JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID11JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID12WHERE p.ListPrice > 013GROUP BY pc.ProductCategoryID, pc.Name14ORDER BY AvgPrice DESC;5.2 Median Calculation
SQL
1-- Median using PERCENTILE_CONT2SELECT DISTINCT3 pc.Name AS Category,4 PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY p.ListPrice) 5 OVER (PARTITION BY pc.ProductCategoryID) AS MedianPrice,6 AVG(p.ListPrice) OVER (PARTITION BY pc.ProductCategoryID) AS AvgPrice7FROM Production.Product p8JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID9JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID10WHERE p.ListPrice > 0;5.3 Correlation Analysis (Manual)
SQL
1-- Correlation between ListPrice and OrderQuantity2WITH ProductStats AS (3 SELECT 4 p.ProductID,5 p.ListPrice,6 AVG(sod.OrderQty * 1.0) AS AvgOrderQty7 FROM Production.Product p8 JOIN Sales.SalesOrderDetail sod ON p.ProductID = sod.ProductID9 GROUP BY p.ProductID, p.ListPrice10),11Calculations AS (12 SELECT 13 COUNT(*) AS N,14 SUM(ListPrice) AS SumX,15 SUM(AvgOrderQty) AS SumY,16 SUM(ListPrice * AvgOrderQty) AS SumXY,17 SUM(ListPrice * ListPrice) AS SumX2,18 SUM(AvgOrderQty * AvgOrderQty) AS SumY219 FROM ProductStats20)21SELECT 22 (N * SumXY - SumX * SumY) / 23 SQRT((N * SumX2 - SumX * SumX) * (N * SumY2 - SumY * SumY)) AS Correlation24FROM Calculations;6. Advanced Patterns
6.1 Gaps and Islands
SQL
1-- Find consecutive date ranges with sales2WITH SalesDates AS (3 SELECT DISTINCT CAST(OrderDate AS DATE) AS SaleDate4 FROM Sales.SalesOrderHeader5),6DateGroups AS (7 SELECT 8 SaleDate,9 DATEADD(DAY, -ROW_NUMBER() OVER (ORDER BY SaleDate), SaleDate) AS GroupDate10 FROM SalesDates11)12SELECT 13 MIN(SaleDate) AS PeriodStart,14 MAX(SaleDate) AS PeriodEnd,15 DATEDIFF(DAY, MIN(SaleDate), MAX(SaleDate)) + 1 AS DaysInPeriod16FROM DateGroups17GROUP BY GroupDate18ORDER BY PeriodStart;6.2 First/Last per Group
SQL
1-- First and last order per customer2WITH OrderRanks AS (3 SELECT 4 CustomerID,5 SalesOrderID,6 OrderDate,7 TotalDue,8 ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS FirstRank,9 ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS LastRank10 FROM Sales.SalesOrderHeader11)12SELECT 13 f.CustomerID,14 f.SalesOrderID AS FirstOrderID,15 f.OrderDate AS FirstOrderDate,16 f.TotalDue AS FirstOrderAmount,17 l.SalesOrderID AS LastOrderID,18 l.OrderDate AS LastOrderDate,19 l.TotalDue AS LastOrderAmount,20 DATEDIFF(DAY, f.OrderDate, l.OrderDate) AS CustomerLifespanDays21FROM OrderRanks f22JOIN OrderRanks l ON f.CustomerID = l.CustomerID23WHERE f.FirstRank = 1 AND l.LastRank = 124ORDER BY CustomerLifespanDays DESC;6.3 Sessionization
SQL
1-- Group customer orders into sessions (orders within 30 days)2WITH OrdersWithPrev AS (3 SELECT 4 CustomerID,5 SalesOrderID,6 OrderDate,7 LAG(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PrevOrderDate,8 CASE 9 WHEN DATEDIFF(DAY, 10 LAG(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate), 11 OrderDate) > 30 12 OR LAG(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) IS NULL13 THEN 1 ELSE 0 14 END AS NewSession15 FROM Sales.SalesOrderHeader16),17SessionGroups AS (18 SELECT 19 *,20 SUM(NewSession) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS SessionID21 FROM OrdersWithPrev22)23SELECT 24 CustomerID,25 SessionID,26 COUNT(*) AS OrdersInSession,27 MIN(OrderDate) AS SessionStart,28 MAX(OrderDate) AS SessionEnd29FROM SessionGroups30GROUP BY CustomerID, SessionID31HAVING COUNT(*) > 132ORDER BY CustomerID, SessionID;7. Thực hành
Exercises
Exercise 1: PIVOT Sales Report
SQL
1-- Tạo pivot report: Revenue theo Category (rows) và Quarter (columns) cho năm 201323-- YOUR CODE HERE💡 Xem đáp án
SQL
1SELECT Category, [1] AS Q1, [2] AS Q2, [3] AS Q3, [4] AS Q42FROM (3 SELECT 4 pc.Name AS Category,5 DATEPART(QUARTER, soh.OrderDate) AS Quarter,6 sod.LineTotal7 FROM Sales.SalesOrderDetail sod8 JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID9 JOIN Production.Product p ON sod.ProductID = p.ProductID10 JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID11 JOIN Production.ProductCategory pc ON ps.ProductCategoryID = pc.ProductCategoryID12 WHERE YEAR(soh.OrderDate) = 201313) AS SourceData14PIVOT (15 SUM(LineTotal)16 FOR Quarter IN ([1], [2], [3], [4])17) AS PivotTable18ORDER BY Category;Exercise 2: ROLLUP Report
SQL
1-- Tạo report với subtotals: Territory → Year → Revenue2-- Include subtotals per territory và grand total34-- YOUR CODE HERE💡 Xem đáp án
SQL
1SELECT 2 COALESCE(st.Name, 'ALL TERRITORIES') AS Territory,3 COALESCE(CAST(YEAR(soh.OrderDate) AS VARCHAR), 'Subtotal') AS Year,4 SUM(soh.TotalDue) AS Revenue,5 CASE 6 WHEN GROUPING(st.Name) = 1 AND GROUPING(YEAR(soh.OrderDate)) = 1 THEN 'Grand Total'7 WHEN GROUPING(YEAR(soh.OrderDate)) = 1 THEN 'Territory Total'8 ELSE 'Detail'9 END AS RowType10FROM Sales.SalesOrderHeader soh11JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID12GROUP BY ROLLUP (st.Name, YEAR(soh.OrderDate))13ORDER BY 14 GROUPING(st.Name), st.Name,15 GROUPING(YEAR(soh.OrderDate)), Year;Exercise 3: YoY Analysis
SQL
1-- Tính YoY growth % cho mỗi Territory2-- Hiển thị: Territory, Year, Revenue, PrevYearRevenue, GrowthPct34-- YOUR CODE HERE💡 Xem đáp án
SQL
1WITH TerritoryYearlySales AS (2 SELECT 3 st.Name AS Territory,4 YEAR(soh.OrderDate) AS Year,5 SUM(soh.TotalDue) AS Revenue6 FROM Sales.SalesOrderHeader soh7 JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID8 GROUP BY st.TerritoryID, st.Name, YEAR(soh.OrderDate)9)10SELECT 11 Territory,12 Year,13 Revenue,14 LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year) AS PrevYearRevenue,15 CASE 16 WHEN LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year) IS NULL THEN NULL17 ELSE CAST((Revenue - LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year)) * 100.0 /18 LAG(Revenue) OVER (PARTITION BY Territory ORDER BY Year) AS DECIMAL(10,2))19 END AS GrowthPct20FROM TerritoryYearlySales21ORDER BY Territory, Year;8. Tổng kết
| Feature | Purpose | Use Case |
|---|---|---|
| PIVOT | Rows → Columns | Cross-tabulation reports |
| UNPIVOT | Columns → Rows | Normalize denormalized data |
| ROLLUP | Hierarchical subtotals | Category → Subcategory totals |
| CUBE | All combinations | Multi-dimensional analysis |
| GROUPING SETS | Custom subtotals | Specific grouping combinations |
Best Practices:
- PIVOT cần biết trước column values (hoặc dùng dynamic SQL)
- GROUPING function để identify subtotal rows
- Window functions often cleaner than complex GROUP BY
- Consider materialized views cho complex analytics
Bài tiếp theo: Final Project - HR Analytics
