Bài 12: Capstone — Báo cáo Phân tích Tổng hợp ShopVN
Mục tiêu Capstone
🎯 Mục tiêu dự án
Trong bài capstone này, bạn sẽ đóng vai Senior Data Analyst tại ShopVN và xây dựng một báo cáo phân tích toàn diện trình bày cho ban lãnh đạo. Báo cáo này tổng hợp tất cả kỹ thuật đã học trong khóa:
- Business Overview — Tổng quan hoạt động kinh doanh
- Customer Intelligence — Phân khúc & hành vi khách hàng
- Product & Seller Performance — Hiệu suất sản phẩm và seller
- Growth Opportunities — Cơ hội tăng trưởng và recommendations
Capstone Project Flow
Đây là project hoàn chỉnh bạn có thể đưa vào portfolio khi đi xin việc. Hãy làm thật kỹ và trình bày chuyên nghiệp!
Phần 1: Business Overview
📊 Section 1: Tổng quan Kinh doanh ShopVN
1.1 Executive Summary KPIs
1-- Executive Summary: Key Business Metrics2SELECT 3 -- Revenue Metrics4 SUM(total_amount) AS total_gmv,5 SUM(CASE WHEN order_status = 'delivered' 6 THEN total_amount ELSE 0 END) AS net_revenue,7 COUNT(DISTINCT order_id) AS total_orders,8 ROUND(AVG(total_amount), 0) AS avg_order_value,9 10 -- Customer Metrics11 COUNT(DISTINCT customer_id) AS total_customers,12 COUNT(DISTINCT CASE 13 WHEN order_date >= CURRENT_DATE - INTERVAL '90 days' 14 THEN customer_id END) AS active_customers_90d,15 16 -- Operational Metrics17 ROUND(18 COUNT(CASE WHEN order_status = 'delivered' THEN 1 END) * 100.0 / 19 COUNT(*), 120 ) AS fulfillment_rate,21 ROUND(22 COUNT(CASE WHEN order_status = 'cancelled' THEN 1 END) * 100.0 / 23 COUNT(*), 124 ) AS cancel_rate2526FROM fact_orders;1.2 Revenue Trend & Seasonality
1-- Monthly Revenue Trend with YoY comparison2WITH monthly AS (3 SELECT 4 DATE_TRUNC('month', order_date) AS month,5 EXTRACT(YEAR FROM order_date) AS year,6 EXTRACT(MONTH FROM order_date) AS month_num,7 SUM(total_amount) AS gmv,8 COUNT(DISTINCT order_id) AS orders,9 COUNT(DISTINCT customer_id) AS customers,10 ROUND(AVG(total_amount), 0) AS aov11 FROM fact_orders12 WHERE order_status != 'cancelled'13 GROUP BY DATE_TRUNC('month', order_date),14 EXTRACT(YEAR FROM order_date),15 EXTRACT(MONTH FROM order_date)16)17SELECT 18 m1.month,19 m1.gmv AS current_gmv,20 m2.gmv AS prev_year_gmv,21 ROUND((m1.gmv - COALESCE(m2.gmv, 0)) * 100.0 / 22 NULLIF(m2.gmv, 0), 1) AS yoy_growth,23 m1.orders,24 m1.customers,25 m1.aov26FROM monthly m127LEFT JOIN monthly m2 28 ON m1.month_num = m2.month_num 29 AND m1.year = m2.year + 130ORDER BY m1.month;1.3 Category Performance Matrix
1-- Category Revenue & Growth Matrix2WITH category_periods AS (3 SELECT 4 p.category,5 SUM(CASE WHEN o.order_date >= CURRENT_DATE - INTERVAL '6 months'6 THEN oi.price * oi.quantity ELSE 0 END) AS recent_revenue,7 SUM(CASE WHEN o.order_date < CURRENT_DATE - INTERVAL '6 months'8 THEN oi.price * oi.quantity ELSE 0 END) AS older_revenue,9 SUM(oi.price * oi.quantity) AS total_revenue,10 COUNT(DISTINCT oi.order_id) AS total_orders,11 ROUND(AVG(r.review_score), 2) AS avg_rating12 FROM fact_order_items oi13 JOIN dim_product p ON oi.product_id = p.product_id14 JOIN fact_orders o ON oi.order_id = o.order_id15 LEFT JOIN fact_reviews r ON oi.order_id = r.order_id 16 AND oi.product_id = r.product_id17 WHERE o.order_status = 'delivered'18 GROUP BY p.category19)20SELECT 21 category,22 total_revenue,23 ROUND(total_revenue * 100.0 / SUM(total_revenue) OVER(), 1) AS share_pct,24 ROUND(25 (recent_revenue - older_revenue) * 100.0 / 26 NULLIF(older_revenue, 0), 127 ) AS growth_pct,28 total_orders,29 avg_rating,30 -- BCG Matrix classification31 CASE 32 WHEN total_revenue * 100.0 / SUM(total_revenue) OVER() > 15 33 AND (recent_revenue - older_revenue) * 100.0 / NULLIF(older_revenue, 0) > 10 34 THEN '⭐ Star'35 WHEN total_revenue * 100.0 / SUM(total_revenue) OVER() > 15 36 THEN '🐄 Cash Cow'37 WHEN (recent_revenue - older_revenue) * 100.0 / NULLIF(older_revenue, 0) > 10 38 THEN '❓ Question Mark'39 ELSE '🐕 Dog'40 END AS bcg_position41FROM category_periods42ORDER BY total_revenue DESC;Phần 2: Customer Intelligence
👥 Section 2: Customer Intelligence Report
2.1 Customer Segmentation (RFM)
1-- RFM Segmentation Summary2WITH rfm_scores AS (3 SELECT 4 customer_id,5 NTILE(4) OVER(ORDER BY MAX(order_date) DESC) AS R,6 NTILE(4) OVER(ORDER BY COUNT(DISTINCT order_id)) AS F,7 NTILE(4) OVER(ORDER BY SUM(total_amount)) AS M8 FROM fact_orders9 WHERE order_status = 'delivered'10 GROUP BY customer_id11),12segments AS (13 SELECT 14 customer_id, R, F, M,15 CASE 16 WHEN R <= 2 AND F >= 3 AND M >= 3 THEN 'Champions'17 WHEN R <= 2 AND F >= 3 THEN 'Loyal Customers'18 WHEN R <= 2 AND F = 2 THEN 'Potential Loyalist'19 WHEN R <= 2 AND F = 1 THEN 'New Customers'20 WHEN R = 3 AND F >= 2 THEN 'At Risk'21 WHEN R >= 3 AND F >= 3 THEN 'Cant Lose Them'22 WHEN R = 3 AND F = 1 THEN 'About to Sleep'23 WHEN R = 4 AND F >= 2 THEN 'Hibernating'24 ELSE 'Lost'25 END AS segment26 FROM rfm_scores27)28SELECT 29 segment,30 COUNT(*) AS customer_count,31 ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct32FROM segments33GROUP BY segment34ORDER BY customer_count DESC;2.2 Cohort Retention Matrix
1-- Cohort Retention (first 6 months)2WITH first_order AS (3 SELECT 4 customer_id,5 DATE_TRUNC('month', MIN(order_date)) AS cohort_month6 FROM fact_orders7 WHERE order_status = 'delivered'8 GROUP BY customer_id9),10customer_activity AS (11 SELECT 12 f.customer_id,13 f.cohort_month,14 DATE_TRUNC('month', o.order_date) AS activity_month,15 (EXTRACT(YEAR FROM DATE_TRUNC('month', o.order_date)) - 16 EXTRACT(YEAR FROM f.cohort_month)) * 12 +17 (EXTRACT(MONTH FROM DATE_TRUNC('month', o.order_date)) - 18 EXTRACT(MONTH FROM f.cohort_month)) AS month_number19 FROM first_order f20 JOIN fact_orders o ON f.customer_id = o.customer_id21 WHERE o.order_status = 'delivered'22)23SELECT 24 cohort_month,25 COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END) AS m0,26 ROUND(COUNT(DISTINCT CASE WHEN month_number = 1 THEN customer_id END) * 100.0 / 27 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m1_pct,28 ROUND(COUNT(DISTINCT CASE WHEN month_number = 2 THEN customer_id END) * 100.0 / 29 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m2_pct,30 ROUND(COUNT(DISTINCT CASE WHEN month_number = 3 THEN customer_id END) * 100.0 / 31 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m3_pct,32 ROUND(COUNT(DISTINCT CASE WHEN month_number = 6 THEN customer_id END) * 100.0 / 33 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m6_pct34FROM customer_activity35GROUP BY cohort_month36ORDER BY cohort_month;2.3 Customer Lifetime Value
1-- CLV by Customer Segment2WITH customer_metrics AS (3 SELECT 4 customer_id,5 COUNT(DISTINCT order_id) AS total_orders,6 SUM(total_amount) AS total_spend,7 MIN(order_date) AS first_order,8 MAX(order_date) AS last_order,9 EXTRACT(DAY FROM MAX(order_date) - MIN(order_date)) / 30.0 AS tenure_months10 FROM fact_orders11 WHERE order_status = 'delivered'12 GROUP BY customer_id13 HAVING COUNT(DISTINCT order_id) >= 214)15SELECT 16 CASE 17 WHEN total_orders >= 10 THEN 'VIP (10+ orders)'18 WHEN total_orders >= 5 THEN 'Regular (5-9)'19 WHEN total_orders >= 3 THEN 'Developing (3-4)'20 ELSE 'Starter (2)'21 END AS tier,22 COUNT(*) AS customers,23 ROUND(AVG(total_spend), 0) AS avg_clv,24 ROUND(AVG(total_orders), 1) AS avg_orders,25 ROUND(AVG(total_spend / total_orders), 0) AS avg_aov,26 ROUND(AVG(tenure_months), 1) AS avg_tenure_months27FROM customer_metrics28GROUP BY 29 CASE 30 WHEN total_orders >= 10 THEN 'VIP (10+ orders)'31 WHEN total_orders >= 5 THEN 'Regular (5-9)'32 WHEN total_orders >= 3 THEN 'Developing (3-4)'33 ELSE 'Starter (2)'34 END35ORDER BY avg_clv DESC;Phần 3: Product & Seller
🏷️ Section 3: Product & Seller Performance
3.1 Product Portfolio Analysis (ABC + Rating)
1-- Product ABC Classification with Review Quality2WITH product_revenue AS (3 SELECT 4 p.product_id,5 p.product_name,6 p.category,7 SUM(oi.price * oi.quantity) AS revenue,8 COUNT(DISTINCT oi.order_id) AS order_count,9 ROUND(AVG(r.review_score), 2) AS avg_rating,10 COUNT(r.review_id) AS review_count11 FROM fact_order_items oi12 JOIN dim_product p ON oi.product_id = p.product_id13 JOIN fact_orders o ON oi.order_id = o.order_id14 LEFT JOIN fact_reviews r ON oi.order_id = r.order_id 15 AND oi.product_id = r.product_id16 WHERE o.order_status = 'delivered'17 GROUP BY p.product_id, p.product_name, p.category18),19abc AS (20 SELECT 21 *,22 SUM(revenue) OVER(ORDER BY revenue DESC) * 100.0 / 23 SUM(revenue) OVER() AS cumulative_pct,24 CASE 25 WHEN SUM(revenue) OVER(ORDER BY revenue DESC) * 100.0 / 26 SUM(revenue) OVER() <= 80 THEN 'A'27 WHEN SUM(revenue) OVER(ORDER BY revenue DESC) * 100.0 / 28 SUM(revenue) OVER() <= 95 THEN 'B'29 ELSE 'C'30 END AS abc_class31 FROM product_revenue32)33SELECT 34 abc_class,35 COUNT(*) AS product_count,36 ROUND(SUM(revenue), 0) AS total_revenue,37 ROUND(SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER(), 1) AS revenue_share,38 ROUND(AVG(avg_rating), 2) AS avg_rating,39 ROUND(AVG(order_count), 0) AS avg_orders40FROM abc41GROUP BY abc_class42ORDER BY abc_class;3.2 Seller Tier Analysis
1-- Comprehensive Seller Performance Tiers2WITH seller_metrics AS (3 SELECT 4 s.seller_id,5 s.seller_name,6 s.seller_type,7 COUNT(DISTINCT oi.order_id) AS total_orders,8 SUM(oi.price * oi.quantity) AS total_revenue,9 ROUND(AVG(r.review_score), 2) AS avg_rating,10 ROUND(AVG(r.delivery_rating), 2) AS avg_delivery_rating,11 ROUND(12 COUNT(DISTINCT CASE WHEN o.order_status = 'cancelled' 13 THEN o.order_id END) * 100.0 / 14 NULLIF(COUNT(DISTINCT o.order_id), 0), 115 ) AS cancel_rate16 FROM dim_seller s17 JOIN fact_order_items oi ON s.seller_id = oi.seller_id18 JOIN fact_orders o ON oi.order_id = o.order_id19 LEFT JOIN fact_reviews r ON oi.order_id = r.order_id 20 AND oi.seller_id = r.seller_id21 GROUP BY s.seller_id, s.seller_name, s.seller_type22)23SELECT 24 CASE 25 WHEN avg_rating >= 4.2 AND cancel_rate < 5 AND total_orders >= 100 26 THEN '🏆 Gold Seller'27 WHEN avg_rating >= 3.8 AND cancel_rate < 10 28 THEN '🥈 Silver Seller'29 WHEN avg_rating >= 3.5 AND cancel_rate < 15 30 THEN '🥉 Bronze Seller'31 ELSE '⚠️ Under Review'32 END AS seller_tier,33 COUNT(*) AS seller_count,34 ROUND(AVG(total_revenue), 0) AS avg_revenue,35 ROUND(AVG(avg_rating), 2) AS avg_rating,36 ROUND(AVG(cancel_rate), 1) AS avg_cancel_rate37FROM seller_metrics38GROUP BY 39 CASE 40 WHEN avg_rating >= 4.2 AND cancel_rate < 5 AND total_orders >= 100 41 THEN '🏆 Gold Seller'42 WHEN avg_rating >= 3.8 AND cancel_rate < 10 43 THEN '🥈 Silver Seller'44 WHEN avg_rating >= 3.5 AND cancel_rate < 15 45 THEN '🥉 Bronze Seller'46 ELSE '⚠️ Under Review'47 END48ORDER BY avg_revenue DESC;Phần 4: Growth & Recommendations
🚀 Section 4: Growth Opportunities & Recommendations
4.1 Conversion Funnel Summary
1-- Full Funnel Conversion Summary2SELECT 3 'Page Views' AS stage,4 COUNT(DISTINCT session_id) AS count,5 100.0 AS pct6FROM fact_page_views78UNION ALL910SELECT 11 'Product Views',12 COUNT(DISTINCT session_id),13 ROUND(COUNT(DISTINCT session_id) * 100.0 / 14 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)15FROM fact_page_views16WHERE page_type = 'product'1718UNION ALL1920SELECT 21 'Add to Cart',22 COUNT(DISTINCT session_id),23 ROUND(COUNT(DISTINCT session_id) * 100.0 / 24 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)25FROM fact_page_views26WHERE page_type = 'cart'2728UNION ALL2930SELECT 31 'Checkout',32 COUNT(DISTINCT session_id),33 ROUND(COUNT(DISTINCT session_id) * 100.0 / 34 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)35FROM fact_page_views36WHERE page_type = 'checkout'3738UNION ALL3940SELECT 41 'Purchase',42 COUNT(DISTINCT order_id),43 ROUND(COUNT(DISTINCT order_id) * 100.0 / 44 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)45FROM fact_orders46WHERE order_status != 'cancelled';4.2 Growth Opportunity Matrix
Growth Opportunities
4.3 Recommendation Template
Mỗi recommendation trong báo cáo nên theo format STAR:
| Element | Mô tả | Ví dụ |
|---|---|---|
| Situation | Tình hình hiện tại | Cancel rate đang ở 8%, cao hơn benchmark 5% |
| Task | Mục tiêu cần đạt | Giảm cancel rate xuống dưới 5% trong Q3 |
| Action | Hành động cụ thể | Implement inventory check real-time, SMS confirmation |
| Result | Kết quả kỳ vọng | Giảm 3% cancel = +15B VND revenue/năm |
Luôn quantify impact cho mỗi recommendation. Đừng chỉ nói "cần cải thiện retention" — hãy nói "tăng M1 retention từ 25% lên 30% sẽ mang lại thêm ~50,000 returning customers và ~35B VND revenue/năm".
Cấu trúc báo cáo & Deliverables
📋 Deliverables của Capstone
Cấu trúc báo cáo đề xuất
Report Structure
Checklist hoàn thành
| # | Deliverable | Nội dung |
|---|---|---|
| 1 | SQL Queries | Tất cả queries đã chạy thành công, có comments |
| 2 | Executive Dashboard | 1 dashboard với 6-8 KPI charts |
| 3 | Deep-dive Analysis | RFM segments, Cohort table, Funnel conversion |
| 4 | Written Report | 12-15 slides/pages với insights & visualization |
| 5 | Recommendations | 3-5 data-driven recommendations với ROI estimation |
Capstone không chỉ là chạy SQL — điểm khác biệt nằm ở storytelling và business recommendations. Hãy viết như bạn đang trình bày cho CEO, không phải cho technical team.
🎯 Tiêu chí đánh giá
| Tiêu chí | Trọng số | Mô tả |
|---|---|---|
| Technical Accuracy | 25% | SQL đúng, data hợp lý, không lỗi logic |
| Analytical Depth | 25% | Phân tích sâu, không chỉ mô tả bề mặt |
| Business Relevance | 25% | Insights actionable, relevant cho stakeholders |
| Presentation | 15% | Dashboard đẹp, report chuyên nghiệp |
| Recommendations | 10% | Cụ thể, khả thi, có quantified impact |
Tổng kết khóa học
🎓 Chúc mừng hoàn thành khóa học!
Bạn đã học được:
| Module | Kỹ năng | Ứng dụng thực tế |
|---|---|---|
| Module 1 | E-commerce fundamentals, KPI framework | Hiểu business model & đo lường |
| Module 2 | Funnel, RFM, Cohort analysis | Customer segmentation & retention |
| Module 3 | Product, Seller, Marketing analytics | Product optimization & growth |
| Module 4 | Review analysis, Dashboard design | Quality monitoring & reporting |
🔗 Bước tiếp theo:
- Làm bài Quiz để kiểm tra kiến thức: Quiz E-commerce Analytics →
- Khám phá khóa tiếp theo: DA cho Retail & Chuỗi cửa hàng →
- Tham gia Challenge: Áp dụng kỹ năng vào Data Challenges →
- Xây dựng Portfolio: Hoàn thiện capstone project và đưa lên GitHub
E-commerce Data Analyst là một trong những vị trí hot nhất tại Việt Nam hiện nay. Với portfolio gồm capstone project + SQL queries + dashboard, bạn đã có nền tảng vững chắc để ứng tuyển tại Shopee, Lazada, Tiki, Sendo và nhiều startup e-commerce khác!
