MinAI - Về trang chủ
Dự án
12/1390 phút
Đang tải...

Bài 12: Capstone — Báo cáo Phân tích Tổng hợp ShopVN

Dự án tổng hợp: Xây dựng báo cáo phân tích toàn diện cho ShopVN Marketplace, kết hợp tất cả kỹ thuật đã học từ funnel, RFM, cohort đến review analysis

Bài 12: Capstone — Báo cáo Phân tích Tổng hợp ShopVN

task-0

Mục tiêu Capstone

TB5 min

🎯 Mục tiêu dự án

Trong bài capstone này, bạn sẽ đóng vai Senior Data Analyst tại ShopVN và xây dựng một báo cáo phân tích toàn diện trình bày cho ban lãnh đạo. Báo cáo này tổng hợp tất cả kỹ thuật đã học trong khóa:

  1. Business Overview — Tổng quan hoạt động kinh doanh
  2. Customer Intelligence — Phân khúc & hành vi khách hàng
  3. Product & Seller Performance — Hiệu suất sản phẩm và seller
  4. Growth Opportunities — Cơ hội tăng trưởng và recommendations

Capstone Project Flow

📊 Khám phá Data
🔍 Phân tích 4 Modules
💡 Rút ra Insights
📋 Viết báo cáo & Dashboard
🎤 Trình bày & Recommend
Portfolio Project

Đây là project hoàn chỉnh bạn có thể đưa vào portfolio khi đi xin việc. Hãy làm thật kỹ và trình bày chuyên nghiệp!

task-1

Phần 1: Business Overview

TB5 min

📊 Section 1: Tổng quan Kinh doanh ShopVN

1.1 Executive Summary KPIs

SQL
1-- Executive Summary: Key Business Metrics
2SELECT
3 -- Revenue Metrics
4 SUM(total_amount) AS total_gmv,
5 SUM(CASE WHEN order_status = 'delivered'
6 THEN total_amount ELSE 0 END) AS net_revenue,
7 COUNT(DISTINCT order_id) AS total_orders,
8 ROUND(AVG(total_amount), 0) AS avg_order_value,
9
10 -- Customer Metrics
11 COUNT(DISTINCT customer_id) AS total_customers,
12 COUNT(DISTINCT CASE
13 WHEN order_date >= CURRENT_DATE - INTERVAL '90 days'
14 THEN customer_id END) AS active_customers_90d,
15
16 -- Operational Metrics
17 ROUND(
18 COUNT(CASE WHEN order_status = 'delivered' THEN 1 END) * 100.0 /
19 COUNT(*), 1
20 ) AS fulfillment_rate,
21 ROUND(
22 COUNT(CASE WHEN order_status = 'cancelled' THEN 1 END) * 100.0 /
23 COUNT(*), 1
24 ) AS cancel_rate
25
26FROM fact_orders;

1.2 Revenue Trend & Seasonality

SQL
1-- Monthly Revenue Trend with YoY comparison
2WITH monthly AS (
3 SELECT
4 DATE_TRUNC('month', order_date) AS month,
5 EXTRACT(YEAR FROM order_date) AS year,
6 EXTRACT(MONTH FROM order_date) AS month_num,
7 SUM(total_amount) AS gmv,
8 COUNT(DISTINCT order_id) AS orders,
9 COUNT(DISTINCT customer_id) AS customers,
10 ROUND(AVG(total_amount), 0) AS aov
11 FROM fact_orders
12 WHERE order_status != 'cancelled'
13 GROUP BY DATE_TRUNC('month', order_date),
14 EXTRACT(YEAR FROM order_date),
15 EXTRACT(MONTH FROM order_date)
16)
17SELECT
18 m1.month,
19 m1.gmv AS current_gmv,
20 m2.gmv AS prev_year_gmv,
21 ROUND((m1.gmv - COALESCE(m2.gmv, 0)) * 100.0 /
22 NULLIF(m2.gmv, 0), 1) AS yoy_growth,
23 m1.orders,
24 m1.customers,
25 m1.aov
26FROM monthly m1
27LEFT JOIN monthly m2
28 ON m1.month_num = m2.month_num
29 AND m1.year = m2.year + 1
30ORDER BY m1.month;

1.3 Category Performance Matrix

SQL
1-- Category Revenue & Growth Matrix
2WITH category_periods AS (
3 SELECT
4 p.category,
5 SUM(CASE WHEN o.order_date >= CURRENT_DATE - INTERVAL '6 months'
6 THEN oi.price * oi.quantity ELSE 0 END) AS recent_revenue,
7 SUM(CASE WHEN o.order_date < CURRENT_DATE - INTERVAL '6 months'
8 THEN oi.price * oi.quantity ELSE 0 END) AS older_revenue,
9 SUM(oi.price * oi.quantity) AS total_revenue,
10 COUNT(DISTINCT oi.order_id) AS total_orders,
11 ROUND(AVG(r.review_score), 2) AS avg_rating
12 FROM fact_order_items oi
13 JOIN dim_product p ON oi.product_id = p.product_id
14 JOIN fact_orders o ON oi.order_id = o.order_id
15 LEFT JOIN fact_reviews r ON oi.order_id = r.order_id
16 AND oi.product_id = r.product_id
17 WHERE o.order_status = 'delivered'
18 GROUP BY p.category
19)
20SELECT
21 category,
22 total_revenue,
23 ROUND(total_revenue * 100.0 / SUM(total_revenue) OVER(), 1) AS share_pct,
24 ROUND(
25 (recent_revenue - older_revenue) * 100.0 /
26 NULLIF(older_revenue, 0), 1
27 ) AS growth_pct,
28 total_orders,
29 avg_rating,
30 -- BCG Matrix classification
31 CASE
32 WHEN total_revenue * 100.0 / SUM(total_revenue) OVER() > 15
33 AND (recent_revenue - older_revenue) * 100.0 / NULLIF(older_revenue, 0) > 10
34 THEN '⭐ Star'
35 WHEN total_revenue * 100.0 / SUM(total_revenue) OVER() > 15
36 THEN '🐄 Cash Cow'
37 WHEN (recent_revenue - older_revenue) * 100.0 / NULLIF(older_revenue, 0) > 10
38 THEN '❓ Question Mark'
39 ELSE '🐕 Dog'
40 END AS bcg_position
41FROM category_periods
42ORDER BY total_revenue DESC;
task-2

Phần 2: Customer Intelligence

TB5 min

👥 Section 2: Customer Intelligence Report

2.1 Customer Segmentation (RFM)

SQL
1-- RFM Segmentation Summary
2WITH rfm_scores AS (
3 SELECT
4 customer_id,
5 NTILE(4) OVER(ORDER BY MAX(order_date) DESC) AS R,
6 NTILE(4) OVER(ORDER BY COUNT(DISTINCT order_id)) AS F,
7 NTILE(4) OVER(ORDER BY SUM(total_amount)) AS M
8 FROM fact_orders
9 WHERE order_status = 'delivered'
10 GROUP BY customer_id
11),
12segments AS (
13 SELECT
14 customer_id, R, F, M,
15 CASE
16 WHEN R <= 2 AND F >= 3 AND M >= 3 THEN 'Champions'
17 WHEN R <= 2 AND F >= 3 THEN 'Loyal Customers'
18 WHEN R <= 2 AND F = 2 THEN 'Potential Loyalist'
19 WHEN R <= 2 AND F = 1 THEN 'New Customers'
20 WHEN R = 3 AND F >= 2 THEN 'At Risk'
21 WHEN R >= 3 AND F >= 3 THEN 'Cant Lose Them'
22 WHEN R = 3 AND F = 1 THEN 'About to Sleep'
23 WHEN R = 4 AND F >= 2 THEN 'Hibernating'
24 ELSE 'Lost'
25 END AS segment
26 FROM rfm_scores
27)
28SELECT
29 segment,
30 COUNT(*) AS customer_count,
31 ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 1) AS pct
32FROM segments
33GROUP BY segment
34ORDER BY customer_count DESC;

2.2 Cohort Retention Matrix

SQL
1-- Cohort Retention (first 6 months)
2WITH first_order AS (
3 SELECT
4 customer_id,
5 DATE_TRUNC('month', MIN(order_date)) AS cohort_month
6 FROM fact_orders
7 WHERE order_status = 'delivered'
8 GROUP BY customer_id
9),
10customer_activity AS (
11 SELECT
12 f.customer_id,
13 f.cohort_month,
14 DATE_TRUNC('month', o.order_date) AS activity_month,
15 (EXTRACT(YEAR FROM DATE_TRUNC('month', o.order_date)) -
16 EXTRACT(YEAR FROM f.cohort_month)) * 12 +
17 (EXTRACT(MONTH FROM DATE_TRUNC('month', o.order_date)) -
18 EXTRACT(MONTH FROM f.cohort_month)) AS month_number
19 FROM first_order f
20 JOIN fact_orders o ON f.customer_id = o.customer_id
21 WHERE o.order_status = 'delivered'
22)
23SELECT
24 cohort_month,
25 COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END) AS m0,
26 ROUND(COUNT(DISTINCT CASE WHEN month_number = 1 THEN customer_id END) * 100.0 /
27 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m1_pct,
28 ROUND(COUNT(DISTINCT CASE WHEN month_number = 2 THEN customer_id END) * 100.0 /
29 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m2_pct,
30 ROUND(COUNT(DISTINCT CASE WHEN month_number = 3 THEN customer_id END) * 100.0 /
31 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m3_pct,
32 ROUND(COUNT(DISTINCT CASE WHEN month_number = 6 THEN customer_id END) * 100.0 /
33 NULLIF(COUNT(DISTINCT CASE WHEN month_number = 0 THEN customer_id END), 0), 1) AS m6_pct
34FROM customer_activity
35GROUP BY cohort_month
36ORDER BY cohort_month;

2.3 Customer Lifetime Value

SQL
1-- CLV by Customer Segment
2WITH customer_metrics AS (
3 SELECT
4 customer_id,
5 COUNT(DISTINCT order_id) AS total_orders,
6 SUM(total_amount) AS total_spend,
7 MIN(order_date) AS first_order,
8 MAX(order_date) AS last_order,
9 EXTRACT(DAY FROM MAX(order_date) - MIN(order_date)) / 30.0 AS tenure_months
10 FROM fact_orders
11 WHERE order_status = 'delivered'
12 GROUP BY customer_id
13 HAVING COUNT(DISTINCT order_id) >= 2
14)
15SELECT
16 CASE
17 WHEN total_orders >= 10 THEN 'VIP (10+ orders)'
18 WHEN total_orders >= 5 THEN 'Regular (5-9)'
19 WHEN total_orders >= 3 THEN 'Developing (3-4)'
20 ELSE 'Starter (2)'
21 END AS tier,
22 COUNT(*) AS customers,
23 ROUND(AVG(total_spend), 0) AS avg_clv,
24 ROUND(AVG(total_orders), 1) AS avg_orders,
25 ROUND(AVG(total_spend / total_orders), 0) AS avg_aov,
26 ROUND(AVG(tenure_months), 1) AS avg_tenure_months
27FROM customer_metrics
28GROUP BY
29 CASE
30 WHEN total_orders >= 10 THEN 'VIP (10+ orders)'
31 WHEN total_orders >= 5 THEN 'Regular (5-9)'
32 WHEN total_orders >= 3 THEN 'Developing (3-4)'
33 ELSE 'Starter (2)'
34 END
35ORDER BY avg_clv DESC;
task-3

Phần 3: Product & Seller

TB5 min

🏷️ Section 3: Product & Seller Performance

3.1 Product Portfolio Analysis (ABC + Rating)

SQL
1-- Product ABC Classification with Review Quality
2WITH product_revenue AS (
3 SELECT
4 p.product_id,
5 p.product_name,
6 p.category,
7 SUM(oi.price * oi.quantity) AS revenue,
8 COUNT(DISTINCT oi.order_id) AS order_count,
9 ROUND(AVG(r.review_score), 2) AS avg_rating,
10 COUNT(r.review_id) AS review_count
11 FROM fact_order_items oi
12 JOIN dim_product p ON oi.product_id = p.product_id
13 JOIN fact_orders o ON oi.order_id = o.order_id
14 LEFT JOIN fact_reviews r ON oi.order_id = r.order_id
15 AND oi.product_id = r.product_id
16 WHERE o.order_status = 'delivered'
17 GROUP BY p.product_id, p.product_name, p.category
18),
19abc AS (
20 SELECT
21 *,
22 SUM(revenue) OVER(ORDER BY revenue DESC) * 100.0 /
23 SUM(revenue) OVER() AS cumulative_pct,
24 CASE
25 WHEN SUM(revenue) OVER(ORDER BY revenue DESC) * 100.0 /
26 SUM(revenue) OVER() <= 80 THEN 'A'
27 WHEN SUM(revenue) OVER(ORDER BY revenue DESC) * 100.0 /
28 SUM(revenue) OVER() <= 95 THEN 'B'
29 ELSE 'C'
30 END AS abc_class
31 FROM product_revenue
32)
33SELECT
34 abc_class,
35 COUNT(*) AS product_count,
36 ROUND(SUM(revenue), 0) AS total_revenue,
37 ROUND(SUM(revenue) * 100.0 / SUM(SUM(revenue)) OVER(), 1) AS revenue_share,
38 ROUND(AVG(avg_rating), 2) AS avg_rating,
39 ROUND(AVG(order_count), 0) AS avg_orders
40FROM abc
41GROUP BY abc_class
42ORDER BY abc_class;

3.2 Seller Tier Analysis

SQL
1-- Comprehensive Seller Performance Tiers
2WITH seller_metrics AS (
3 SELECT
4 s.seller_id,
5 s.seller_name,
6 s.seller_type,
7 COUNT(DISTINCT oi.order_id) AS total_orders,
8 SUM(oi.price * oi.quantity) AS total_revenue,
9 ROUND(AVG(r.review_score), 2) AS avg_rating,
10 ROUND(AVG(r.delivery_rating), 2) AS avg_delivery_rating,
11 ROUND(
12 COUNT(DISTINCT CASE WHEN o.order_status = 'cancelled'
13 THEN o.order_id END) * 100.0 /
14 NULLIF(COUNT(DISTINCT o.order_id), 0), 1
15 ) AS cancel_rate
16 FROM dim_seller s
17 JOIN fact_order_items oi ON s.seller_id = oi.seller_id
18 JOIN fact_orders o ON oi.order_id = o.order_id
19 LEFT JOIN fact_reviews r ON oi.order_id = r.order_id
20 AND oi.seller_id = r.seller_id
21 GROUP BY s.seller_id, s.seller_name, s.seller_type
22)
23SELECT
24 CASE
25 WHEN avg_rating >= 4.2 AND cancel_rate < 5 AND total_orders >= 100
26 THEN '🏆 Gold Seller'
27 WHEN avg_rating >= 3.8 AND cancel_rate < 10
28 THEN '🥈 Silver Seller'
29 WHEN avg_rating >= 3.5 AND cancel_rate < 15
30 THEN '🥉 Bronze Seller'
31 ELSE '⚠️ Under Review'
32 END AS seller_tier,
33 COUNT(*) AS seller_count,
34 ROUND(AVG(total_revenue), 0) AS avg_revenue,
35 ROUND(AVG(avg_rating), 2) AS avg_rating,
36 ROUND(AVG(cancel_rate), 1) AS avg_cancel_rate
37FROM seller_metrics
38GROUP BY
39 CASE
40 WHEN avg_rating >= 4.2 AND cancel_rate < 5 AND total_orders >= 100
41 THEN '🏆 Gold Seller'
42 WHEN avg_rating >= 3.8 AND cancel_rate < 10
43 THEN '🥈 Silver Seller'
44 WHEN avg_rating >= 3.5 AND cancel_rate < 15
45 THEN '🥉 Bronze Seller'
46 ELSE '⚠️ Under Review'
47 END
48ORDER BY avg_revenue DESC;
task-4

Phần 4: Growth & Recommendations

TB5 min

🚀 Section 4: Growth Opportunities & Recommendations

4.1 Conversion Funnel Summary

SQL
1-- Full Funnel Conversion Summary
2SELECT
3 'Page Views' AS stage,
4 COUNT(DISTINCT session_id) AS count,
5 100.0 AS pct
6FROM fact_page_views
7
8UNION ALL
9
10SELECT
11 'Product Views',
12 COUNT(DISTINCT session_id),
13 ROUND(COUNT(DISTINCT session_id) * 100.0 /
14 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)
15FROM fact_page_views
16WHERE page_type = 'product'
17
18UNION ALL
19
20SELECT
21 'Add to Cart',
22 COUNT(DISTINCT session_id),
23 ROUND(COUNT(DISTINCT session_id) * 100.0 /
24 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)
25FROM fact_page_views
26WHERE page_type = 'cart'
27
28UNION ALL
29
30SELECT
31 'Checkout',
32 COUNT(DISTINCT session_id),
33 ROUND(COUNT(DISTINCT session_id) * 100.0 /
34 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)
35FROM fact_page_views
36WHERE page_type = 'checkout'
37
38UNION ALL
39
40SELECT
41 'Purchase',
42 COUNT(DISTINCT order_id),
43 ROUND(COUNT(DISTINCT order_id) * 100.0 /
44 (SELECT COUNT(DISTINCT session_id) FROM fact_page_views), 1)
45FROM fact_orders
46WHERE order_status != 'cancelled';

4.2 Growth Opportunity Matrix

Growth Opportunities

🚀ShopVN Growth Plan
📈Revenue Growth
Tăng AOV (bundles)
Cross-sell top categories
Expand Star categories
Premium seller program
👥Customer Retention
Win-back At Risk
Loyalty program
Improve M1 retention
Personalized offers
⚙️Operations Optimize
Reduce cancel rate
Faster delivery
Seller quality control
Improve low ratings
🎯Conversion Optimize
Reduce cart abandon
Mobile UX improve
A/B test checkout
Better search/filter

4.3 Recommendation Template

Mỗi recommendation trong báo cáo nên theo format STAR:

ElementMô tảVí dụ
SituationTình hình hiện tạiCancel rate đang ở 8%, cao hơn benchmark 5%
TaskMục tiêu cần đạtGiảm cancel rate xuống dưới 5% trong Q3
ActionHành động cụ thểImplement inventory check real-time, SMS confirmation
ResultKết quả kỳ vọngGiảm 3% cancel = +15B VND revenue/năm
Pro Tip

Luôn quantify impact cho mỗi recommendation. Đừng chỉ nói "cần cải thiện retention" — hãy nói "tăng M1 retention từ 25% lên 30% sẽ mang lại thêm ~50,000 returning customers và ~35B VND revenue/năm".

task-5

Cấu trúc báo cáo & Deliverables

TB5 min

📋 Deliverables của Capstone

Cấu trúc báo cáo đề xuất

Report Structure

📑 Cover Page & Table of Contents
📊 Executive Summary (1 slide/page)
💰 Business Overview (3-4 slides)
👥 Customer Intelligence (3-4 slides)
🏷️ Product & Seller (2-3 slides)
🚀 Recommendations (2-3 slides)

Checklist hoàn thành

#DeliverableNội dung
1SQL QueriesTất cả queries đã chạy thành công, có comments
2Executive Dashboard1 dashboard với 6-8 KPI charts
3Deep-dive AnalysisRFM segments, Cohort table, Funnel conversion
4Written Report12-15 slides/pages với insights & visualization
5Recommendations3-5 data-driven recommendations với ROI estimation
Lưu ý quan trọng

Capstone không chỉ là chạy SQL — điểm khác biệt nằm ở storytellingbusiness recommendations. Hãy viết như bạn đang trình bày cho CEO, không phải cho technical team.

🎯 Tiêu chí đánh giá

Tiêu chíTrọng sốMô tả
Technical Accuracy25%SQL đúng, data hợp lý, không lỗi logic
Analytical Depth25%Phân tích sâu, không chỉ mô tả bề mặt
Business Relevance25%Insights actionable, relevant cho stakeholders
Presentation15%Dashboard đẹp, report chuyên nghiệp
Recommendations10%Cụ thể, khả thi, có quantified impact
task-6

Tổng kết khóa học

TB5 min

🎓 Chúc mừng hoàn thành khóa học!

Bạn đã học được:

ModuleKỹ năngỨng dụng thực tế
Module 1E-commerce fundamentals, KPI frameworkHiểu business model & đo lường
Module 2Funnel, RFM, Cohort analysisCustomer segmentation & retention
Module 3Product, Seller, Marketing analyticsProduct optimization & growth
Module 4Review analysis, Dashboard designQuality monitoring & reporting

🔗 Bước tiếp theo:

Career Tip

E-commerce Data Analyst là một trong những vị trí hot nhất tại Việt Nam hiện nay. Với portfolio gồm capstone project + SQL queries + dashboard, bạn đã có nền tảng vững chắc để ứng tuyển tại Shopee, Lazada, Tiki, Sendo và nhiều startup e-commerce khác!