Cost Optimization

🎯 Lesson Objectives

AI API costs can grow quickly. This lesson covers strategies for reducing cost while maintaining quality.

After this lesson, you will be able to:

✅ Understand the cost breakdown of AI applications
✅ Implement a model tiering strategy
✅ Build a caching layer with Redis
✅ Optimize prompts and use batching
✅ Set up cost monitoring and budget alerts

Task 0

🔍 Cost Breakdown

[Diagram: cost breakdown]

Checkpoint

Do you understand the main cost components of AI applications?

Task 1

📐 Model Selection Strategy

python.py

# Tiered model approach: prices are USD per 1M tokens
MODELS = {
    "simple": {
        "name": "gpt-4o-mini",
        "cost_per_1m_input": 0.15,
        "cost_per_1m_output": 0.60,
    },
    "complex": {
        "name": "gpt-4o",
        "cost_per_1m_input": 2.50,
        "cost_per_1m_output": 10.00,
    },
}

# Task types that the cheaper model handles well
SIMPLE_TASKS = {"classification", "extraction", "translation"}

def select_model(task_type: str) -> str:
    """Route simple task types to the cheaper model, everything else to the larger one."""
    if task_type in SIMPLE_TASKS:
        return MODELS["simple"]["name"]
    return MODELS["complex"]["name"]
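To see what tiering is worth, it helps to compute the average cost per request for a given traffic mix. The sketch below reuses the per-1M-token prices from the table above; the workload (500 input / 200 output tokens, 70% simple tasks) and the helper names `request_cost` and `blended_cost` are illustrative assumptions, not measurements.

```python
# Prices in USD per 1M tokens, copied from the MODELS table in this lesson.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def blended_cost(simple_share: float, input_tokens: int, output_tokens: int) -> float:
    """Average cost per request when `simple_share` of traffic goes to the cheap model."""
    cheap = request_cost("gpt-4o-mini", input_tokens, output_tokens)
    full = request_cost("gpt-4o", input_tokens, output_tokens)
    return simple_share * cheap + (1 - simple_share) * full

# Hypothetical workload: 500 input / 200 output tokens, 70% simple tasks.
all_complex = blended_cost(0.0, 500, 200)
tiered = blended_cost(0.7, 500, 200)
savings = 1 - tiered / all_complex
print(f"gpt-4o only: ${all_complex:.6f}  tiered: ${tiered:.6f}  savings: {savings:.0%}")
```

Under these assumptions the savings depend almost entirely on the share of traffic that can safely go to the cheaper model, so measuring that share on real traffic is the first step.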

Checkpoint

Do you understand the model tiering strategy and when to use which model?

Task 2

⚡ Caching Strategies

python.py

import hashlib
import json

import redis

# Assumes a Redis server is reachable at localhost:6379
r = redis.Redis(host="localhost", port=6379)

class LLMCache:
    """Cache LLM responses in Redis and track the hit rate."""

    def __init__(self, ttl=3600):
        self.ttl = ttl  # seconds before a cached entry expires
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str, model: str) -> str:
        # Hash model + prompt so identical requests map to the same key
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, prompt: str, model: str):
        key = self._key(prompt, model)
        cached = r.get(key)
        if cached is not None:
            self.hits += 1
            return json.loads(cached)
        self.misses += 1
        return None

    def set(self, prompt: str, model: str, response: str):
        key = self._key(prompt, model)
        r.setex(key, self.ttl, json.dumps(response))

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0

cache = LLMCache(ttl=7200)  # 2-hour cache

# Usage in a chain
# Note: the synchronous Redis client blocks the event loop; redis.asyncio avoids that
async def cached_invoke(chain, input_data, model):
    prompt_key = json.dumps(input_data, sort_keys=True)

    cached = cache.get(prompt_key, model)
    if cached is not None:  # an empty string is still a valid cached response
        return cached

    result = await chain.ainvoke(input_data)
    cache.set(prompt_key, model, result.content)
    return result.content
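To experiment with hit-rate tracking without running a Redis server, a dict-backed stand-in with the same `get`/`set`/`hit_rate` interface can be enough. This is a minimal sketch for local tests only: `InMemoryLLMCache` and `fake_llm` are hypothetical names, and the in-process store is an assumption that would not be shared across workers the way Redis is.

```python
import hashlib
import json
import time

class InMemoryLLMCache:
    """Dict-backed stand-in for the Redis cache, for local tests only."""

    def __init__(self, ttl: float = 3600):
        self.ttl = ttl
        self.hits = 0
        self.misses = 0
        self._store = {}  # key -> (expires_at, serialized response)

    def _key(self, prompt: str, model: str) -> str:
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, prompt: str, model: str):
        key = self._key(prompt, model)
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            self.hits += 1
            return json.loads(entry[1])
        self._store.pop(key, None)  # drop the expired entry, if any
        self.misses += 1
        return None

    def set(self, prompt: str, model: str, response: str) -> None:
        expires_at = time.monotonic() + self.ttl
        self._store[self._key(prompt, model)] = (expires_at, json.dumps(response))

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total > 0 else 0.0

def fake_llm(prompt: str) -> str:
    # Hypothetical placeholder for a real (paid) model call.
    return f"echo: {prompt}"

cache = InMemoryLLMCache(ttl=60)
paid_calls = 0
for prompt in ["hello", "hello", "bye", "hello"]:
    answer = cache.get(prompt, "gpt-4o-mini")
    if answer is None:
        answer = fake_llm(prompt)
        paid_calls += 1
        cache.set(prompt, "gpt-4o-mini", answer)

print(f"paid calls: {paid_calls}, hit rate: {cache.hit_rate:.0%}")
```

In this toy run, two of the four lookups are served from the cache, so only two requests would reach the paid API.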

Checkpoint

Do you understand how to build a caching layer with hit rate tracking?

Task 3

⚡ Prompt Optimization

python.py

# Before: verbose prompt (many tokens)
verbose_prompt = """
You are a professional sentiment analyzer. Your job is to analyze 
the sentiment of the given text. You should determine whether the 
sentiment is positive, negative, or neutral. Please provide your 
analysis in a structured format with the sentiment label and a 
confidence score between 0 and 1. Also include a brief reasoning 
for your classification.
"""

# After: concise prompt (fewer tokens)
concise_prompt = """Classify sentiment: positive/negative/neutral. 
Return: label, confidence (0-1), reason (1 sentence)."""

# Savings: ~60% fewer prompt tokens
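A quick way to sanity-check the savings is to estimate token counts before and after. The sketch below uses a rough heuristic of about 4 characters per token for English text, which is an assumption; an actual tokenizer for the target model would give exact counts. `rough_token_count` is a hypothetical helper.

```python
def rough_token_count(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(text.strip()) // 4)

verbose_prompt = """
You are a professional sentiment analyzer. Your job is to analyze
the sentiment of the given text. You should determine whether the
sentiment is positive, negative, or neutral. Please provide your
analysis in a structured format with the sentiment label and a
confidence score between 0 and 1. Also include a brief reasoning
for your classification.
"""

concise_prompt = """Classify sentiment: positive/negative/neutral.
Return: label, confidence (0-1), reason (1 sentence)."""

before = rough_token_count(verbose_prompt)
after = rough_token_count(concise_prompt)
savings = 1 - after / before
print(f"before: ~{before} tokens, after: ~{after} tokens, savings: ~{savings:.0%}")
```

Because the system prompt is sent with every request, even a small per-request reduction adds up across the whole workload. It is worth re-checking output quality after shortening, since a terser prompt can change model behavior.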

Checkpoint

Do you understand how to optimize prompts to reduce token usage?

Task 4

⚡ Batching

python.py

# Baseline: N sequential API calls
results = [chain.invoke({"text": text}) for text in texts]

# Faster: chain.batch runs the same N calls concurrently
# (lower latency, but the token cost is unchanged)
results = chain.batch(
    [{"text": t} for t in texts],
    config={"max_concurrency": 10}
)

# Cheaper: combine multiple texts into 1 call so the instructions are sent once
# (assumes the chain's prompt template accepts a "texts" variable)
combined = "\n---\n".join([f"[{i}] {t}" for i, t in enumerate(texts)])
result = chain.invoke({"texts": combined})
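Combining texts into one call only pays off if the answers can be matched back to their inputs. One possible approach, sketched here, is to ask for one `[index] label` line per text and parse those lines. The instruction wording, the response format, and the helper names `build_batch_prompt` and `parse_batch_response` are all assumptions; a real model may deviate from the requested format, so missing items are left as `None`.

```python
import re
from typing import List, Optional

def build_batch_prompt(texts: List[str]) -> str:
    """Number each text so the model's answers can be matched back by index."""
    body = "\n---\n".join(f"[{i}] {t}" for i, t in enumerate(texts))
    return (
        "Classify the sentiment of each numbered text as positive/negative/neutral.\n"
        "Answer with one line per text in the form [index] label.\n\n" + body
    )

def parse_batch_response(response: str, expected: int) -> List[Optional[str]]:
    """Map '[i] label' lines back to input order; missing items stay None."""
    labels: List[Optional[str]] = [None] * expected
    for match in re.finditer(r"^\[(\d+)\]\s*(\w+)", response, flags=re.MULTILINE):
        index = int(match.group(1))
        if index < expected:
            labels[index] = match.group(2).lower()
    return labels

texts = ["Great product!", "Terrible support.", "It arrived on Tuesday."]
prompt = build_batch_prompt(texts)

# Hypothetical model output; a real response may not follow the format exactly.
fake_response = "[0] positive\n[1] Negative\n[2] neutral"
labels = parse_batch_response(fake_response, len(texts))
print(labels)
```

Items that come back as `None` can be retried individually, which keeps the cost benefit for the common case while still handling malformed answers.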

Checkpoint

Do you understand how to use batching to reduce API call overhead?

Task 5

💻 Cost Calculator

python.py

class CostCalculator:
    # USD per 1M tokens for text models; USD per 1,000 images for image models
    PRICING = {
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "text-embedding-3-small": {"input": 0.02},
        "dall-e-3-standard": {"per_1k_images": 40.0},
        "dall-e-3-hd": {"per_1k_images": 80.0},
    }

    @classmethod
    def estimate(cls, model, input_tokens=0, output_tokens=0, images=0):
        # Fail loudly instead of silently reporting $0 for an unknown model
        if model not in cls.PRICING:
            raise ValueError(f"No pricing configured for model: {model}")
        pricing = cls.PRICING[model]

        cost = 0.0
        if "input" in pricing:
            cost += (input_tokens / 1_000_000) * pricing["input"]
        if "output" in pricing:
            cost += (output_tokens / 1_000_000) * pricing["output"]
        if "per_1k_images" in pricing:
            cost += (images / 1000) * pricing["per_1k_images"]

        return cost

    @classmethod
    def monthly_estimate(cls, daily_requests, avg_input_tokens, avg_output_tokens, model):
        daily_cost = cls.estimate(
            model,
            daily_requests * avg_input_tokens,
            daily_requests * avg_output_tokens,
        )
        return daily_cost * 30  # assumes a 30-day month

# Estimate
monthly = CostCalculator.monthly_estimate(
    daily_requests=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    model="gpt-4o-mini"
)
print(f"Monthly estimate: ${monthly:.2f}")  # Monthly estimate: $5.85
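The same estimate can be extended to account for caching: if a fraction of requests is served from the cache, only the remainder reaches the paid API. The sketch below assumes cache hits cost nothing (it ignores the cost of running the cache itself) and uses a hypothetical 40% hit rate; `monthly_cost` is an illustrative helper, not part of the calculator above.

```python
def monthly_cost(
    daily_requests: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    input_price: float,   # USD per 1M input tokens
    output_price: float,  # USD per 1M output tokens
    cache_hit_rate: float = 0.0,
    days: int = 30,
) -> float:
    """Monthly API cost when `cache_hit_rate` of requests never reach the API."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    paid_requests = daily_requests * (1 - cache_hit_rate)
    per_request = (
        avg_input_tokens * input_price + avg_output_tokens * output_price
    ) / 1_000_000
    return paid_requests * per_request * days

# gpt-4o-mini prices from this lesson; the 40% hit rate is a made-up example.
baseline = monthly_cost(1000, 500, 200, input_price=0.15, output_price=0.60)
with_cache = monthly_cost(1000, 500, 200, 0.15, 0.60, cache_hit_rate=0.4)
print(f"no cache: ${baseline:.2f}/month, 40% hit rate: ${with_cache:.2f}/month")
```

The savings scale linearly with the hit rate in this model, which is why tracking the real hit rate matters before promising a specific reduction.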

Checkpoint

Do you understand how to estimate monthly AI API costs?

Task 6

📊 Cost Monitoring Dashboard

python.py

from datetime import datetime

class CostMonitor:
    def __init__(self, daily_budget=10.0):
        self.daily_budget = daily_budget
        self.daily_costs = {}  # "YYYY-MM-DD" -> {"total": ..., "by_model": {...}}

    def record(self, cost, model):
        today = datetime.now().strftime("%Y-%m-%d")
        if today not in self.daily_costs:
            self.daily_costs[today] = {"total": 0, "by_model": {}}

        day = self.daily_costs[today]
        day["total"] += cost
        day["by_model"][model] = day["by_model"].get(model, 0) + cost

        # Alert once 80% of the daily budget has been used
        if day["total"] >= self.daily_budget * 0.8:
            print(
                f"WARNING: ${day['total']:.2f} spent, "
                f"at least 80% of the ${self.daily_budget:.2f} daily budget"
            )

    def report(self):
        today = datetime.now().strftime("%Y-%m-%d")
        data = self.daily_costs.get(today, {"total": 0, "by_model": {}})
        return {
            "date": today,
            "total_cost": f"${data['total']:.4f}",
            "budget_remaining": f"${self.daily_budget - data['total']:.4f}",
            "by_model": data["by_model"]
        }
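An alert only reports overspending; a hard limit can stop it. As a complement to the monitor, the sketch below refuses a request when its estimated cost would exceed the daily budget. `BudgetGuard` and `BudgetExceeded` are hypothetical names, the budget and per-request cost are made-up values, and the daily reset and persistence across processes are left out for brevity.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would push spending past the hard limit."""

class BudgetGuard:
    def __init__(self, daily_budget: float):
        self.daily_budget = daily_budget
        self.spent = 0.0

    def check(self, estimated_cost: float) -> None:
        """Call before a request; raises if the request would exceed the budget."""
        if self.spent + estimated_cost > self.daily_budget:
            raise BudgetExceeded(
                f"spent ${self.spent:.4f} of ${self.daily_budget:.4f}; "
                f"request estimated at ${estimated_cost:.4f}"
            )

    def record(self, actual_cost: float) -> None:
        """Call after a request with the cost actually incurred."""
        self.spent += actual_cost

guard = BudgetGuard(daily_budget=0.01)
blocked = 0
for _ in range(5):
    try:
        guard.check(0.004)   # estimated cost before calling the API
        guard.record(0.004)  # actual cost after the call
    except BudgetExceeded:
        blocked += 1

print(f"spent: ${guard.spent:.3f}, blocked requests: {blocked}")
```

Whether to block, queue, or fall back to a cheaper model when the limit is hit is a product decision; raising an exception is just the simplest option to illustrate.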

Checkpoint

Do you understand how to build cost monitoring with budget alerts?

Task 7

🎯 Summary

Cost Optimization Checklist

Model tiering: Use gpt-4o-mini for simple tasks (about 17x cheaper than gpt-4o)
Caching: Cache identical requests (can save 30-50% when many requests repeat)
Prompt optimization: Shorten prompts (can save 20-40%)
Batching: Combine requests (reduces per-call overhead)
Max tokens: Set an appropriate max_tokens (avoids paying for unneeded output)
Budget alerts: Set daily/monthly spending limits
Usage analytics: Track cost per user/feature
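The max_tokens item in the checklist can be made concrete with a worst-case calculation: the cap bounds how many output tokens a single request can bill. The sketch below uses the output prices from this lesson; the cap values and the request count are illustrative assumptions, and `worst_case_output_cost` is a hypothetical helper.

```python
# Output prices in USD per 1M tokens, from the pricing used in this lesson.
OUTPUT_PRICE_PER_1M = {"gpt-4o-mini": 0.60, "gpt-4o": 10.00}

def worst_case_output_cost(model: str, max_tokens: int, requests: int) -> float:
    """Upper bound on output cost if every request uses its full max_tokens."""
    return requests * max_tokens * OUTPUT_PRICE_PER_1M[model] / 1_000_000

worst_case = {}
for cap in (4096, 1024, 256):
    worst_case[cap] = worst_case_output_cost("gpt-4o", cap, requests=1000)
    print(f"max_tokens={cap:>4}: worst case ${worst_case[cap]:.2f} per 1000 requests")
```

Real responses are usually shorter than the cap, so this is an upper bound rather than a forecast, but it shows how a tight cap limits the damage from runaway generations.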

Hands-on Exercise

Implement model tiering for different task types
Build a Redis caching layer with hit rate tracking
Optimize prompts and measure the token savings
Set up cost monitoring with budget alerts

Target: Reduce cost by 50% at the same quality

Self-Check Questions

What is model tiering, and how much does using gpt-4o-mini for simple tasks save compared with gpt-4o?
How can caching reduce AI costs by 30-50%? Explain how to calculate the cache hit rate.
Which techniques does prompt optimization use to reduce token usage (shortening, batching, max_tokens)?
How should budget tracking and cost alerts be implemented to control daily/monthly costs?

🎉 Great job! You have completed the Cost Optimization lesson!

Next: We will build the Capstone Project to bring together everything you have learned!

Task 8

🚀 Next Lesson

Capstone Project - Full Deployment →
