Capstone Project - Image Processing Platform

🎯 Lesson Objectives

Avg. 5 min

Build an Image Processing Platform that combines all the techniques covered so far: generation, editing, analysis, and multimodal pipelines.

After this lesson, you will be able to:

✅ Build a complete Image Processing Platform
✅ Design a RESTful API for image generation, editing, and analysis
✅ Implement a testing strategy for multimodal applications
✅ Review everything covered in Image & Multimodal AI

Task 0

🔍 Project Overview

Avg. 5 min


Project Structure

Example

image-platform/
  app/
    main.py
    generators.py
    editors.py
    analyzers.py
    models.py
  tests/
  static/
    uploads/
    outputs/
  requirements.txt

Checkpoint

Do you understand the overall architecture and the project structure?

Task 1

📐 Models

Avg. 5 min

python.py

# app/models.py
from typing import List, Literal, Optional

from pydantic import BaseModel


class GenerateRequest(BaseModel):
    """Parameters for generating a new image."""
    prompt: str
    style: Literal["photo", "illustration", "art", "3d"] = "photo"
    size: Literal["1024x1024", "1792x1024", "1024x1792"] = "1024x1024"
    provider: Literal["dalle", "sd"] = "dalle"


class AnalyzeRequest(BaseModel):
    """Image to analyze and the analysis tasks to run on it."""
    image_url: str
    tasks: List[Literal["describe", "classify", "tag", "ocr", "quality"]]


class AnalyzeResponse(BaseModel):
    """One optional field per analysis task; tasks that were not requested stay None."""
    description: Optional[str] = None
    classification: Optional[dict] = None
    tags: Optional[List[str]] = None
    text_content: Optional[str] = None
    quality_score: Optional[int] = None


class EditRequest(BaseModel):
    """Image to edit and the operation to apply."""
    image_url: str
    operation: Literal["remove_bg", "upscale", "style_transfer"]
    style_reference: Optional[str] = None
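
The task names accepted by AnalyzeRequest ("describe", "ocr", and so on) differ from the field names of AnalyzeResponse (description, text_content, and so on). One possible way to bridge the two is a small lookup table. The sketch below uses plain dictionaries so that it runs without Pydantic; TASK_TO_FIELD and to_response_fields are hypothetical names, not part of the original project.

```python
# Hypothetical lookup table: analysis task name -> AnalyzeResponse field name.
TASK_TO_FIELD = {
    "describe": "description",
    "classify": "classification",
    "tag": "tags",
    "ocr": "text_content",
    "quality": "quality_score",
}


def to_response_fields(results: dict) -> dict:
    """Rename task-keyed results to the field names used by AnalyzeResponse."""
    unknown = set(results) - set(TASK_TO_FIELD)
    if unknown:
        raise ValueError(f"Unknown analysis tasks: {sorted(unknown)}")
    return {TASK_TO_FIELD[task]: value for task, value in results.items()}


fields = to_response_fields({"describe": "A red bicycle", "ocr": "SALE 50%"})
print(fields)
```

The resulting dictionary could then be passed to AnalyzeResponse(**fields), provided each value is first converted to the field's declared type. For example, quality_score is declared as an int, while a vision model typically returns free text.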

Checkpoint

Have you designed the Pydantic models for the API requests and responses?

Task 2

🎨 Generators

Avg. 5 min

python.py

# app/generators.py
from openai import AsyncOpenAI

# Async client so the API call does not block the event loop.
# Reads the API key from the OPENAI_API_KEY environment variable.
client = AsyncOpenAI()

# Style keywords appended to the user's prompt.
style_map = {
    "photo": "professional photography, realistic, high quality, 8K",
    "illustration": "digital illustration, clean lines, vibrant colors",
    "art": "fine art painting, artistic, expressive brushstrokes",
    "3d": "3D render, octane render, volumetric lighting, detailed",
}


async def generate_dalle(prompt: str, style: str, size: str) -> dict:
    """Generate one image with DALL-E 3 and return its URL and revised prompt."""
    full_prompt = f"{prompt}, {style_map.get(style, '')}"

    response = await client.images.generate(
        model="dall-e-3",
        prompt=full_prompt,
        size=size,
        quality="hd",
    )

    image = response.data[0]
    return {"url": image.url, "revised_prompt": image.revised_prompt}
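
Because the final prompt is assembled with a plain f-string, a style that is missing from style_map would leave a dangling ", " at the end of the prompt. Moving prompt assembly into a small pure function avoids that and lets the style mapping be tested without calling the API. This is only a sketch: build_prompt is a hypothetical helper, and STYLE_MAP repeats two of the style_map entries so the example is self-contained.

```python
# Two entries copied from style_map so this example runs on its own.
STYLE_MAP = {
    "photo": "professional photography, realistic, high quality, 8K",
    "3d": "3D render, octane render, volumetric lighting, detailed",
}


def build_prompt(prompt: str, style: str) -> str:
    """Append style keywords to the prompt; return the bare prompt for unknown styles."""
    base = prompt.strip()
    keywords = STYLE_MAP.get(style, "")
    return f"{base}, {keywords}" if keywords else base


print(build_prompt("A lighthouse at dusk", "3d"))
print(build_prompt("A lighthouse at dusk", "watercolor"))
```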

Checkpoint

Have you implemented the image generator module with style mapping?

Task 3

🔍 Analyzers

Avg. 5 min

python.py

# app/analyzers.py
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

# One instruction per supported analysis task.
prompts = {
    "describe": "Describe this image in detail.",
    "classify": "Classify this image into the most appropriate category.",
    "tag": "Generate 10 tags for this image.",
    "ocr": "Extract all text that appears in this image.",
    "quality": "Rate the quality of this image from 1 to 10 and explain why.",
}


async def analyze_image(image_url: str, tasks: list) -> dict:
    """Run each requested task against the image and collect the answers by task name."""
    results = {}

    for task in tasks:
        # Send the task instruction and the image together in one multimodal message.
        response = await llm.ainvoke([
            HumanMessage(content=[
                {"type": "text", "text": prompts[task]},
                {"type": "image_url", "image_url": {"url": image_url}},
            ])
        ])
        results[task] = response.content

    return results
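
The loop in analyze_image awaits each task before starting the next one, so total latency grows with the number of tasks. Since the tasks are independent, they could instead run concurrently with asyncio.gather. In the sketch below, fake_vision_call is a stand-in for the llm.ainvoke call so the example runs offline, and analyze_concurrently is a hypothetical name.

```python
import asyncio


async def fake_vision_call(image_url: str, task: str) -> str:
    """Stand-in for the real vision model call; sleeps briefly to simulate latency."""
    await asyncio.sleep(0.01)
    return f"{task} result for {image_url}"


async def analyze_concurrently(image_url: str, tasks: list, call=fake_vision_call) -> dict:
    """Run all requested tasks at the same time and return the answers keyed by task."""
    # Drop duplicate task names while preserving the order they were requested in.
    unique_tasks = list(dict.fromkeys(tasks))
    answers = await asyncio.gather(*(call(image_url, task) for task in unique_tasks))
    return dict(zip(unique_tasks, answers))


results = asyncio.run(
    analyze_concurrently("https://example.invalid/cat.png", ["describe", "tag", "describe"])
)
print(results)
```

Running the tasks in parallel sends several requests at once, so rate limits of the vision API may need to be taken into account.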

Checkpoint

Have you implemented the analyzer module with multi-task support?

Task 4

💻 FastAPI Backend

Avg. 5 min

python.py

# app/main.py
from pathlib import Path

from fastapi import FastAPI, File, UploadFile
from fastapi.staticfiles import StaticFiles

from app.analyzers import analyze_image
from app.generators import generate_dalle
from app.models import AnalyzeRequest, GenerateRequest

UPLOAD_DIR = Path("static/uploads")
UPLOAD_DIR.mkdir(parents=True, exist_ok=True)  # StaticFiles requires the directory to exist

app = FastAPI(title="Image Processing Platform")
app.mount("/static", StaticFiles(directory="static"), name="static")


@app.post("/generate")
async def generate(request: GenerateRequest):
    # request.provider is accepted by the model, but only DALL-E is wired up here.
    return await generate_dalle(request.prompt, request.style, request.size)


@app.post("/analyze")
async def analyze(request: AnalyzeRequest):
    return await analyze_image(request.image_url, request.tasks)


@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    # Keep only the final path component so a name like "../../x" cannot write outside UPLOAD_DIR.
    filename = Path(file.filename or "upload").name
    content = await file.read()
    (UPLOAD_DIR / filename).write_bytes(content)
    return {"url": f"/static/uploads/{filename}"}
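
GenerateRequest includes a provider field ("dalle" or "sd"), but the /generate endpoint always calls generate_dalle. One possible way to route by provider is a registry that maps each provider name to a generator function. The sketch below uses stub coroutines in place of the real backends so it runs offline; generate_dalle_stub, generate_sd_stub, GENERATORS, and generate_with_provider are all hypothetical names.

```python
import asyncio


async def generate_dalle_stub(prompt: str, style: str, size: str) -> dict:
    """Stand-in for the real DALL-E generator."""
    return {"provider": "dalle", "prompt": prompt}


async def generate_sd_stub(prompt: str, style: str, size: str) -> dict:
    """Stand-in for a Stable Diffusion generator, which this lesson does not implement."""
    return {"provider": "sd", "prompt": prompt}


# Registry: provider name -> coroutine function with a shared signature.
GENERATORS = {"dalle": generate_dalle_stub, "sd": generate_sd_stub}


async def generate_with_provider(provider: str, prompt: str, style: str, size: str) -> dict:
    """Look up the generator for the provider and delegate the request to it."""
    try:
        generator = GENERATORS[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
    return await generator(prompt, style, size)


result = asyncio.run(generate_with_provider("sd", "A koi pond", "art", "1024x1024"))
print(result["provider"])
```

With a registry like this, adding a new backend means adding one entry rather than editing the endpoint.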

Checkpoint

Have you designed and implemented the API endpoints for the platform?

Task 5

🧪 Testing

Avg. 5 min

python.py

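# tests/test_platform.py
import pytest
from httpx import ASGITransport, AsyncClient

from app.main import app


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_generate():
    # As written, this test calls the real image generation API.
    # Recent httpx versions no longer accept AsyncClient(app=...); use ASGITransport instead.
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        response = await ac.post("/generate", json={
            "prompt": "A beautiful sunset",
            "style": "photo",
            "size": "1024x1024",
        })
        assert response.status_code == 200
        assert "url" in response.json()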
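
A test that sends a real request to the image API is slow, costs money, and can fail for reasons unrelated to the code under test. A common alternative is to replace the provider call with a mock. The sketch below illustrates the idea using only the standard library's unittest.mock.AsyncMock and a simplified handler that receives its generator as an argument. handle_generate is a hypothetical stand-in for the /generate endpoint, and the URL is a placeholder.

```python
import asyncio
from unittest.mock import AsyncMock


async def handle_generate(payload: dict, generator) -> dict:
    """Simplified stand-in for the /generate endpoint: delegates to the given generator."""
    return await generator(payload["prompt"], payload["style"], payload["size"])


def test_generate_with_mock():
    # The mock returns a canned response instead of contacting the image API.
    fake_generator = AsyncMock(return_value={
        "url": "https://example.invalid/sunset.png",
        "revised_prompt": "A beautiful sunset over the sea",
    })
    payload = {"prompt": "A beautiful sunset", "style": "photo", "size": "1024x1024"}

    result = asyncio.run(handle_generate(payload, fake_generator))

    assert "url" in result
    # Verify the handler passed the request fields through unchanged.
    fake_generator.assert_awaited_once_with("A beautiful sunset", "photo", "1024x1024")


test_generate_with_mock()
```

In the FastAPI project itself, a similar substitution could be made with pytest's monkeypatch fixture by replacing generate_dalle in app.main with such a mock before sending the request.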

Evaluation Rubric

Scoring Rubric

Criteria	Points
Image Generation (DALL-E)	20
Image Analysis (Vision)	20
Image Editing	15
Multimodal Pipeline	15
API Design	10
Error Handling	10
Testing	5
Code Quality	5
Total	100
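
The rubric awards 10 points for error handling, which the modules in this lesson do not yet include. One common pattern for calls to external image APIs is to retry transient failures with exponential backoff. The following is a minimal sketch using only the standard library. retry_async, TransientError, flaky_generate, and the delay values are illustrative assumptions; a real implementation would catch the provider SDK's own exception types instead.

```python
import asyncio


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as timeouts or rate limits."""


async def retry_async(call, attempts: int = 3, base_delay: float = 0.01):
    """Await call() up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return await call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller handle the error
            await asyncio.sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


async def flaky_generate() -> dict:
    """Simulated provider call that fails twice and then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return {"url": "https://example.invalid/image.png"}


result = asyncio.run(retry_async(flaky_generate))
print(result, "after", calls["count"], "calls")
```

Errors that are not transient, such as an invalid prompt, are deliberately not retried here and would propagate to the caller on the first attempt.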

Checkpoint

Have you written test cases, and do you understand the evaluation rubric for the capstone project?

Task 6

🎯 Summary

Avg. 5 min

Key Takeaways

What You Learned

Image Generation: DALL-E 3, Stable Diffusion, ComfyUI
Prompt Engineering: Style, composition, quality keywords
Image Editing: Inpainting, outpainting, background removal
ControlNet: Canny, depth, pose control
Vision Models: GPT-4V, Claude Vision, structured extraction
Multimodal: Combining text + image pipelines
Production: API design, batch processing, cost management

Self-Check Questions

What are the main modules of the Image Processing Platform, and how do they interact with one another?
How would you design a RESTful API for image generation, editing, and analysis within a single unified platform?
What special considerations does a testing strategy for multimodal applications (image + text) require?
How would you manage costs and optimize performance when deploying an image processing platform to production?

🎉 Well done! You have completed the entire Image & Multimodal AI course!

You have covered: Image Generation (DALL-E 3, Stable Diffusion), Prompt Engineering, Image Editing, ControlNet, Vision Models, Image Analysis, Visual QA, and Multimodal Pipelines. Apply these skills to real-world projects!

Task 7

Congratulations on completing the Image Processing with AI course! 🎉

