Capstone Project - Text Processing Platform

🎯 Lesson Objectives

Avg. 5 min

In this final lesson, you will build a complete Text Processing Platform that combines all the techniques covered in the course.

After this lesson, you will be able to:

✅ Build a Text Processing Platform with a FastAPI backend
✅ Integrate Content Generation, Sentiment, Translation, and Classification
✅ Implement batch processing with cost tracking
✅ Write tests and deploy the complete platform

Task 0

🔍 Project Overview

Avg. 5 min

[Diagram: overall architecture of the Text Processing Platform]

Checkpoint

Do you understand the overall architecture of the Text Processing Platform?

Task 1

📐 Architecture

Avg. 5 min

Project Structure

Example

text-processing-platform/
  app/
    main.py           # FastAPI app
    chains.py         # LangChain chains
    models.py         # Pydantic schemas
    config.py         # Settings
  tests/
    test_chains.py
  .env
  requirements.txt

Requirements

Example

# requirements.txt
langchain==0.3.0
langchain-openai==0.2.0
fastapi==0.115.0
uvicorn==0.30.0
pydantic==2.9.0
python-dotenv==1.0.1
# Needed to run tests/ (unpinned here; pin to the versions you install)
pytest
pytest-asyncio
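The project structure lists app/config.py, but the lesson does not show its contents. The following is a minimal sketch of what it might hold, not the course's actual file. OPENAI_API_KEY is the variable the OpenAI client reads; the MODEL_NAME and MAX_CONCURRENCY variables and the Settings fields are assumptions made for illustration.

```python
# app/config.py (sketch; field and variable names other than OPENAI_API_KEY are assumptions)
import os
from dataclasses import dataclass

try:
    # python-dotenv is listed in requirements.txt; load .env when it is installed.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model_name: str = "gpt-4o-mini"
    max_concurrency: int = 10


def load_settings(env=os.environ) -> Settings:
    """Build Settings from environment variables, failing early if the key is missing."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        model_name=env.get("MODEL_NAME", "gpt-4o-mini"),
        max_concurrency=int(env.get("MAX_CONCURRENCY", "10")),
    )
```

Passing the environment as an argument keeps the function easy to test with a plain dict, and failing at startup surfaces a missing key before the first request does.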

Checkpoint

Have you set up the project structure and dependencies?

Task 2

💻 Step 1: Define Models

Avg. 5 min

python.py

# app/models.py
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class ContentRequest(BaseModel):
    """Input for the /generate endpoint."""
    topic: str = Field(min_length=1)
    content_type: Literal["blog", "email", "social", "product"]
    tone: str = "professional"
    word_count: int = Field(default=500, gt=0)  # target length in words


class ContentResponse(BaseModel):
    content: str
    word_count: int
    tokens_used: int


class AnalysisRequest(BaseModel):
    """Input for the /analyze endpoint; several tasks can run on one text."""
    text: str
    tasks: List[Literal["sentiment", "classify", "summarize", "translate"]]
    target_language: Optional[str] = None   # required for "translate"
    categories: Optional[List[str]] = None  # required for "classify"


class AnalysisResponse(BaseModel):
    # Each field stays None unless the matching task was requested
    sentiment: Optional[dict] = None
    classification: Optional[dict] = None
    summary: Optional[str] = None
    translation: Optional[str] = None


class BatchRequest(BaseModel):
    """Input for the /batch endpoint: one task applied to many texts."""
    texts: List[str] = Field(min_length=1)
    task: Literal["sentiment", "classify", "summarize", "translate"]

Checkpoint

Have you finished defining the Pydantic models for the API?

Task 3

💻 Step 2: Build Chains

Avg. 5 min

python.py

# app/chains.py
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Higher temperature for creative tasks, zero temperature for analytical ones
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
llm_precise = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Content Generation
def get_content_chain(content_type: str):
    """Return a prompt | llm | parser chain; unknown types fall back to "blog"."""
    # Inputs a template does not reference (e.g. word_count for "email") are ignored.
    prompts = {
        "blog": "Write a professional blog post of about {word_count} words in a {tone} tone.",
        "email": "Write a {tone} email that is concise and effective.",
        "social": "Create a social media post that is engaging and shareable.",
        "product": "Write a compelling product description of about {word_count} words."
    }
    return (
        ChatPromptTemplate.from_messages([
            ("system", prompts.get(content_type, prompts["blog"])),
            ("human", "Topic: {topic}")
        ])
        | llm
        | StrOutputParser()
    )

# Sentiment Analysis
# No output parser here: the chain returns a message object, so callers read .content
sentiment_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Analyze sentiment. Return: sentiment (positive/negative/neutral), confidence (0-1), key phrases."),
        ("human", "{text}")
    ])
    | llm_precise
)

# Classification
# Also returns a message object; {categories} must be supplied by the caller
classify_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Classify text into categories: {categories}. Return category and confidence."),
        ("human", "{text}")
    ])
    | llm_precise
)

# Summarization
summarize_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Summarize the text in 3 sentences, keeping the main points."),
        ("human", "{text}")
    ])
    | llm_precise
    | StrOutputParser()
)

# Translation
translate_chain = (
    ChatPromptTemplate.from_messages([
        ("system", "Translate the text into {target_language}. Return only the translation."),
        ("human", "{text}")
    ])
    | llm
    | StrOutputParser()
)
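The sentiment prompt asks for three fields (sentiment, confidence, key phrases), but the chain returns free text. One way to get a structured result is to ask the model for JSON and parse the reply defensively. The sketch below assumes the prompt has been changed to request a JSON object with the keys sentiment, confidence, and key_phrases; the parse_sentiment helper and those key names are hypothetical, not part of the lesson's code.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(raw: str) -> dict:
    """Parse a model reply into a dict; keep the raw text if it is not usable JSON."""
    fallback = {"sentiment": None, "confidence": None, "key_phrases": [], "raw": raw}

    # Models sometimes wrap JSON in extra prose, so take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return fallback
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return fallback

    sentiment = str(data.get("sentiment", "")).lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        return fallback

    try:
        confidence = float(data.get("confidence"))
        confidence = min(1.0, max(0.0, confidence))  # clamp into [0, 1]
    except (TypeError, ValueError):
        confidence = None

    phrases = data.get("key_phrases")
    if not isinstance(phrases, list):
        phrases = []

    return {
        "sentiment": sentiment,
        "confidence": confidence,
        "key_phrases": [str(p) for p in phrases],
        "raw": raw,
    }
```

Returning a fallback instead of raising keeps one malformed reply from failing a whole request; the raw field lets the caller log or inspect what the model actually said.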

Checkpoint

Have you finished building the LangChain chains for the platform?

Task 4

💻 Step 3: FastAPI Backend

Avg. 5 min

python.py

# app/main.py
from fastapi import FastAPI, HTTPException

from app.chains import (
    classify_chain,
    get_content_chain,
    sentiment_chain,
    summarize_chain,
    translate_chain,
)
from app.models import (
    AnalysisRequest,
    AnalysisResponse,
    BatchRequest,
    ContentRequest,
    ContentResponse,
)

app = FastAPI(title="Text Processing Platform")

@app.post("/generate", response_model=ContentResponse)
async def generate_content(request: ContentRequest):
    chain = get_content_chain(request.content_type)
    content = await chain.ainvoke({
        "topic": request.topic,
        "tone": request.tone,
        "word_count": request.word_count
    })
    return ContentResponse(
        content=content,
        word_count=len(content.split()),
        tokens_used=0  # placeholder: token usage is not tracked yet
    )

@app.post("/analyze", response_model=AnalysisResponse)
async def analyze_text(request: AnalysisRequest):
    response = AnalysisResponse()

    if "sentiment" in request.tasks:
        result = await sentiment_chain.ainvoke({"text": request.text})
        response.sentiment = {"result": result.content}

    # classify needs a category list, just as translate needs a target language
    if "classify" in request.tasks and request.categories:
        result = await classify_chain.ainvoke({
            "text": request.text,
            "categories": ", ".join(request.categories)
        })
        response.classification = {"result": result.content}

    if "summarize" in request.tasks:
        response.summary = await summarize_chain.ainvoke({"text": request.text})

    if "translate" in request.tasks and request.target_language:
        response.translation = await translate_chain.ainvoke({
            "text": request.text,
            "target_language": request.target_language
        })

    return response

@app.post("/batch")
async def batch_process(request: BatchRequest):
    # Only chains that need just {"text"} and return plain strings are batched here.
    # translate_chain also requires target_language, which BatchRequest does not carry,
    # so batch translation is rejected instead of failing inside the chain.
    chain = {
        "summarize": summarize_chain,
    }.get(request.task)

    if chain is None:
        raise HTTPException(status_code=400, detail=f"Unsupported batch task: {request.task}")

    results = await chain.abatch(
        [{"text": t} for t in request.texts],
        config={"max_concurrency": 10}  # cap parallel model calls
    )
    return {"results": results}
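The /generate endpoint reports tokens_used=0, so the cost tracking named in the lesson objectives is still open. The sketch below shows one way to accumulate token counts and estimate spend. The CostTracker class is hypothetical, and the prices are placeholder values rather than real provider rates. It also assumes you can obtain input and output token counts for each call; recent LangChain versions expose usage metadata on the returned message, which you should verify for your version, and a chain ending in StrOutputParser discards that message.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens; replace with your provider's current rates.
PRICE_PER_1M_TOKENS = {"input": 0.15, "output": 0.60}


@dataclass
class CostTracker:
    """Accumulates token usage across calls and estimates the total cost."""
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the token counts reported for one model call."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def cost_usd(self) -> float:
        """Estimated cost, pricing input and output tokens separately."""
        return (
            self.input_tokens * PRICE_PER_1M_TOKENS["input"]
            + self.output_tokens * PRICE_PER_1M_TOKENS["output"]
        ) / 1_000_000


tracker = CostTracker()
tracker.record(input_tokens=1000, output_tokens=500)
tracker.record(input_tokens=2000, output_tokens=1500)
print(tracker.calls, tracker.total_tokens, round(tracker.cost_usd, 6))
```

Input and output tokens are tracked separately because providers typically price them differently; a single combined counter would not be enough to estimate cost.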

Checkpoint

Have you finished implementing the FastAPI backend and its endpoints?

Task 5

🛠️ Step 4: Testing

Avg. 5 min

python.py

# tests/test_chains.py
# These tests call the OpenAI API, so OPENAI_API_KEY must be set.
# The asyncio marker requires the pytest-asyncio plugin.
import pytest

from app.chains import get_content_chain, summarize_chain

@pytest.mark.asyncio
async def test_content_generation():
    chain = get_content_chain("blog")
    result = await chain.ainvoke({
        "topic": "AI basics",
        "tone": "professional",
        "word_count": 200
    })
    assert len(result) > 100

@pytest.mark.asyncio
async def test_summarization():
    # Placeholder input, repeated so it is clearly longer than a 3-sentence summary.
    # Swap in a real article for a more meaningful test.
    text = "Long article text here with multiple sentences and paragraphs... " * 20
    result = await summarize_chain.ainvoke({"text": text})
    assert len(result) < len(text)
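The tests above call the real model, which makes them slow, costly, and non-deterministic. A common complement is to test your own logic against a stand-in chain. The sketch below uses only the standard library; FakeChain and summarize_text are hypothetical names introduced for illustration, and the fake only mimics the ainvoke method that the lesson's chains expose.

```python
import asyncio


class FakeChain:
    """Stand-in for a chain: records its inputs and returns a canned reply."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []

    async def ainvoke(self, inputs: dict) -> str:
        self.calls.append(inputs)
        return self.reply


async def summarize_text(chain, text: str) -> str:
    """Hypothetical helper: reject empty input, otherwise delegate to the chain."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return await chain.ainvoke({"text": text})


def test_summarize_passes_text_to_chain():
    chain = FakeChain("short summary")
    result = asyncio.run(summarize_text(chain, "some long article"))
    assert result == "short summary"
    assert chain.calls == [{"text": "some long article"}]


def test_summarize_rejects_empty_text():
    chain = FakeChain("unused")
    try:
        asyncio.run(summarize_text(chain, "   "))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # The model must not be called for invalid input.
    assert chain.calls == []
```

Tests like these run without an API key and pin down behavior you control (validation, input shaping), while a small number of live tests can still check that the real chains respond.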

Checkpoint

Have you written tests for the chains and endpoints?

Task 6

📐 Evaluation Rubric

Avg. 5 min

Scoring Rubric

Criteria	Points
Content Generation (4 types)	20
Sentiment Analysis	15
Text Classification	15
Summarization	15
Translation	10
Batch Processing	10
Error Handling	10
Code Quality	5
Total	100

Checkpoint

Have you reviewed the rubric and confirmed that the platform meets every criterion?

Task 7

🎯 Summary

Avg. 5 min

Extensions (Bonus)

Add streaming responses with SSE
Implement caching with Redis
Add authentication and rate limiting
Build a simple web UI with Streamlit
Deploy to Railway or Render

What You Learned

LangChain fundamentals: Models, prompts, chains, parsers
LCEL patterns: Parallel, branching, sequential
Text tasks: Generation, summarization, sentiment, classification, translation
Production patterns: Batch processing, cost tracking, error handling
Integration: FastAPI backend, structured output, testing

Self-Check Questions

Which main modules does a Text Processing Platform need to integrate to work end to end?
Why are error handling and retry logic important in production text processing?
What performance and cost-optimization requirements must a batch processing pipeline meet?
How do you design effective test cases for a complete text processing platform?
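As a starting point for the retry question above, here is a minimal sketch of retrying an async call with exponential backoff and jitter. The with_retry helper and its parameters are hypothetical and written against the standard library only; LangChain runnables may offer their own retry support, which is worth checking in the documentation before rolling your own.

```python
import asyncio
import random


async def with_retry(call, *, attempts=3, base_delay=0.5,
                     retry_on=(Exception,), sleep=asyncio.sleep):
    """Await call() up to `attempts` times, backing off exponentially between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            # Jitter spreads out retries so many clients do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay / 2))


async def demo():
    state = {"calls": 0}

    async def flaky():
        # Fails twice with a transient error, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("temporary failure")
        return "ok"

    result = await with_retry(flaky, base_delay=0.01, retry_on=(TimeoutError,))
    return result, state["calls"]


print(asyncio.run(demo()))
```

Restricting retry_on to transient errors (timeouts, rate limits) matters: retrying a request that fails validation only adds latency and cost.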

🎉 Congratulations! You have completed the entire Text Processing with AI course!

You now have a solid grasp of LangChain, LCEL patterns, structured output, sentiment analysis, translation, classification, and batch processing, and you have built a complete platform. Continue your GenAI journey with the next course!

Task 8

Next: GenAI Image Processing →
