Theory
40 minutes
Lesson 2/3

FastAPI for AI Services

Building AI APIs with FastAPI - from the basics to production

FastAPI is one of the best Python frameworks for building AI APIs, thanks to its async support, automatic documentation, and built-in request validation.

Why FastAPI?

FastAPI Advantages (the sketch below shows the first three in action)
  • Async/await - handle many concurrent requests
  • Type hints - automatic request validation
  • Auto docs - Swagger UI built in
  • Fast - performance on par with Node.js
  • Modern - built on standard Python type hints (Python 3.8+)
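
A minimal sketch demonstrating these features together (file and route names are illustrative):

Python
# hello.py - a minimal sketch; run with: uvicorn hello:app --reload
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EchoRequest(BaseModel):
    text: str  # type hints drive automatic validation (bad input -> 422)

@app.post("/echo")
async def echo(request: EchoRequest):  # async handler serves requests concurrently
    return {"echo": request.text}

# Swagger UI is generated automatically at /docs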

Project Setup

Bash
# Create project
mkdir ai-api && cd ai-api

# Virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
.\venv\Scripts\activate   # Windows

# Install dependencies
pip install fastapi uvicorn openai pydantic python-dotenv

Project Structure

Text
ai-api/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── chat.py
│   │   └── images.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── llm.py
│   │   └── cache.py
│   └── models/
│       ├── __init__.py
│       └── schemas.py
├── requirements.txt
├── .env
└── Dockerfile
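
The structure includes config.py and a .env file that the rest of the lesson only references. A minimal sketch of the config module, assuming python-dotenv (variable names are illustrative):

Python
# app/config.py - a minimal sketch; variable names are illustrative
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env into the process environment

# .env might contain, for example:
#   OPENAI_API_KEY=sk-...
#   API_KEYS=key1,key2
#   DEBUG=false
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
API_KEYS = os.getenv("API_KEYS", "").split(",")
DEBUG = os.getenv("DEBUG", "false").lower() == "true"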

Basic API

main.py

Python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from app.routers import chat, images

app = FastAPI(
    title="AI API",
    description="Production-ready AI services",
    version="1.0.0"
)

# CORS - wildcard origins are fine for development; restrict in production
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Routers
app.include_router(chat.router, prefix="/api/chat", tags=["Chat"])
app.include_router(images.router, prefix="/api/images", tags=["Images"])

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

schemas.py (Pydantic Models)

Python
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

class MessageRole(str, Enum):
    system = "system"
    user = "user"
    assistant = "assistant"

class Message(BaseModel):
    role: MessageRole
    content: str

class ChatRequest(BaseModel):
    messages: List[Message]
    model: str = "gpt-4o-mini"
    temperature: float = Field(default=0.7, ge=0, le=2)
    max_tokens: Optional[int] = Field(default=None, ge=1, le=4096)
    stream: bool = False

class ChatResponse(BaseModel):
    message: str
    model: str
    usage: dict

class ImageRequest(BaseModel):
    prompt: str = Field(..., min_length=1, max_length=4000)
    size: str = "1024x1024"
    quality: str = "standard"
    n: int = Field(default=1, ge=1, le=4)

class ImageResponse(BaseModel):
    images: List[str]
    revised_prompt: Optional[str] = None  # default needed so the field is truly optional in Pydantic v2
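
Pydantic enforces these constraints before a handler ever runs; out-of-range values produce a 422 response. A quick sketch of the same validation outside FastAPI:

Python
from pydantic import ValidationError
from app.models.schemas import ChatRequest

try:
    # temperature=5.0 violates the le=2 constraint on ChatRequest
    ChatRequest(messages=[{"role": "user", "content": "Hi"}], temperature=5.0)
except ValidationError as e:
    print(e)  # lists each failing field and the violated constraint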

Chat Endpoint

routers/chat.py

Python
from fastapi import APIRouter, HTTPException
from fastapi.responses import StreamingResponse
from app.models.schemas import ChatRequest, ChatResponse
from app.services.llm import LLMService

router = APIRouter()
llm_service = LLMService()

@router.post("/completions", response_model=ChatResponse)
async def create_chat_completion(request: ChatRequest):
    """Create chat completion"""
    try:
        if request.stream:
            # Returning a Response directly bypasses response_model validation
            return StreamingResponse(
                llm_service.stream_chat(request),
                media_type="text/event-stream"
            )

        response = await llm_service.chat(request)
        return response

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@router.post("/stream")
async def stream_chat(request: ChatRequest):
    """Stream chat completion"""
    return StreamingResponse(
        llm_service.stream_chat(request),
        media_type="text/event-stream"
    )

services/llm.py

Python
from openai import AsyncOpenAI
from app.models.schemas import ChatRequest, ChatResponse
import json

class LLMService:
    def __init__(self):
        # Reads OPENAI_API_KEY from the environment
        self.client = AsyncOpenAI()

    async def chat(self, request: ChatRequest) -> ChatResponse:
        """Non-streaming chat"""
        messages = [{"role": m.role, "content": m.content} for m in request.messages]

        response = await self.client.chat.completions.create(
            model=request.model,
            messages=messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens
        )

        return ChatResponse(
            message=response.choices[0].message.content,
            model=response.model,
            usage={
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        )

    async def stream_chat(self, request: ChatRequest):
        """Streaming chat - yields Server-Sent Events"""
        messages = [{"role": m.role, "content": m.content} for m in request.messages]

        stream = await self.client.chat.completions.create(
            model=request.model,
            messages=messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
            stream=True
        )

        async for chunk in stream:
            # Guard against chunks with no choices (e.g., usage-only chunks)
            if chunk.choices and chunk.choices[0].delta.content:
                data = {"content": chunk.choices[0].delta.content}
                yield f"data: {json.dumps(data)}\n\n"

        yield "data: [DONE]\n\n"
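
A sketch of a client consuming the SSE stream, assuming httpx is installed (URL and port match the dev server shown later):

Python
# stream_client.py - a sketch; assumes: pip install httpx
import asyncio
import httpx

async def main():
    payload = {"messages": [{"role": "user", "content": "Hello"}]}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8000/api/chat/stream", json=payload
        ) as r:
            async for line in r.aiter_lines():
                # Each SSE event arrives as a "data: {...}" line
                if line.startswith("data: ") and line != "data: [DONE]":
                    print(line[len("data: "):])

asyncio.run(main())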

Image Endpoint

routers/images.py

Python
from fastapi import APIRouter, HTTPException
from app.models.schemas import ImageRequest, ImageResponse
from openai import AsyncOpenAI

router = APIRouter()
client = AsyncOpenAI()

@router.post("/generate", response_model=ImageResponse)
async def generate_image(request: ImageRequest):
    """Generate image with DALL-E"""
    try:
        response = await client.images.generate(
            model="dall-e-3",
            prompt=request.prompt,
            size=request.size,
            quality=request.quality,
            n=1  # DALL-E 3 only supports n=1
        )

        return ImageResponse(
            images=[response.data[0].url],
            revised_prompt=response.data[0].revised_prompt
        )

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Authentication

API Key Auth

Python
from fastapi import Security, HTTPException
from fastapi.security import APIKeyHeader
import os

api_key_header = APIKeyHeader(name="X-API-Key")

async def verify_api_key(api_key: str = Security(api_key_header)):
    valid_keys = os.getenv("API_KEYS", "").split(",")
    if api_key not in valid_keys:
        raise HTTPException(status_code=403, detail="Invalid API key")
    return api_key

# Usage in a router
@router.post("/chat")
async def chat(
    request: ChatRequest,
    api_key: str = Security(verify_api_key)
):
    # Authenticated endpoint
    pass
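
To protect every route in a router at once, the same dependency can be attached at the router level (a sketch):

Python
from fastapi import APIRouter, Security

# Every endpoint registered on this router now requires a valid X-API-Key header
router = APIRouter(dependencies=[Security(verify_api_key)])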

JWT Auth

Python
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
from jose import JWTError, jwt  # requires: pip install python-jose
import os

# In production, load these from configuration
SECRET_KEY = os.getenv("JWT_SECRET_KEY", "change-me")
ALGORITHM = "HS256"

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

async def get_current_user(token: str = Depends(oauth2_scheme)):
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        user_id = payload.get("sub")
        if user_id is None:
            raise HTTPException(status_code=401, detail="Invalid token")
        return user_id
    except JWTError:
        raise HTTPException(status_code=401, detail="Invalid token")
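
The matching token issuer, a minimal sketch using the same python-jose library:

Python
from datetime import datetime, timedelta, timezone
from jose import jwt

def create_access_token(user_id: str, expires_minutes: int = 30) -> str:
    # "sub" is what get_current_user reads back; "exp" is checked by jwt.decode
    payload = {
        "sub": user_id,
        "exp": datetime.now(timezone.utc) + timedelta(minutes=expires_minutes),
    }
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)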

Error Handling

Python
from fastapi import Request
from fastapi.responses import JSONResponse

# DEBUG should come from configuration (e.g., app/config.py)
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={
            "error": "Internal server error",
            "detail": str(exc) if DEBUG else "Something went wrong"
        }
    )

# Custom exceptions
class RateLimitExceeded(Exception):
    pass

@app.exception_handler(RateLimitExceeded)
async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={"error": "Rate limit exceeded", "retry_after": 60}
    )
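
A sketch of a dependency that raises RateLimitExceeded, using a naive in-memory counter (per-process only; production setups usually use Redis):

Python
import time
from collections import defaultdict
from fastapi import Request

# Request timestamps per client IP - in-memory, resets on restart
_requests = defaultdict(list)
LIMIT = 10     # max requests...
WINDOW = 60.0  # ...per 60-second window

async def rate_limit(request: Request):
    ip = request.client.host
    now = time.time()
    # Drop timestamps outside the window, then check the remaining count
    _requests[ip] = [t for t in _requests[ip] if now - t < WINDOW]
    if len(_requests[ip]) >= LIMIT:
        raise RateLimitExceeded()  # handled by rate_limit_handler above
    _requests[ip].append(now)

# Usage: @router.post("/completions", dependencies=[Depends(rate_limit)])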

Running the Server

Bash
# Development
uvicorn app.main:app --reload --port 8000

# Production
uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

# With Gunicorn
gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

Testing

Python
# test_api.py
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_health():
    response = client.get("/health")
    assert response.status_code == 200

def test_chat():
    # Note: this calls the real OpenAI API, so it needs a valid OPENAI_API_KEY
    response = client.post("/api/chat/completions", json={
        "messages": [{"role": "user", "content": "Hello"}],
        "model": "gpt-4o-mini"
    })
    assert response.status_code == 200
    assert "message" in response.json()
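
To avoid real API calls in CI, the service can be patched; a sketch using pytest's monkeypatch fixture:

Python
from fastapi.testclient import TestClient
from app.main import app
from app.models.schemas import ChatRequest, ChatResponse
from app.routers import chat

def test_chat_mocked(monkeypatch):
    async def fake_chat(request: ChatRequest) -> ChatResponse:
        return ChatResponse(
            message="Hi!",
            model="gpt-4o-mini",
            usage={"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
        )

    # Replace the module-level service instance's method with the fake
    monkeypatch.setattr(chat.llm_service, "chat", fake_chat)

    client = TestClient(app)
    response = client.post("/api/chat/completions", json={
        "messages": [{"role": "user", "content": "Hello"}]
    })
    assert response.status_code == 200
    assert response.json()["message"] == "Hi!"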

Best Practices

FastAPI Tips
  1. Use Pydantic for validation
  2. Go async everywhere for performance
  3. Use dependency injection for reusability
  4. Handle errors properly with custom exceptions
  5. Version your API (/v1/, /v2/) - see the sketch below
  6. Rate-limit to protect expensive endpoints
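
Versioning is just a routing prefix; a sketch of how the routers above could be mounted under a version (in main.py):

Python
# Mount routers under a versioned prefix; /api/v2/... can evolve independently
app.include_router(chat.router, prefix="/api/v1/chat", tags=["Chat v1"])
app.include_router(images.router, prefix="/api/v1/images", tags=["Images v1"])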

Practice Exercise

Hands-on Exercise

Build a Complete AI API:

  1. Chat endpoint with streaming
  2. Image generation endpoint
  3. Authentication (API key)
  4. Error handling
  5. Basic tests

Target: A production-ready API with interactive docs at /docs


Up Next

Next lesson: Docker for AI - containerizing AI applications.