🎨 Image Generation Fundamentals

In this lesson, we look at how AI models generate images, from theory to practice.

What Are Diffusion Models?

Diffusion Process

Diffusion models learn to generate images in two stages:

Forward process: Noise is added to an image step by step until only random noise remains
Reverse process: The model learns to remove that noise step by step to produce an image

Diagram

graph LR
    I[Image] --> |Add Noise| N1[Noisy]
    N1 --> |More Noise| N2[Noisier]
    N2 --> |...| N3[Pure Noise]
    
    N3 --> |Denoise| R1[Less Noisy]
    R1 --> |Denoise| R2[Cleaner]
    R2 --> |...| G[Generated Image]
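
The forward half of the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a DDPM-style linear noise schedule. The names `make_alpha_bars` and `add_noise`, the schedule endpoints, and the four-value toy "image" are illustrative choices, not something a specific model prescribes. The reverse process is not shown because it needs a trained neural network.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Forward process in closed form: x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * p + sigma * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
image = [0.8, -0.3, 0.5, 0.1]  # toy "image": four pixel values in [-1, 1]
alpha_bars = make_alpha_bars(1000)

# Early steps keep most of the image; late steps are almost pure noise.
for t in (0, 250, 999):
    noisy = add_noise(image, alpha_bars[t], rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

As t grows, the signal weight shrinks toward 0 and the result is dominated by Gaussian noise, which corresponds to the "Pure Noise" end of the diagram.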

Types of Image Generation Models

1. DALL-E (OpenAI)

DALL-E 2: 1024x1024, inpainting, variations
DALL-E 3: Better text rendering, follows prompts more closely

Python

from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A serene Vietnamese countryside with rice paddies at sunset, watercolor style",
    size="1024x1024",
    quality="hd",
    n=1,  # DALL-E 3 generates one image per request
)

image_url = response.data[0].url
print(image_url)

2. Stable Diffusion

Open-source and can run locally:

SD 1.5: Classic model, nhiều fine-tunes
SDXL: Higher quality, 1024x1024
SD 3: Latest, improved text rendering

3. Midjourney

Best for artistic styles
Runs through Discord
No public API

Prompt Engineering for Images

Structure of a Good Prompt

Text

[Subject] + [Style] + [Details] + [Lighting] + [Quality Tags]

Example:

Text

A Vietnamese woman in traditional áo dài,
watercolor painting style,
standing in a garden of lotus flowers,
soft morning light,
detailed, high quality, 4k
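
The template above maps naturally onto a small helper that joins the components in a fixed order. The function name `build_prompt` and its parameters are hypothetical, introduced only for illustration; it is a sketch of one way to keep prompts consistent, not a required pattern.

```python
def build_prompt(subject, style, details=None, lighting=None, quality_tags=()):
    """Join prompt components in the order subject, style, details, lighting, quality tags."""
    parts = [subject, style, details, lighting, *quality_tags]
    # Skip empty components so optional fields do not leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())

prompt = build_prompt(
    subject="A Vietnamese woman in traditional áo dài",
    style="watercolor painting style",
    details="standing in a garden of lotus flowers",
    lighting="soft morning light",
    quality_tags=("detailed", "high quality", "4k"),
)
print(prompt)
```

Keeping the components separate makes it easy to change one element, such as the lighting, while holding the rest of the prompt fixed.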

Positive vs Negative Prompts

Positive (what you want):

"detailed", "high quality", "sharp focus"
"beautiful lighting", "professional photo"

Negative (what you don't want):

"blurry", "low quality", "distorted"
"bad anatomy", "extra fingers"
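
A rough sketch of how the two kinds of tags might be assembled into a request input. The helper `build_input` is hypothetical; the `prompt` and `negative_prompt` keys follow the Replicate SDXL example in this lesson. The OpenAI image generation call used in this lesson has no negative-prompt parameter, so there the exclusions would have to be phrased in the prompt text itself.

```python
def build_input(prompt, positive_tags=(), negative_tags=()):
    """Return an input dict with positive tags appended and negative tags kept separate."""
    def unique(tags):
        # Preserve order while dropping duplicates and empty strings.
        seen = []
        for tag in tags:
            tag = tag.strip()
            if tag and tag not in seen:
                seen.append(tag)
        return seen

    return {
        "prompt": ", ".join([prompt, *unique(positive_tags)]),
        "negative_prompt": ", ".join(unique(negative_tags)),
    }

payload = build_input(
    "portrait of an elderly fisherman",
    positive_tags=["detailed", "sharp focus", "detailed"],
    negative_tags=["blurry", "bad anatomy", "extra fingers"],
)
print(payload)
```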

Style Keywords

Style	Keywords
Photo	photorealistic, photography, DSLR, 50mm
Art	oil painting, watercolor, digital art
3D	3D render, Blender, Unreal Engine
Anime	anime style, manga, Studio Ghibli
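
The table can be kept as a lookup so the same subject is easy to render in several styles. This is a small sketch: `STYLE_KEYWORDS` copies the rows above, and the `stylize` helper with its `max_keywords` parameter is a hypothetical name chosen for this example.

```python
STYLE_KEYWORDS = {
    "photo": ["photorealistic", "photography", "DSLR", "50mm"],
    "art": ["oil painting", "watercolor", "digital art"],
    "3d": ["3D render", "Blender", "Unreal Engine"],
    "anime": ["anime style", "manga", "Studio Ghibli"],
}

def stylize(subject, style, max_keywords=1):
    """Append up to max_keywords keywords for the given style to the subject."""
    keywords = STYLE_KEYWORDS[style.lower()][:max_keywords]
    return ", ".join([subject, *keywords])

for style in STYLE_KEYWORDS:
    print(f"{style:>5}: {stylize('a street vendor selling bánh mì', style)}")
```

The default of one keyword is deliberate: keywords within a row can conflict (for example "oil painting" and "watercolor"), so combining several is a choice to make per style.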

Hands-on with Python

DALL-E 3 API

Python

from io import BytesIO

import requests
from openai import OpenAI
from PIL import Image

client = OpenAI()

def generate_image(prompt, size="1024x1024", quality="standard"):
    """Generate one image with DALL-E 3 and return its URL."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size,
        quality=quality,
        n=1,
    )

    image_url = response.data[0].url
    # DALL-E 3 may rewrite the prompt before generating; print what it actually used.
    revised_prompt = response.data[0].revised_prompt

    print(f"Revised prompt: {revised_prompt}")
    return image_url

# Generate
prompt = "A cozy Vietnamese coffee shop (quán cà phê) with traditional decor, warm lighting, people enjoying cà phê sữa đá"
url = generate_image(prompt, quality="hd")

# Download the image (the returned URL is temporary) and fail loudly on HTTP errors
download = requests.get(url, timeout=60)
download.raise_for_status()
img = Image.open(BytesIO(download.content))
img.save("coffee_shop.png")

Replicate API (Stable Diffusion)

Python

import replicate

# Requires the REPLICATE_API_TOKEN environment variable.
# SDXL, pinned to a specific model version
output = replicate.run(
    "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "prompt": "Vietnamese street food scene, pho restaurant, steam rising, warm evening light",
        "negative_prompt": "blurry, low quality",
        "width": 1024,
        "height": 1024,
        "num_inference_steps": 30,
        "guidance_scale": 7.5,
    },
)

# Typically a list with one entry per generated image
print(output)

Key Parameters

Guidance Scale (CFG)

Low (1-5): Creative, less adherent to prompt
Medium (7-10): Balanced
High (15+): Strict adherence, can look over-saturated
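
To see why a higher scale means stricter adherence, it helps to look at the classifier-free guidance formula that Stable Diffusion style pipelines commonly use: the model predicts noise once without the prompt and once with it, and the scale extrapolates along the difference. The sketch below uses three made-up numbers in place of a real latent tensor, and `apply_guidance` is a hypothetical helper name.

```python
def apply_guidance(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond_pred, cond_pred)]

# Toy noise predictions (a real model predicts a full latent tensor).
uncond = [0.10, -0.20, 0.05]  # prediction with an empty prompt
cond = [0.30, -0.10, 0.00]    # prediction with the user's prompt

for scale in (1.0, 7.5, 15.0):
    guided = apply_guidance(uncond, cond, scale)
    print(scale, [round(v, 3) for v in guided])
```

At a scale of 1 the result is just the prompt-conditioned prediction. Larger scales push further in the prompt's direction, which is also why very high values can overshoot into over-saturated output.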

Steps

Low (20-30): Fast, less detailed
Medium (40-50): Good balance
High (100+): Slow, marginal improvement
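
The step count controls how many denoising iterations the sampler runs. A simplified way to picture it, assuming a model trained with 1000 noise levels and a sampler that visits an evenly spaced subset of them (real schedulers use several different spacing rules), is sketched below. `select_timesteps` is a hypothetical helper, not a library function.

```python
def select_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Pick evenly spaced timesteps, ordered from most noisy to least noisy."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    stride = num_train_timesteps / num_inference_steps
    # Walk from high noise (large t) down to low noise (t near 0).
    return [int(i * stride) for i in reversed(range(num_inference_steps))]

for steps in (20, 50):
    ts = select_timesteps(steps)
    print(steps, "steps ->", ts[:3], "...", ts[-3:])
```

More steps mean smaller jumps between noise levels and one model call per step, which is where the extra time goes.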

Seed

Same seed + same prompt = same output (as long as the model, settings, and sampler are also unchanged):

Python

# Reproducible generation
# `generate` is a placeholder for whichever generation function or API call you use.
output = generate(
    prompt="...",
    seed=12345,  # Fixed seed
)
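
The reason a seed makes generation repeatable is that it fixes the random noise the reverse process starts from. The sketch below shows only that part, using Python's standard `random` module in place of a real pipeline's generator; `initial_noise` is a hypothetical helper name.

```python
import random

def initial_noise(seed, size=4):
    """Draw the starting Gaussian noise from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]

a = initial_noise(12345)
b = initial_noise(12345)
c = initial_noise(54321)

print(a == b)  # same seed: identical starting noise
print(a == c)  # different seed: different starting noise
```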

Image Sizes & Aspect Ratios

Use Case	Aspect Ratio	Size
Square (Instagram)	1:1	1024x1024
Portrait	2:3	832x1216
Landscape	3:2	1216x832
Wide	16:9	1344x768
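
When a target format does not match one of these rows exactly, one option is to pick the listed size with the nearest aspect ratio and crop or resize afterwards. The sketch below does that lookup; `SIZES` copies the table above and `closest_size` is a hypothetical helper.

```python
import math

# (label, width, height) rows copied from the table above.
SIZES = [
    ("Square", 1024, 1024),
    ("Portrait", 832, 1216),
    ("Landscape", 1216, 832),
    ("Wide", 1344, 768),
]

def closest_size(target_width, target_height):
    """Return the table entry whose aspect ratio is nearest to the target's."""
    target = math.log(target_width / target_height)
    # Compare in log space so 2:1 and 1:2 are equally far from 1:1.
    return min(SIZES, key=lambda s: abs(math.log(s[1] / s[2]) - target))

print(closest_size(1920, 1080))  # a 16:9 video frame
print(closest_size(1080, 1350))  # a 4:5 portrait post
```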

Best Practices

Image Generation Tips

Be specific in your prompts
Start simple, add details gradually
Use negative prompts to avoid artifacts
Experiment with styles and artists
Save seeds for reproducibility
Iterate: generate many images and pick the best

Practice Exercise

Hands-on Exercise

Generate images with DALL-E:

Create an OpenAI account and get an API key
Generate 5 images in different styles:
- Photorealistic
- Watercolor
- Digital art
- Anime
- 3D render
Experiment with prompt variations
Compare quality vs cost

Goal: Understand the relationship between prompt and output

Next lesson: DALL-E 3 Deep Dive, covering detailed API usage and advanced prompt engineering.
