GenAI Image & Multimodal AI
Generate and analyze images with DALL-E, Stable Diffusion, and GPT-4V. Build multimodal AI apps.
📋 Course Description
AI is more than text. This course teaches you to generate images, analyze visuals, and build multimodal applications that combine text, images, and audio, working with tools from DALL-E to GPT-4V and Stable Diffusion.
🎯 What You Will Learn
- ✅ Image generation (DALL-E 3, Midjourney, Stable Diffusion)
- ✅ Image editing & inpainting
- ✅ Vision models (GPT-4V, Claude 3 Vision)
- ✅ Multimodal application architecture
- ✅ ComfyUI & ControlNet
- ✅ Image-to-text & text-to-image pipelines
👥 Who Is This Course For?
- Developers expanding into visual AI
- Product teams building image features
- Creative technologists
- Anyone building multimodal applications
📚 Curriculum (14 Lessons)
Module 1: Image Generation
- Image Gen Fundamentals - How diffusion models work
- DALL-E 3 - API usage, prompt engineering for images
- Stable Diffusion - Local setup, models, samplers
- Advanced Prompting - Style, composition, consistency
Module 2: Image Editing & Control
- Image Editing - Inpainting, outpainting, variations
- ComfyUI - Node-based workflows
- ControlNet - Pose, depth, edge control
- Style Transfer - Consistent characters, brand styles
Module 3: Vision & Understanding
- Vision Models - GPT-4V, Claude 3 Vision
- Image Analysis - OCR, object detection, classification
- Visual QA - Answer questions about images
- Document Vision - Extract data from documents and receipts
Module 4: Multimodal Applications
- Multimodal Pipelines - Combine text, images, and audio
- Capstone Project - Complete multimodal app
🛠️ Tech Stack
- Python 3.10+
- OpenAI API (DALL-E, GPT-4V)
- Anthropic Claude 3 Vision
- Stable Diffusion, ComfyUI
- Replicate API
🚀 Main Projects
- AI Art Generator - Custom style image creation
- Visual QA System - Upload image, ask questions
- Product Image Analyzer - E-commerce image processing
- Video Thumbnail Generator - Auto-generate thumbnails
⚙️ Prerequisites
- ✅ RAG & Vector Databases course
- ✅ Intermediate-level Python
- ✅ GPU recommended (for running Stable Diffusion locally)
Duration: 6-8 weeks (5-7 hours/week)
Level: Intermediate
Pathway: GenAI Coding
