How Do Large Language Models Work?

To use AI effectively, you need to understand how it works. This lesson gives a simple but thorough explanation of LLMs.

🎯 Objectives

Understand the Transformer architecture
Follow the pipeline from input → output
Know the limitations of LLMs
Apply this understanding to prompt engineering

1. What Is an LLM?

A Large Language Model (LLM) is an AI model trained on a large amount of text data to:

Understand natural language
Generate new, meaningful text
Perform many different tasks

Popular LLMs

Model	Company	Characteristics
GPT-4o	OpenAI	Versatile, strong at code
Claude 3.5	Anthropic	Deep reasoning, safety-focused
Gemini	Google	Multimodal, integrated with Google products
Llama 3	Meta	Openly available weights, customizable

2. Transformer Architecture

2.1. Overview

LLMs are built on the Transformer, an architecture introduced by Google researchers in 2017.

Text

Input Text → Tokenization → Embeddings → Transformer Layers → Output

2.2. Tokenization

An LLM does not read text character by character or word by word; it processes tokens:

Python

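# Illustrative tokenization (real tokenizers may split these strings differently)
examples = {
    "Xin chào Vietnam": ["Xin", " chào", " Vietnam"],
    "Hello world": ["Hello", " world"],
    # Subword tokenization: one word split into smaller known pieces
    "unhappiness": ["un", "happiness"],
}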
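To make the idea of subword splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `toy_vocab` and the function `greedy_tokenize` are invented for this illustration; production tokenizers (for example BPE) learn their vocabulary from data and split text by different rules.

```python
def greedy_tokenize(text, vocab):
    """Split text into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical toy vocabulary, invented for this illustration.
toy_vocab = {"un", "happiness", "happy", "Hello", " world"}

print(greedy_tokenize("unhappiness", toy_vocab))
print(greedy_tokenize("Hello world", toy_vocab))
```

Pieces that are not in the vocabulary fall back to single characters, which is why rare words and non-English text tend to consume more tokens.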

Important token limits (context window sizes):

GPT-4 Turbo / GPT-4o: ~128K tokens
Claude 3.5: ~200K tokens
Gemini 1.5: ~1M tokens

2.3. Embeddings

Each token is converted into a numeric vector (an embedding):

Text

"cat" → [0.2, -0.5, 0.8, ..., 0.1]  (e.g., 1536 dimensions in OpenAI's text-embedding-ada-002 model)
"dog" → [0.3, -0.4, 0.7, ..., 0.2]  (close to "cat" because both belong to the same category)

Key insight: words with similar meanings have vectors that are close together.
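"Close together" is usually measured with cosine similarity. The sketch below uses made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions), so the numbers only illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only; not taken from any real model.
cat = [0.2, -0.5, 0.8, 0.1]
dog = [0.3, -0.4, 0.7, 0.2]
car = [-0.6, 0.9, 0.1, -0.3]

print("cat vs dog:", cosine_similarity(cat, dog))
print("cat vs car:", cosine_similarity(cat, car))
```

With these toy values, "cat" and "dog" score close to 1, while "cat" and "car" score below 0.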

2.4. Attention Mechanism

"Attention Is All You Need" is the title of the original Transformer paper.

Text

Query: "The cat sat on the ___"

Attention scores:
- "cat" → high attention (subject)
- "sat" → medium attention (verb)
- "The" → low attention (article)

The LLM "attends" to the most relevant tokens in order to predict the next one.
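The attention scores in the example can be sketched as scaled dot-product attention for a single query. The 2-dimensional key, value, and query vectors below are invented so that "cat" scores highest; a real model learns these vectors and runs many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Hypothetical vectors chosen for illustration.
tokens = ["The", "cat", "sat"]
keys = [[0.1, 0.0], [1.0, 0.8], [0.5, 0.3]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 1.0]

weights, output = attend(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

With these toy vectors the weights rank "cat" above "sat" above "The", matching the high/medium/low pattern in the example.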

3. The Text Generation Process

Step-by-step

Text

1. User: "Viết email xin nghỉ phép" ("Write an email requesting leave")
   ↓
2. Tokenization: ["Viết", " email", " xin", " nghỉ", " phép"]
   ↓
3. Embeddings: [vector1, vector2, ...]
   ↓
4. Transformer processing (e.g., 96 layers in GPT-3; GPT-4's layer count is not public)
   ↓
5. Probability distribution over the next token:
   - "Kính" (35%)
   - "Xin" (28%)
   - "Chào" (15%)
   ...
   ↓
6. Sample/select: "Kính"
   ↓
7. Repeat from step 3 with the updated context
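The loop in steps 5 to 7 can be sketched with a toy "model". The probability table `toy_model` is invented for this illustration and looks only at the previous token, whereas a real LLM conditions on the entire context; the sketch also always picks the most likely token (greedy decoding) instead of sampling.

```python
# Hypothetical next-token probabilities, invented for this illustration.
# The "<start>" row omits the remaining probability mass (the "..." above).
toy_model = {
    "<start>": {"Kính": 0.35, "Xin": 0.28, "Chào": 0.15},
    "Kính": {" gửi": 0.9, " chào": 0.1},
    " gửi": {" anh": 0.6, " chị": 0.4},
    " anh": {"<end>": 1.0},
    " chị": {"<end>": 1.0},
}


def generate(model, max_tokens=10):
    """Greedy autoregressive generation: pick a token, append it, repeat."""
    context = ["<start>"]
    while len(context) - 1 < max_tokens:
        distribution = model[context[-1]]
        next_token = max(distribution, key=distribution.get)
        if next_token == "<end>":
            break
        context.append(next_token)
    return "".join(context[1:])


print(generate(toy_model))
```

Each pass through the loop corresponds to one generated token, which is why long outputs take longer and cost more than short ones.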

Temperature Parameter

Temperature controls how random, or "creative", the output is:

Temperature	Behavior	Use case
0.0	Near-deterministic, always picks the highest-probability token	Code, math
0.7	Balanced, with some variation	General chat
1.0+	Creative, more diverse	Brainstorming, creative writing

Python

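# OpenAI API example (requires the `openai` package and an OPENAI_API_KEY)
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=0.9,  # Creative mode
)
print(response.choices[0].message.content)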
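Under the hood, temperature typically divides the model's raw scores (logits) before they are converted to probabilities. The sketch below shows that effect on three hypothetical logits; the function name `apply_temperature` and the values are assumptions made for illustration.

```python
import math


def apply_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.5, 1.0]

for t in (0.2, 0.7, 1.5):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, while higher temperatures flatten the distribution so that less likely tokens are sampled more often.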

4. What LLMs Can and Cannot Do

✅ What LLMs do well

Synthesizing information - Summarize documents
Writing and editing text - Email, reports, code
Translation - Multi-language translation
Answering questions - Q&A with context
Brainstorming - Idea generation
Code generation - Write and debug code

❌ Where LLMs are limited

No real-time knowledge - Training data has a cutoff date
Can "hallucinate" - May generate false information
No true "understanding" - Relies on pattern matching
Complex math - Can get arithmetic wrong
No memory between conversations - Stateless by design
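The last point has a practical consequence: a chat application has to resend the earlier turns with every request. The sketch below uses `fake_llm`, a hypothetical stand-in written for this illustration (not a real API), to show that the "model" can only use what is in the messages it receives.

```python
def fake_llm(messages):
    """Hypothetical stand-in for a chat model: it sees only `messages`."""
    prefix = "My name is "
    for message in messages:
        content = message["content"]
        if message["role"] == "user" and content.startswith(prefix):
            name = content[len(prefix):].rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."


history = [{"role": "user", "content": "My name is Lan."}]
question = {"role": "user", "content": "What is my name?"}

# Without the earlier turn, the model has nothing to go on.
print(fake_llm([question]))
# Resending the history with the new question restores the context.
print(fake_llm(history + [question]))
```

Because the full history is resent each time, long conversations also consume more of the token limit.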

⚠️ Hallucination

Hallucination = the LLM confidently states something that is false.

Text

User: "Who wrote 'The Shadow of the Wind'?"
Bad LLM: "Gabriel García Márquez"  ❌ (Hallucination)
Correct: "Carlos Ruiz Zafón"  ✅

Ways to reduce hallucination:

Ask for citations
Provide context/documents (RAG)
Ask the model to say "I don't know" when it is unsure
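These three techniques can be combined in a single prompt template. The sketch below is one possible wording, not a standard; the function name `build_grounded_prompt` and the sample document are assumptions made for illustration.

```python
def build_grounded_prompt(question, documents):
    """Build a prompt that supplies sources, asks for citations, and allows "I don't know"."""
    numbered = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite sources by number, for example [1]. "
        'If the sources do not contain the answer, reply "I don\'t know".\n\n'
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Sample source text written for this example.
docs = ["'The Shadow of the Wind' is a novel by Carlos Ruiz Zafón."]
prompt = build_grounded_prompt("Who wrote 'The Shadow of the Wind'?", docs)
print(prompt)
```

In a RAG setup, `documents` would come from a retrieval step rather than being hard-coded.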

5. Implications for Prompt Engineering

Principles that follow from how LLMs work:

1. Token limit awareness

Text

❌ Paste an entire 100-page document
✅ Extract relevant sections, summarize first
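One simple way to extract relevant sections is to rank them by word overlap with the question and keep as many as fit in a token budget. This is a rough sketch: `rough_token_count` uses the rule-of-thumb assumption of about 4 characters per token for English text, and real systems usually rank with embeddings rather than word overlap.

```python
def rough_token_count(text):
    # Rule-of-thumb assumption: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def select_sections(sections, query, budget):
    """Keep the sections that best match the query without exceeding the budget."""
    query_words = set(query.lower().split())
    ranked = sorted(
        sections,
        key=lambda s: len(query_words & set(s.lower().split())),
        reverse=True,
    )
    chosen, used = [], 0
    for section in ranked:
        cost = rough_token_count(section)
        if used + cost <= budget:
            chosen.append(section)
            used += cost
    return chosen


# Sample sections written for this example.
sections = [
    "refund policy lasts 30 days",
    "our office dog is named Max",
    "shipping takes five days",
]
print(select_sections(sections, "refund policy", budget=8))
```

For an exact count, replace `rough_token_count` with a real tokenizer such as tiktoken.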

2. Clear context = Better attention

Text

❌ "Fix this code"
✅ "Fix the null pointer exception in line 15 of this Python function"

3. Leverage pattern recognition

Text

✅ Give examples (few-shot learning)
✅ Use consistent formatting
✅ Structure with headers and lists
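A few-shot prompt applies the first two points at once: it shows the model a handful of examples in one consistent format and leaves the last slot open. The "Review"/"Sentiment" labels and the sample reviews below are assumptions chosen for illustration.

```python
def build_few_shot_prompt(examples, new_input):
    """Format labeled examples consistently, then leave the last label blank."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Sample data written for this example.
examples = [
    ("The battery lasts all day.", "positive"),
    ("It broke after one week.", "negative"),
]
few_shot_prompt = build_few_shot_prompt(examples, "Fast delivery and great quality.")
print(few_shot_prompt)
```

Because the model continues patterns, ending the prompt with an open "Sentiment:" line nudges it to answer with just a label.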

4. Temperature for task type

Text

Code review → temperature=0
Creative writing → temperature=0.8

6. Hands-on: Token Counting

Hands-on with tiktoken

Python

import tiktoken

# Get encoder for GPT-4
enc = tiktoken.encoding_for_model("gpt-4")

# Count tokens
text = "Xin chào, tôi là AI assistant"
tokens = enc.encode(text)

print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")

# Decode back
decoded = enc.decode(tokens)
print(f"Decoded: {decoded}")

Output (the token IDs and count shown are illustrative; actual values depend on the encoding):

Text

Text: Xin chào, tôi là AI assistant
Tokens: [55, 1980, 11, 87234, ...]
Token count: 12
Decoded: Xin chào, tôi là AI assistant

Cost estimation

Python

import tiktoken

def estimate_cost(input_text, output_tokens=500, model="gpt-4"):
    """Estimate the request cost in USD from input and expected output tokens."""
    enc = tiktoken.encoding_for_model(model)
    input_tokens = len(enc.encode(input_text))

    # GPT-4 pricing in USD per 1K tokens (example values; check current pricing)
    input_cost = input_tokens * 0.03 / 1000
    output_cost = output_tokens * 0.06 / 1000

    return {
        "input_tokens": input_tokens,
        "estimated_output": output_tokens,
        "total_cost": f"${input_cost + output_cost:.4f}",
    }

# Example
result = estimate_cost("Write a detailed product description...")
print(result)

🎯 Key Takeaways

LLMs are next-token predictors - They predict the next token based on context
Attention helps the model focus on the most relevant information
Temperature trades creativity against consistency
Token limits matter for long documents
Hallucination is a real risk - always verify critical information

🚀 Next lesson

Prompt Engineering Fundamentals - Learn to write effective prompts based on your understanding of LLMs!