Lý thuyết
35 phút
Bài 11/15

Error Handling

Xử lý lỗi, fallbacks và human handoff trong AI agents

🛡️ Error Handling

Xây dựng AI agents robust với error handling và fallbacks.

Why Error Handling Matters

Common Failure Points

Text
1AI failures:
2- Doesn't understand intent
3- Hallucinates information
4- Gives wrong answer
5- Response too long/short
6- Inappropriate content
7
8System failures:
9- API timeout
10- Service unavailable
11- Rate limits
12- Network issues
13
14User-related:
15- Unclear input
16- Off-topic requests
17- Abuse attempts
18- Language barriers

Conversation Error Handling

Intent Not Understood

Text
1Detection:
2- Low confidence score
3- No matching intent
4- Multiple possible intents
5
6Response:
7"I'm not sure I understood that correctly.
8Could you rephrase your question?
9
10Or choose from these options:
11• Track an order
12• Get support
13• Ask a question"

Clarification Flow

Text
1User: "I need help with the thing"
2Agent: "I'd love to help! Could you tell me more?
3 Are you looking for help with:
4 • An order
5 • A product
6 • Account settings
7 • Something else"
8User: "Order"
9Agent: [Now understands, proceeds to order flow]

Multiple Failures

Text
1Track consecutive failures:
2
3If failure_count = 1:
4 "Let me try to understand better..."
5
6If failure_count = 2:
7 "I'm having trouble understanding.
8 Here are things I can help with: [options]"
9
10If failure_count >= 3:
11 "I think you'd be better helped by a human.
12 Let me connect you to our team."
13 → Human handoff

AI Response Validation

Check Before Showing

Text
1Validate AI response:
21. Not empty
32. Reasonable length
43. No forbidden content
54. Relevant to question
65. Factually checkable (if possible)

Confidence Scoring

Text
1If AI confidence < 0.7:
2 Add disclaimer:
3 "Based on my understanding, [response].
4 If this doesn't answer your question,
5 please let me know."
6
7If AI confidence < 0.5:
8 Don't show response
9 Ask for clarification instead

Hallucination Prevention

Text
1Strategies:
21. Ground responses in knowledge base
32. Add "I don't know" training
43. Request citations
54. Verify against data
65. Human review for critical info

Fallback Strategies

Tiered Fallbacks

Text
1Level 1: Rephrase request
2"Could you say that differently?"
3
4Level 2: Offer alternatives
5"I can help with X, Y, or Z. Which interests you?"
6
7Level 3: Provide resources
8"I found these resources that might help: [links]"
9
10Level 4: Human handoff
11"Let me connect you with someone who can help."

Graceful Degradation

Text
1If knowledge base unavailable:
2 → Use general LLM knowledge
3 → Add disclaimer
4
5If AI service down:
6 → Provide FAQ links
7 → Offer callback
8
9If all else fails:
10 → Apologize sincerely
11 → Collect contact info
12 → Promise follow-up

Human Handoff

When to Handoff

Text
1Automatic triggers:
2- User requests human
3- Multiple failures
4- High frustration detected
5- Sensitive topics
6- Complex issues
7- Payment/billing issues
8- Complaints

Handoff Flow

Text
1Agent: "I think you'd be best helped by one of our team members.
2 Let me connect you now.
3
4 Before I do, can I get:
5 • Your name?
6 • Best contact method?
7 • Brief summary of your issue?"
8
9[Collect info]
10
11Agent: "Thanks! A team member will be with you shortly.
12 For reference, your case number is #12345.
13 Current wait time is approximately 5 minutes."
14
15[Transfer to human with full context]

Context Transfer

Text
1Pass to human agent:
2- Full conversation transcript
3- User info (if known)
4- Intent detected
5- Actions attempted
6- Suggested resolution
7- Sentiment analysis

Seamless Integration

Text
1Integrations:
2- Zendesk
3- Intercom
4- Freshdesk
5- LiveChat
6- Custom systems
7
8Workflow:
91. Create ticket/conversation
102. Attach transcript
113. Assign to agent/queue
124. Notify team
135. Track resolution

System Error Handling

API Failures

Text
1HTTP errors:
2- 400: Bad request → Check input
3- 401: Unauthorized → Refresh token
4- 403: Forbidden → Check permissions
5- 404: Not found → Handle gracefully
6- 429: Rate limit → Wait and retry
7- 500: Server error → Fallback/retry
8- 503: Unavailable → Try later

Retry Strategy

Text
1Exponential backoff:
2
3Attempt 1: Immediate
4Attempt 2: Wait 1 second
5Attempt 3: Wait 2 seconds
6Attempt 4: Wait 4 seconds
7Max attempts: 3-5
8
9Code pattern:
10for attempt in 1..max_attempts:
11 try:
12 result = make_api_call()
13 return result
14 except:
15 wait(2^attempt seconds)
16give_up()

Circuit Breaker

Text
1If service failing consistently:
21. Count failures
32. If failures > threshold in time window:
4 - "Open" circuit
5 - Stop calling failing service
6 - Use fallback
73. After cool-down:
8 - Try one request
9 - If success: close circuit
10 - If fail: stay open

Error Messages

User-Friendly Messages

Good Error Messages
Text
1❌ Bad:
2"Error 500: Internal server error"
3"API timeout exception"
4"NullPointerException in OrderService"
5
6✅ Good:
7"I'm having trouble accessing that information right now.
8Could you try again in a few minutes?"
9
10"Our order system is being updated.
11Your order is safe! Check back in an hour."
12
13"Something unexpected happened.
14Don't worry - I've noted this and someone will help soon."

Contextual Messages

Text
1Order lookup error:
2"I couldn't find that order number.
3Could you double-check it?
4Order numbers are usually 5 digits, like 12345."
5
6Calendar booking error:
7"That time slot just became unavailable.
8How about [alternative time]?"
9
10Payment error:
11"There was an issue processing your request.
12Please don't worry - no charges were made.
13Would you like to try again or speak with someone?"

Logging & Monitoring

What to Log

Text
1Log every interaction:
2- Timestamp
3- User ID
4- Input received
5- Intent detected
6- Confidence score
7- Actions taken
8- Response given
9- Errors (if any)
10- Duration

Error Tracking

Text
1Track:
2- Error type
3- Frequency
4- User impact
5- Resolution status
6
7Tools:
8- Internal dashboard
9- Sentry
10- LogRocket
11- Custom analytics

Alerting

Text
1Alert on:
2- Error rate spikes
3- Response time increases
4- API failures
5- Low confidence trends
6- High handoff rate

Testing Error Scenarios

Test Cases

Test These Scenarios
Text
1☐ Empty input
2☐ Very long input
3☐ Special characters
4☐ Unknown intent
5☐ Multiple intents
6☐ Wrong entity format
7☐ API timeout (mock)
8☐ API error (mock)
9☐ Rate limit hit
10☐ Offensive content
11☐ Off-topic request
12☐ Request for human
13☐ Multiple consecutive errors
14☐ Recovery from error

Chaos Testing

Text
1Intentionally break things:
21. Disable API connections
32. Inject delays
43. Return error responses
54. Remove knowledge base
65. See how agent handles

Recovery Patterns

State Recovery

Text
1If conversation state lost:
2- Apologize for confusion
3- Summarize what was understood
4- Ask user to confirm/correct
5- Continue from last known good state

Graceful Restart

Text
1After major error:
2"I apologize - I hit a snag!
3Let me start fresh.
4What can I help you with today?"
5
6[Reset state but keep user info]

Bài Tập

Practice

Implement Error Handling:

  1. Add fallback for unrecognized intents
  2. Implement retry logic for APIs
  3. Build human handoff flow
  4. Create user-friendly error messages
  5. Set up error logging
  6. Test all failure scenarios

Tiếp theo: Bài 12 - Testing & Optimization