🛡️ Error Handling
Build robust AI agents with error handling and fallbacks.
Why Error Handling Matters
Common Failure Points
AI failures:
- Doesn't understand intent
- Hallucinates information
- Gives wrong answer
- Response too long/short
- Inappropriate content

System failures:
- API timeout
- Service unavailable
- Rate limits
- Network issues

User-related:
- Unclear input
- Off-topic requests
- Abuse attempts
- Language barriers

Conversation Error Handling
Intent Not Understood
Detection:
- Low confidence score
- No matching intent
- Multiple possible intents

Response:
"I'm not sure I understood that correctly.
Could you rephrase your question?

Or choose from these options:
• Track an order
• Get support
• Ask a question"

Clarification Flow
User: "I need help with the thing"
Agent: "I'd love to help! Could you tell me more?
  Are you looking for help with:
  • An order
  • A product
  • Account settings
  • Something else"
User: "Order"
Agent: [Now understands, proceeds to order flow]

Multiple Failures
Track consecutive failures:

If failure_count = 1:
  "Let me try to understand better..."

If failure_count = 2:
  "I'm having trouble understanding.
  Here are things I can help with: [options]"

If failure_count >= 3:
  "I think you'd be better helped by a human.
  Let me connect you to our team."
  → Human handoff

AI Response Validation
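The escalation ladder described under Multiple Failures could be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `FailureTracker` class, its method names, and the `OPTIONS` list are assumptions made for the example, and the messages are the ones from the lesson.

```python
from dataclasses import dataclass

# Assumed menu of topics; replace with whatever the agent actually supports.
OPTIONS = ["Track an order", "Get support", "Ask a question"]


@dataclass
class FailureTracker:
    """Counts consecutive misunderstandings within one conversation."""

    failure_count: int = 0

    def record_success(self) -> None:
        # Any understood message resets the streak.
        self.failure_count = 0

    def record_failure(self) -> tuple[str, bool]:
        """Return (message, handoff_needed) for the latest failure."""
        self.failure_count += 1
        if self.failure_count == 1:
            return "Let me try to understand better...", False
        if self.failure_count == 2:
            options = ", ".join(OPTIONS)
            message = (
                "I'm having trouble understanding. "
                f"Here are things I can help with: {options}"
            )
            return message, False
        # Third and later failures escalate to a human.
        message = (
            "I think you'd be better helped by a human. "
            "Let me connect you to our team."
        )
        return message, True
```

Resetting the counter on success keeps the count "consecutive", so one misunderstanding early in a long conversation does not push the user toward a handoff later.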
Check Before Showing
Validate AI response:
1. Not empty
2. Reasonable length
3. No forbidden content
4. Relevant to question
5. Factually checkable (if possible)

Confidence Scoring
If AI confidence is at least 0.5 but below 0.7:
  Add disclaimer:
  "Based on my understanding, [response].
  If this doesn't answer your question,
  please let me know."

If AI confidence < 0.5:
  Don't show response
  Ask for clarification instead

Hallucination Prevention
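The checks under Check Before Showing and the thresholds under Confidence Scoring can be combined into a single gate. The sketch below assumes the model returns a confidence value between 0 and 1; `MAX_LENGTH`, `FORBIDDEN_TERMS`, and the function names are hypothetical placeholders, and the relevance and fact checks from the list are left out because they depend on the specific system.

```python
from typing import Optional

DISCLAIMER_THRESHOLD = 0.7  # below this, add a disclaimer (from the lesson)
SUPPRESS_THRESHOLD = 0.5    # below this, do not show the response (from the lesson)
MAX_LENGTH = 1200           # assumed upper bound for a "reasonable length"
FORBIDDEN_TERMS = ("password",)  # assumed placeholder for a real content filter


def is_valid(response: str) -> bool:
    """Cheap structural checks: not empty, not too long, no forbidden terms."""
    text = response.strip()
    if not text or len(text) > MAX_LENGTH:
        return False
    lowered = text.lower()
    return not any(term in lowered for term in FORBIDDEN_TERMS)


def gate_response(response: str, confidence: float) -> Optional[str]:
    """Return the text to show, or None if the agent should ask for clarification."""
    if confidence < SUPPRESS_THRESHOLD or not is_valid(response):
        return None
    text = response.strip()
    if confidence < DISCLAIMER_THRESHOLD:
        return (
            f"Based on my understanding, {text} "
            "If this doesn't answer your question, please let me know."
        )
    return text
```

Checking the lower threshold first matters: a confidence of 0.4 is also below 0.7, so testing the disclaimer condition first would show a response that should have been suppressed.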
Strategies:
1. Ground responses in knowledge base
2. Add "I don't know" training
3. Request citations
4. Verify against data
5. Human review for critical info

Fallback Strategies
Tiered Fallbacks
Level 1: Rephrase request
"Could you say that differently?"

Level 2: Offer alternatives
"I can help with X, Y, or Z. Which interests you?"

Level 3: Provide resources
"I found these resources that might help: [links]"

Level 4: Human handoff
"Let me connect you with someone who can help."

Graceful Degradation
If knowledge base unavailable:
  → Use general LLM knowledge
  → Add disclaimer

If AI service down:
  → Provide FAQ links
  → Offer callback

If all else fails:
  → Apologize sincerely
  → Collect contact info
  → Promise follow-up

Human Handoff
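The Graceful Degradation chain can be expressed as nested fallbacks. In this sketch, `kb_answer` and `llm_answer` stand in for whatever functions query the knowledge base and the model; they, the `ServiceUnavailable` exception, and the two message constants are assumptions for illustration only.

```python
from typing import Callable


class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) backend that cannot answer right now."""


DISCLAIMER = (
    "Note: I couldn't check our knowledge base, so this is general guidance."
)
FAQ_MESSAGE = (
    "I'm having trouble right now. Our FAQ may help, or I can arrange a callback."
)


def answer_with_degradation(
    question: str,
    kb_answer: Callable[[str], str],
    llm_answer: Callable[[str], str],
) -> str:
    """Try the knowledge base, then the general model, then a static fallback."""
    try:
        return kb_answer(question)
    except ServiceUnavailable:
        pass  # knowledge base down: fall through to general LLM knowledge
    try:
        # General knowledge is less reliable, so the answer carries a disclaimer.
        return f"{llm_answer(question)}\n{DISCLAIMER}"
    except ServiceUnavailable:
        # AI service down as well: point to the FAQ and offer a callback.
        return FAQ_MESSAGE
```

Only the specific "unavailable" error triggers a fallback here; other exceptions propagate so that genuine bugs are not silently hidden behind a degraded answer.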
When to Handoff
Automatic triggers:
- User requests human
- Multiple failures
- High frustration detected
- Sensitive topics
- Complex issues
- Payment/billing issues
- Complaints

Handoff Flow
Agent: "I think you'd be best helped by one of our team members.
  Let me connect you now.

  Before I do, can I get:
  • Your name?
  • Best contact method?
  • Brief summary of your issue?"

[Collect info]

Agent: "Thanks! A team member will be with you shortly.
  For reference, your case number is #12345.
  Current wait time is approximately 5 minutes."

[Transfer to human with full context]

Context Transfer
Pass to human agent:
- Full conversation transcript
- User info (if known)
- Intent detected
- Actions attempted
- Suggested resolution
- Sentiment analysis

Seamless Integration
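The items listed under Context Transfer map naturally onto a single payload object that can be attached to a ticket. The following is a sketch only: the `HandoffContext` class and its field names are invented for this example and do not correspond to the schema of any particular help desk product.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class HandoffContext:
    """Everything the human agent needs to continue without re-asking."""

    transcript: list[str]                      # full conversation transcript
    intent: Optional[str] = None               # intent detected, if any
    user_info: dict = field(default_factory=dict)       # name, contact, ...
    actions_attempted: list[str] = field(default_factory=list)
    suggested_resolution: Optional[str] = None
    sentiment: Optional[str] = None            # e.g. "frustrated"

    def to_json(self) -> str:
        """Serialize for a ticket body or an API request payload."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

For example, `HandoffContext(transcript=["User: Order"], intent="track_order").to_json()` produces a JSON string in which unknown fields appear as `null` or empty collections, so the receiving system always sees the same keys.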
Integrations:
- Zendesk
- Intercom
- Freshdesk
- LiveChat
- Custom systems

Workflow:
1. Create ticket/conversation
2. Attach transcript
3. Assign to agent/queue
4. Notify team
5. Track resolution

System Error Handling
API Failures
HTTP errors:
- 400: Bad request → Check input
- 401: Unauthorized → Refresh token
- 403: Forbidden → Check permissions
- 404: Not found → Handle gracefully
- 429: Rate limit → Wait and retry
- 500: Server error → Fallback/retry
- 503: Unavailable → Try later

Retry Strategy
Exponential backoff:

Attempt 1: Immediate
Attempt 2: Wait 1 second
Attempt 3: Wait 2 seconds
Attempt 4: Wait 4 seconds
Max attempts: 3-5

Code pattern:
for attempt in 1..max_attempts:
  try:
    result = make_api_call()
    return result
  except:
    # 1s, 2s, 4s, ... and no wait after the final attempt
    if attempt < max_attempts:
      wait(2^(attempt - 1) seconds)
give_up()

Circuit Breaker
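A runnable version of the exponential backoff pattern from the Retry Strategy section might look like this. It is a sketch under assumptions: `TransientError` is a hypothetical exception that the caller raises for retryable failures (timeouts, 429, 503), and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeout, 429, 503)."""


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, waiting 1s, 2s, 4s, ... between failed attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller trigger its fallback
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Catching only `TransientError` rather than every exception is deliberate: retrying a 400 or 403 response will not help, so those errors should surface immediately.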
If service failing consistently:
1. Count failures
2. If failures > threshold in time window:
   - "Open" circuit
   - Stop calling failing service
   - Use fallback
3. After cool-down:
   - Try one request
   - If success: close circuit
   - If fail: stay open

Error Messages
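The three Circuit Breaker steps can be sketched as a small class. This is an illustrative, single-threaded version: the `CircuitBreaker` name, its default values, and the injectable `clock` are assumptions for the example, and a production system would more likely use an established library and add locking.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 3,        # open when failures in the window exceed this
        window: float = 60.0,      # seconds over which failures are counted
        cooldown: float = 30.0,    # seconds to wait before a trial request
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self.clock = clock
        self.failures: list[float] = []
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at < self.cooldown:
            return fallback()  # circuit open: do not call the failing service
        # Circuit closed, or cool-down elapsed and this is the trial request.
        try:
            result = func()
        except Exception:  # broad on purpose in this sketch
            self._record_failure(now)
            return fallback()
        self.failures.clear()
        self.opened_at = None  # success closes the circuit
        return result

    def _record_failure(self, now: float) -> None:
        if self.opened_at is not None:
            self.opened_at = now  # trial failed: stay open, restart cool-down
            return
        # Keep only failures inside the time window, then add this one.
        self.failures = [t for t in self.failures if now - t <= self.window]
        self.failures.append(now)
        if len(self.failures) > self.threshold:
            self.opened_at = now
```

While the circuit is open the failing service receives no traffic at all, which gives it room to recover and spares users from waiting on requests that are likely to time out.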
User-Friendly Messages
Good Error Messages
❌ Bad:
"Error 500: Internal server error"
"API timeout exception"
"NullPointerException in OrderService"

✅ Good:
"I'm having trouble accessing that information right now.
Could you try again in a few minutes?"

"Our order system is being updated.
Your order is safe! Check back in an hour."

"Something unexpected happened.
Don't worry - I've noted this and someone will help soon."

Contextual Messages
Order lookup error:
"I couldn't find that order number.
Could you double-check it?
Order numbers are usually 5 digits, like 12345."

Calendar booking error:
"That time slot just became unavailable.
How about [alternative time]?"

Payment error:
"There was an issue processing your request.
Please don't worry - no charges were made.
Would you like to try again or speak with someone?"

Logging & Monitoring
What to Log
Log every interaction:
- Timestamp
- User ID
- Input received
- Intent detected
- Confidence score
- Actions taken
- Response given
- Errors (if any)
- Duration

Error Tracking
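The fields listed under What to Log fit well into one structured (JSON) log line per interaction, which makes them easy to search and aggregate later. The sketch below uses only Python's standard `logging` and `json` modules; the function names, the logger name, and the key names are assumptions chosen for this example.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("agent.interactions")  # assumed logger name


def build_log_record(
    user_id: str,
    user_input: str,
    intent: Optional[str],
    confidence: float,
    actions: list[str],
    response: str,
    error: Optional[str],
    duration_ms: float,
) -> dict:
    """Collect the fields to log for one interaction into a plain dict."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "input": user_input,
        "intent": intent,
        "confidence": confidence,
        "actions": actions,
        "response": response,
        "error": error,
        "duration_ms": duration_ms,
    }


def log_interaction(**fields) -> None:
    """Emit one JSON line; interactions with an error are logged at ERROR level."""
    record = build_log_record(**fields)
    level = logging.ERROR if record["error"] else logging.INFO
    logger.log(level, json.dumps(record, ensure_ascii=False))
```

Raw user input can contain personal data, so a real deployment would likely redact or hash some of these fields before writing them.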
Track:
- Error type
- Frequency
- User impact
- Resolution status

Tools:
- Internal dashboard
- Sentry
- LogRocket
- Custom analytics

Alerting
Alert on:
- Error rate spikes
- Response time increases
- API failures
- Low confidence trends
- High handoff rate

Testing Error Scenarios
Test Cases
Test These Scenarios
☐ Empty input
☐ Very long input
☐ Special characters
☐ Unknown intent
☐ Multiple intents
☐ Wrong entity format
☐ API timeout (mock)
☐ API error (mock)
☐ Rate limit hit
☐ Offensive content
☐ Off-topic request
☐ Request for human
☐ Multiple consecutive errors
☐ Recovery from error

Chaos Testing
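The "API timeout (mock)" item from the checklist can be tested without touching a real service by using `unittest.mock` from the standard library. In this sketch, `lookup_order` and the `api.get_status` method are hypothetical stand-ins for the agent's real order handler and API client.

```python
from unittest import mock

FRIENDLY = (
    "I'm having trouble accessing that information right now. "
    "Could you try again in a few minutes?"
)


def lookup_order(order_id: str, api) -> str:
    """Hypothetical handler: turns an API timeout into a friendly message."""
    try:
        return f"Order {order_id} is {api.get_status(order_id)}."
    except TimeoutError:
        return FRIENDLY


def test_api_timeout_gives_friendly_message() -> None:
    api = mock.Mock()
    api.get_status.side_effect = TimeoutError("simulated timeout")
    assert lookup_order("12345", api) == FRIENDLY


def test_success_path() -> None:
    api = mock.Mock()
    api.get_status.return_value = "shipped"
    assert lookup_order("12345", api) == "Order 12345 is shipped."
```

Setting `side_effect` to an exception makes the mock raise it on every call, which is also a simple way to simulate the "return error responses" step of chaos testing at unit-test scale.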
Intentionally break things:
1. Disable API connections
2. Inject delays
3. Return error responses
4. Remove knowledge base
5. See how agent handles

Recovery Patterns
State Recovery
If conversation state lost:
- Apologize for confusion
- Summarize what was understood
- Ask user to confirm/correct
- Continue from last known good state

Graceful Restart
After major error:
"I apologize - I hit a snag!
Let me start fresh.
What can I help you with today?"

[Reset state but keep user info]

Exercises
Practice
Implement Error Handling:
- Add fallback for unrecognized intents
- Implement retry logic for APIs
- Build human handoff flow
- Create user-friendly error messages
- Set up error logging
- Test all failure scenarios
Next: Lesson 12 - Testing & Optimization
