Document AI Tools
Businesses process thousands of documents every day. AI can automate extraction, summarization, and analysis, saving hundreds of working hours.
🎯 Lesson Objectives
- Understand what Document AI can do
- Use popular extraction tools
- Set up automated document workflows
- Integrate them with business processes
📄 What Can Document AI Do?
Capabilities Overview
Document AI capabilities fall into four groups:
📥 EXTRACTION
- Text from images (OCR)
- Data from forms
- Tables from PDFs
- Key-value pairs
📝 SUMMARIZATION
- Long documents → Key points
- Meeting notes → Action items
- Reports → Executive summary
🔍 ANALYSIS
- Sentiment analysis
- Classification
- Comparison
- Risk identification
🔄 TRANSFORMATION
- Format conversion
- Translation
- Standardization
ROI of Document AI
| Task | Manual Time | With AI | Savings |
|---|---|---|---|
| Invoice processing | 15 min/doc | 30 sec | 97% |
| Contract review | 2 hours | 10 min | 92% |
| Resume screening | 10 min/CV | 1 min | 90% |
| Report summarization | 30 min | 2 min | 93% |
Real example: An insurance company processes 500 claims per day. Manual: 10 min/claim ≈ 83 hours/day. With AI: 1 min/claim ≈ 8 hours/day. That saves about 75 hours/day.
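The arithmetic behind this example and the Savings column in the table can be checked with a few lines of Python. This is a minimal sketch: `daily_hours` and `percent_saved` are hypothetical helper names, and the inputs are the figures quoted above.

```python
def daily_hours(docs_per_day: int, minutes_per_doc: float) -> float:
    """Total processing time per day, in hours."""
    return docs_per_day * minutes_per_doc / 60


def percent_saved(manual_minutes: float, ai_minutes: float) -> float:
    """Share of time saved when a task drops from manual_minutes to ai_minutes."""
    return (1 - ai_minutes / manual_minutes) * 100


# Insurance example: 500 claims per day.
manual = daily_hours(500, 10)
with_ai = daily_hours(500, 1)
print(f"Manual: {manual:.1f} h/day")            # Manual: 83.3 h/day
print(f"With AI: {with_ai:.1f} h/day")          # With AI: 8.3 h/day
print(f"Saved: {manual - with_ai:.1f} h/day")   # Saved: 75.0 h/day

# Invoice row from the table: 15 min manually vs 30 sec (0.5 min) with AI.
print(f"Invoice savings: {percent_saved(15, 0.5):.0f}%")  # Invoice savings: 97%
```

Swapping in your own volume and per-document times gives a rough first estimate; it does not account for review time or setup effort.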
🛠️ Popular Document AI Tools
1. ChatGPT / Claude - General Purpose
Best for: Summarization, analysis, and Q&A about documents
Upload PDF/Image → Ask questions
Example prompts:
- "Summarize this document in 5 bullet points"
- "What are the key terms and conditions?"
- "List all dates and deadlines mentioned"
- "What risks are identified in this report?"
Limitations:
- File size limits (ChatGPT: 512MB, Claude: 10MB per file)
- No structured data export
- Manual process (no automation)
2. Adobe Acrobat AI
Best for: PDF editing, summarization, Q&A
Features:
- ✅ Summarize long PDFs
- ✅ Ask questions about content
- ✅ Generate citations
- ✅ Works within Acrobat
Pricing: Included with Acrobat Pro ($23/mo)
Use case: Legal teams reviewing contracts within an Acrobat workflow.
3. Docsumo - Invoice & Receipt Processing
Best for: Financial documents, invoices, receipts
Workflow:
1. Upload invoice (PDF, image, email)
2. AI extracts:
   - Vendor name
   - Invoice number
   - Date
   - Line items
   - Total amount
   - Tax
3. Review & approve
4. Export to accounting software

Integrations: QuickBooks, Xero, SAP, NetSuite
Pricing: From $99/mo (500 pages)
4. Nanonets - Forms & Tables
Best for: Structured data extraction from forms
Document Types:
- Invoices
- Purchase Orders
- Bank Statements
- ID Cards
- Medical Forms
- Custom Forms
Features:
- Pre-trained models
- Custom model training
- API access
- Zapier integration
5. Parseur - Email & Attachment Parsing
Best for: Automatic extraction from emails and attachments
Use Cases:
- Parse order confirmation emails
- Extract booking details
- Process lead form submissions
- Handle support ticket emails
Workflow:
Email arrives → Parseur extracts → Send to Google Sheets/CRM/Database
6. Google Document AI
Best for: Enterprise, high-volume processing
Processors Available:
- OCR (general text)
- Form Parser
- Invoice Parser
- Receipt Parser
- ID Document Parser
- Contract Parser (preview)
- Custom Document Extractor
Pricing: Pay-per-page ($0.001 - $0.10/page)
📊 Comparison Table
| Tool | Best For | Ease of Use | Price | Volume |
|---|---|---|---|---|
| ChatGPT/Claude | Ad-hoc analysis | ⭐⭐⭐⭐⭐ | $20/mo | Low |
| Docsumo | Invoices | ⭐⭐⭐⭐ | $99/mo | Medium |
| Nanonets | Forms | ⭐⭐⭐⭐ | $99/mo | Medium |
| Parseur | Emails | ⭐⭐⭐⭐⭐ | $39/mo | Medium |
| Google Doc AI | Enterprise | ⭐⭐⭐ | Pay-per-use | High |
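To compare a flat monthly plan with pay-per-page pricing, it helps to put both on a per-page basis. The sketch below is only a rough estimate using the prices quoted in this lesson ($99/mo for 500 pages, and $0.001 to $0.10 per page). The function names are hypothetical, and the calculation ignores setup, integration, and review costs.

```python
def effective_price_per_page(monthly_fee: float, included_pages: int) -> float:
    """Per-page price of a flat plan when all included pages are used."""
    if included_pages <= 0:
        raise ValueError("included_pages must be positive")
    return monthly_fee / included_pages


def pay_per_page_cost(pages: int, price_per_page: float) -> float:
    """Monthly cost of a pay-per-page service at a given volume."""
    return pages * price_per_page


# Figures quoted in this lesson; check current vendor pricing before deciding.
flat = effective_price_per_page(99, 500)
low = pay_per_page_cost(500, 0.001)
high = pay_per_page_cost(500, 0.10)
print(f"Flat plan: ${flat:.3f}/page")
print(f"Pay-per-page for 500 pages: ${low:.2f} to ${high:.2f}")
```

Price per page is only one factor: the Ease of Use column matters just as much when no engineering team is available to build on an API.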
🔧 Hands-on: Document Extraction
Exercise 1: Invoice Data Extraction with Docsumo
Step 1: Set Up Account
1. Go to docsumo.com
2. Sign up for the free trial
3. Choose the "Invoice" template

Step 2: Upload Sample Invoice
1. Dashboard → Upload Documents
2. Drag & drop invoice PDF
3. Wait for processing (~10 sec)

Step 3: Review Extraction
Extracted Fields:
- Vendor: ABC Company
- Invoice #: INV-2024-001
- Date: 15/01/2024
- Due Date: 15/02/2024
- Line Items:
  - Product A: $100.00
  - Product B: $250.00
  - Service C: $150.00
- Subtotal: $500.00
- Tax (10%): $50.00
- Total: $550.00
Confidence: 95%
Step 4: Export
Export Options:
- JSON (for API/automation)
- CSV (for spreadsheets)
- Direct to QuickBooks
- Webhook to custom endpoint
Exercise 2: Document Q&A with Claude
Upload a contract and ask:
User: [Uploads employment contract PDF] "What is the notice period for termination?"
Claude: "Based on the contract:
**Notice Period:**
- Employee: 30 days written notice
- Employer: 60 days written notice
- During probation: 7 days for both parties
Reference: Section 8.2, Page 4"
More useful prompts:
📋 Extraction:
- "Extract all monetary values mentioned"
- "List all parties involved in this agreement"
- "What are the key dates and deadlines?"
📝 Summarization:
- "Summarize in 3 paragraphs for executive review"
- "Create bullet points of key obligations"
- "What are the main risks in this document?"
🔍 Analysis:
- "Compare with standard industry terms"
- "Identify any unusual clauses"
- "What's missing that should be included?"
🔄 Automated Document Workflows
Workflow 1: Invoice Processing Pipeline
Invoice Received (Email) → Extract Data (AI) → Review Queue → (Approved) → Accounting Software
Make.com Implementation:
- Module 1: Email trigger (new invoice received)
- Module 2: Download attachment
- Module 3: Docsumo - Extract data
- Module 4: Google Sheets - Log entry
- Module 5: IF amount > $1000 → Notify manager
- Module 6: QuickBooks - Create bill
Workflow 2: Resume Screening
Resume Upload → Parse & Extract → Score & Rank, then one of:
- Qualified → ATS
- Rejected → Email
Extracted Fields:
- Name, Email, Phone
- Education (degree, school, year)
- Experience (company, role, duration)
- Skills (technical, soft)
- Certifications
- Languages
Workflow 3: Contract Management
New Contract Upload
↓
Extract Key Info:
- Parties
- Effective date
- Expiration date
- Value
- Key terms
↓
Store in Database
Set Reminders:
- 30 days before
- 7 days before
- Renewal date
📋 Document Types & Best Tools
Financial Documents
| Document Type | Best Tool | Key Extractions |
|---|---|---|
| Invoices | Docsumo, Nanonets | Vendor, amount, line items |
| Receipts | Google Doc AI | Date, merchant, total |
| Bank Statements | Nanonets | Transactions, balances |
| Tax Forms | Docsumo | Income, deductions |
HR Documents
| Document Type | Best Tool | Key Extractions |
|---|---|---|
| Resumes | ChatGPT, Affinda | Skills, experience, education |
| ID Cards | Google Doc AI | Name, DOB, ID number |
| Offer Letters | Claude | Salary, start date, terms |
| Timesheets | Nanonets | Hours, dates, totals |
Legal Documents
| Document Type | Best Tool | Key Extractions |
|---|---|---|
| Contracts | Claude, Adobe AI | Terms, dates, obligations |
| NDAs | ChatGPT | Parties, scope, duration |
| Leases | Google Doc AI | Rent, term, conditions |
| Patents | Claude | Claims, prior art |
⚠️ Accuracy & Verification
Common Extraction Errors
1. OCR errors (poor scan quality): "Amount: $1,OOO" should be "$1,000"
2. Wrong field mapping: Ship-to address extracted as Bill-to
3. Missing data: Handwritten notes not captured
4. Format issues: Date "01/02/24" could be Jan 2 or Feb 1

Verification Workflow
Confidence Score > 95%:
- Auto-approve
- Log for audit
Confidence Score 80-95%:
- Flag for quick review
- Human confirms/corrects
Confidence Score < 80%:
- Manual processing required
- Train model with corrections
Best practice: Always include a human review step for important documents (contracts, financial documents). AI is an assistant, not a replacement.
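The confidence thresholds and the ambiguous-date problem described in this section can be sketched in Python as follows. This is a minimal illustration, not any vendor's API: the function names and queue labels are hypothetical, confidence is assumed to be a fraction between 0 and 1, and the date check only covers the slash-separated shape from the example.

```python
AUTO_APPROVE_ABOVE = 0.95  # scores strictly above this are auto-approved
MANUAL_BELOW = 0.80        # scores strictly below this need manual processing


def route_by_confidence(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a handling queue."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > AUTO_APPROVE_ABOVE:
        return "auto_approve"
    if confidence >= MANUAL_BELOW:
        return "quick_review"
    return "manual_processing"


def is_ambiguous_date(text: str) -> bool:
    """Return True when day-first and month-first readings give different dates.

    Only handles the "NN/NN/YY" or "NN/NN/YYYY" shape used in the example.
    """
    parts = text.split("/")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second = int(parts[0]), int(parts[1])
    # Both numbers could be a month, and swapping them changes the date.
    return 1 <= first <= 12 and 1 <= second <= 12 and first != second


print(route_by_confidence(0.95))      # quick_review (95% is not above 95%)
print(is_ambiguous_date("01/02/24"))  # True
```

One possible design choice is to send any document with an ambiguous date to the review queue regardless of its confidence score, since a high score does not tell you which date format the vendor used.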
🔐 Security Considerations
Data Privacy
Questions to ask vendors:
- ✅ Data encrypted in transit & at rest?
- ✅ Where is data processed/stored?
- ✅ How long is data retained?
- ✅ Can data be deleted on request?
- ✅ SOC 2 / GDPR compliant?
- ✅ On-premise option available?
Sensitive Documents
For highly sensitive docs (medical, legal, financial):
Option 1: On-premise or private-network AI
- Azure AI (private deployment)
- Google Doc AI (VPC)
Option 2: Redaction first
- Mask PII before processing
- Process → Restore PII after
Option 3: Human-only processing
- Some docs should never use cloud AI
🎯 Practice Exercises
Task 1: Invoice Extraction (30 minutes)
1. Sign up for the Docsumo free trial
2. Upload 3 sample invoices
3. Review extraction accuracy
4. Export to Google Sheets
5. Note any errors for training

Task 2: Document Q&A (20 minutes)
1. Upload a contract/report to Claude
2. Ask 5 different questions:
   - Summary
   - Key dates
   - Monetary values
   - Obligations
   - Risks
3. Verify accuracy against the document

Task 3: Build Automation (30 minutes)
1. Create a Make.com scenario:
   - Trigger: Email with attachment
   - Extract: Key data from PDF
   - Store: Google Sheets
   - Notify: Slack message
2. Test with a sample document

📚 Summary
| Concept | Key Takeaway |
|---|---|
| Tools | Match tool to document type & volume |
| Extraction | AI handles 90%+; humans verify critical documents |
| Automation | Connect extraction to business workflows |
| Security | Consider data sensitivity before using cloud AI |
Next: Lesson 06 - Contract Analysis - a deep dive into analyzing contracts with AI!
