Theory
50 minutes
Lesson 5/15

Document AI Tools

Use AI to extract, summarize, and analyze documents automatically

Businesses process thousands of documents every day. AI can automate extraction, summarization, and analysis, saving hundreds of hours of work.

🎯 Lesson objectives

  • Understand Document AI capabilities
  • Use popular extraction tools
  • Set up automated document workflows
  • Integrate with business processes

📄 What can Document AI do?

Capabilities Overview

📥 EXTRACTION
  • Text from images (OCR)
  • Data from forms
  • Tables from PDFs
  • Key-value pairs

📝 SUMMARIZATION
  • Long documents → Key points
  • Meeting notes → Action items
  • Reports → Executive summary

🔍 ANALYSIS
  • Sentiment analysis
  • Classification
  • Comparison
  • Risk identification

🔄 TRANSFORMATION
  • Format conversion
  • Translation
  • Standardization

ROI of Document AI

| Task | Manual Time | With AI | Savings |
| --- | --- | --- | --- |
| Invoice processing | 15 min/doc | 30 sec | 97% |
| Contract review | 2 hours | 10 min | 92% |
| Resume screening | 10 min/CV | 1 min | 90% |
| Report summarization | 30 min | 2 min | 93% |

Real Example: An insurance company processes 500 claims/day. Manual: 10 min/claim ≈ 83 hours/day. AI: 1 min/claim ≈ 8 hours/day. Savings: 75 hours/day.
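The arithmetic behind the example and the ROI figures can be checked with a short sketch. The function names are illustrative, and the only inputs are the numbers quoted above.

```python
def daily_hours(docs_per_day: int, minutes_per_doc: float) -> float:
    """Total processing time per day, in hours."""
    return docs_per_day * minutes_per_doc / 60


def savings_percent(manual_minutes: float, ai_minutes: float) -> float:
    """Share of manual time saved, as a percentage."""
    return (manual_minutes - ai_minutes) / manual_minutes * 100


manual = daily_hours(500, 10)  # 500 claims at 10 min each
with_ai = daily_hours(500, 1)  # 500 claims at 1 min each
print(f"Manual: {manual:.1f} h/day, AI: {with_ai:.1f} h/day, "
      f"saved: {manual - with_ai:.1f} h/day")
print(f"Invoice processing savings: {savings_percent(15, 0.5):.0f}%")
```

Plugging in your own volumes and per-document times gives a rough first estimate; it ignores review time and setup cost.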

🛠️ Popular Document AI Tools

1. ChatGPT / Claude - General Purpose

Best for: Summarization, analysis, and Q&A on documents

Upload PDF/Image → Ask questions

Example prompts:
• "Summarize this document in 5 bullet points"
• "What are the key terms and conditions?"
• "List all dates and deadlines mentioned"
• "What risks are identified in this report?"

Limitations:

  • File size limits that vary by product and plan (ChatGPT: up to 512MB per file; Claude's per-file limit is considerably lower)
  • No structured data export
  • Manual process (no automation)

2. Adobe Acrobat AI

Best for: PDF editing, summarization, Q&A

Features:
✅ Summarize long PDFs
✅ Ask questions about content
✅ Generate citations
✅ Works within Acrobat

Pricing: Included with Acrobat Pro ($23/mo)

Use case: Legal teams reviewing contracts within an Acrobat workflow.

3. Docsumo - Invoice & Receipt Processing

Best for: Financial documents, invoices, receipts

Workflow:
1. Upload invoice (PDF, image, email)
2. AI extracts:
   - Vendor name
   - Invoice number
   - Date
   - Line items
   - Total amount
   - Tax
3. Review & approve
4. Export to accounting software

Integrations: QuickBooks, Xero, SAP, NetSuite
Pricing: From $99/mo (500 pages)

4. Nanonets - Forms & Tables

Best for: Structured data extraction from forms

Document Types:
├── Invoices
├── Purchase Orders
├── Bank Statements
├── ID Cards
├── Medical Forms
└── Custom Forms

Features:
• Pre-trained models
• Custom model training
• API access
• Zapier integration

5. Parseur - Email & Attachment Parsing

Best for: Automatic extraction from emails and attachments

Use Cases:
• Parse order confirmation emails
• Extract booking details
• Process lead form submissions
• Handle support ticket emails

Workflow:
Email arrives → Parseur extracts →
Send to Google Sheets/CRM/Database
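To make the idea of email parsing concrete, here is a minimal sketch that pulls two fields out of an order confirmation with a regular expression. It does not use Parseur or reflect how Parseur works internally; the email layout, the pattern, and the field names are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed email layout: "Order #<id>" somewhere, followed later by "Total: $<amount>".
ORDER_PATTERN = re.compile(
    r"Order #(?P<order_id>\w+).*?Total: \$(?P<total>\d[\d,]*(?:\.\d+)?)",
    re.DOTALL,
)


def parse_order_email(body: str) -> Optional[dict]:
    """Return order_id and total from an email body, or None if not found."""
    match = ORDER_PATTERN.search(body)
    if match is None:
        return None
    return {
        "order_id": match["order_id"],
        "total": float(match["total"].replace(",", "")),
    }


sample_body = "Thanks for your purchase!\nOrder #A1023 is confirmed.\nTotal: $1,250.50\n"
print(parse_order_email(sample_body))
```

A dedicated parsing tool earns its price when email layouts vary or change often, which is where hand-written patterns like this one break down.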

6. Google Document AI

Best for: Enterprise, high-volume processing

Processors Available:
├── OCR (general text)
├── Form Parser
├── Invoice Parser
├── Receipt Parser
├── ID Document Parser
├── Contract Parser (preview)
└── Custom Document Extractor

Pricing: Pay-per-page ($0.001 - $0.10/page)

📊 Comparison Table

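| Tool | Best For | Ease of Use | Price | Volume |
| --- | --- | --- | --- | --- |
| ChatGPT/Claude | Ad-hoc analysis | ⭐⭐⭐⭐⭐ | $20/mo | Low |
| Docsumo | Invoices | ⭐⭐⭐⭐ | $99/mo | Medium |
| Nanonets | Forms | ⭐⭐⭐⭐ | $99/mo | Medium |
| Parseur | Emails | ⭐⭐⭐⭐⭐ | $39/mo | Medium |
| Google Doc AI | Enterprise | ⭐⭐⭐ | Pay-per-use | High |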

🔧 Hands-on: Document Extraction

Exercise 1: Invoice Data Extraction with Docsumo

Step 1: Setup Account

1. Go to docsumo.com
2. Sign up for the free trial
3. Choose the "Invoice" template

Step 2: Upload Sample Invoice

1. Dashboard → Upload Documents
2. Drag & drop invoice PDF
3. Wait for processing (~10 sec)

Step 3: Review Extraction

Extracted Fields:
┌─────────────────────────────────┐
│ Vendor:    ABC Company          │
│ Invoice #: INV-2024-001         │
│ Date:      15/01/2024           │
│ Due Date:  15/02/2024           │
│                                 │
│ Line Items:                     │
│ ├── Product A          $100.00  │
│ ├── Product B          $250.00  │
│ └── Service C          $150.00  │
│                                 │
│ Subtotal:              $500.00  │
│ Tax (10%):              $50.00  │
│ Total:                 $550.00  │
└─────────────────────────────────┘

Confidence: 95%
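A review step can include a quick arithmetic check on the extracted numbers before a human looks at them. The sketch below is one way to do that, using the sample invoice above. The dictionary layout and field names are assumptions for illustration, not Docsumo's actual output schema.

```python
from decimal import Decimal

# Hypothetical structure holding the fields shown in the sample extraction.
SAMPLE_INVOICE = {
    "line_items": [
        {"description": "Product A", "amount": "100.00"},
        {"description": "Product B", "amount": "250.00"},
        {"description": "Service C", "amount": "150.00"},
    ],
    "subtotal": "500.00",
    "tax": "50.00",
    "total": "550.00",
}


def check_invoice(invoice: dict) -> list:
    """Return a list of arithmetic inconsistencies (empty if none)."""
    problems = []
    line_sum = sum((Decimal(i["amount"]) for i in invoice["line_items"]), Decimal("0"))
    subtotal = Decimal(invoice["subtotal"])
    tax = Decimal(invoice["tax"])
    total = Decimal(invoice["total"])
    if line_sum != subtotal:
        problems.append(f"line items sum to {line_sum}, subtotal is {subtotal}")
    if subtotal + tax != total:
        problems.append(f"subtotal + tax is {subtotal + tax}, total is {total}")
    return problems


print(check_invoice(SAMPLE_INVOICE))
```

`Decimal` is used instead of `float` so that currency amounts compare exactly.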

Step 4: Export

Export Options:
• JSON (for API/automation)
• CSV (for spreadsheets)
• Direct to QuickBooks
• Webhook to custom endpoint
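If you export JSON but need spreadsheet rows, the conversion is small. The following sketch flattens an invoice into one CSV row per line item. The JSON field names are hypothetical, so adjust them to whatever your tool actually exports.

```python
import csv
import io
import json

# Hypothetical JSON export; real field names depend on the tool.
SAMPLE_JSON = json.dumps({
    "invoice_number": "INV-2024-001",
    "vendor": "ABC Company",
    "line_items": [
        {"description": "Product A", "amount": "100.00"},
        {"description": "Product B", "amount": "250.00"},
        {"description": "Service C", "amount": "150.00"},
    ],
})


def invoice_json_to_csv(invoice_json: str) -> str:
    """Flatten an invoice JSON string into CSV text, one row per line item."""
    invoice = json.loads(invoice_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["invoice_number", "vendor", "description", "amount"])
    for item in invoice["line_items"]:
        writer.writerow([
            invoice["invoice_number"],
            invoice["vendor"],
            item["description"],
            item["amount"],
        ])
    return buffer.getvalue()


print(invoice_json_to_csv(SAMPLE_JSON))
```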

Exercise 2: Document Q&A with Claude

Upload a contract and ask:

User: [Uploads employment contract PDF]
      "What is the notice period for termination?"

Claude: "Based on the contract:

        **Notice Period:**
        - Employee: 30 days written notice
        - Employer: 60 days written notice
        - During probation: 7 days for both parties

        Reference: Section 8.2, Page 4"

More useful prompts:

📋 Extraction:
"Extract all monetary values mentioned"
"List all parties involved in this agreement"
"What are the key dates and deadlines?"

📝 Summarization:
"Summarize in 3 paragraphs for executive review"
"Create bullet points of key obligations"
"What are the main risks in this document?"

🔍 Analysis:
"Compare with standard industry terms"
"Identify any unusual clauses"
"What's missing that should be included?"

🔄 Automated Document Workflows

Workflow 1: Invoice Processing Pipeline

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Invoice   │     │   Extract   │     │   Review    │
│  Received   │ ──► │    Data     │ ──► │    Queue    │
│   (Email)   │     │    (AI)     │     │             │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                    ┌─────────────┐            │
                    │ Accounting  │ ◄──────────┘
                    │  Software   │   Approved
                    └─────────────┘

Make.com Implementation:

Module 1: Email trigger (new invoice received)
Module 2: Download attachment
Module 3: Docsumo - Extract data
Module 4: Google Sheets - Log entry
Module 5: IF amount > $1000 → Notify manager
Module 6: QuickBooks - Create bill
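The branching in Modules 4 to 6 can be expressed as plain logic, which helps when testing the rule before building it in Make.com. This is only a sketch: the action names are placeholders, and nothing here calls Make.com, Google Sheets, or QuickBooks.

```python
APPROVAL_THRESHOLD = 1000.0  # from Module 5: amounts above $1000 notify a manager


def plan_actions(invoice: dict) -> list:
    """Return the ordered list of (placeholder) actions for one extracted invoice."""
    actions = ["log_to_sheet"]              # Module 4
    if invoice["total"] > APPROVAL_THRESHOLD:
        actions.append("notify_manager")    # Module 5
    actions.append("create_bill")           # Module 6
    return actions


print(plan_actions({"total": 550.0}))
print(plan_actions({"total": 1500.0}))
```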

Workflow 2: Resume Screening

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Resume    │     │   Parse &   │     │   Score &   │
│   Upload    │ ──► │   Extract   │ ──► │    Rank     │
│             │     │             │     │             │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
       ┌───────────────────┬───────────────────┘
       │                   │
       ▼                   ▼
┌─────────────┐     ┌─────────────┐
│  Qualified  │     │  Rejected   │
│    → ATS    │     │   → Email   │
└─────────────┘     └─────────────┘

Extracted Fields:

• Name, Email, Phone
• Education (degree, school, year)
• Experience (company, role, duration)
• Skills (technical, soft)
• Certifications
• Languages
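Once those fields are extracted, the "Score & Rank" step could look something like the sketch below. The 70/30 weighting, the 0.6 cutoff, and the `Candidate` structure are arbitrary assumptions chosen for illustration, not a recommended screening policy.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    years_experience: float
    skills: set = field(default_factory=set)


def score(candidate: Candidate, required_skills: set, min_years: float) -> float:
    """Score from 0 to 1: 70% skill overlap, 30% experience (assumed weights)."""
    skill_match = (
        len(candidate.skills & required_skills) / len(required_skills)
        if required_skills else 0.0
    )
    experience = min(candidate.years_experience / min_years, 1.0) if min_years > 0 else 1.0
    return round(0.7 * skill_match + 0.3 * experience, 2)


def rank(candidates, required_skills, min_years, cutoff=0.6):
    """Split candidate names into (qualified, rejected), best score first."""
    scored = sorted(
        ((score(c, required_skills, min_years), c.name) for c in candidates),
        reverse=True,
    )
    qualified = [name for s, name in scored if s >= cutoff]
    rejected = [name for s, name in scored if s < cutoff]
    return qualified, rejected


pool = [Candidate("Binh", 1, {"excel"}), Candidate("An", 5, {"python", "sql"})]
print(rank(pool, {"python", "sql"}, min_years=3))
```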

Workflow 3: Contract Management

New Contract Upload
           │
           ▼
┌─────────────────────┐
│ Extract Key Info:   │
│ • Parties           │
│ • Effective date    │
│ • Expiration date   │
│ • Value             │
│ • Key terms         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Store in Database   │
│ Set Reminders:      │
│ • 30 days before    │
│ • 7 days before     │
│ • Renewal date      │
└─────────────────────┘
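The reminder schedule in the second box is simple date arithmetic. Here is a minimal sketch, assuming the renewal date is the same as the extracted expiration date; the function name is illustrative.

```python
from datetime import date, timedelta


def reminder_dates(expiration: date, days_before=(30, 7)) -> list:
    """Reminder dates ahead of expiration, plus the expiration date itself."""
    reminders = [expiration - timedelta(days=d) for d in days_before]
    reminders.append(expiration)  # assumed to be the renewal date
    return reminders


for reminder in reminder_dates(date(2025, 3, 31)):
    print(reminder.isoformat())
```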

📋 Document Types & Best Tools

Financial Documents

| Document Type | Best Tool | Key Extractions |
| --- | --- | --- |
| Invoices | Docsumo, Nanonets | Vendor, amount, line items |
| Receipts | Google Doc AI | Date, merchant, total |
| Bank Statements | Nanonets | Transactions, balances |
| Tax Forms | Docsumo | Income, deductions |

HR Documents

| Document Type | Best Tool | Key Extractions |
| --- | --- | --- |
| Resumes | ChatGPT, Affinda | Skills, experience, education |
| ID Cards | Google Doc AI | Name, DOB, ID number |
| Offer Letters | Claude | Salary, start date, terms |
| Timesheets | Nanonets | Hours, dates, totals |

Legal Documents

| Document Type | Best Tool | Key Extractions |
| --- | --- | --- |
| Contracts | Claude, Adobe AI | Terms, dates, obligations |
| NDAs | ChatGPT | Parties, scope, duration |
| Leases | Google Doc AI | Rent, term, conditions |
| Patents | Claude | Claims, prior art |

⚠️ Accuracy & Verification

Common Extraction Errors

1. OCR Errors (poor scan quality):
   "Amount: $1,OOO" → Should be "$1,000"

2. Wrong Field Mapping:
   Ship-to address → Extracted as Bill-to

3. Missing Data:
   Handwritten notes not captured

4. Format Issues:
   Date "01/02/24" → Jan 2 or Feb 1?
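Two of these errors (1 and 4) can be caught with light post-processing. The sketch below swaps letters that OCR commonly confuses with digits in amount fields, and lists every valid reading of a slash-separated date so that ambiguous ones can be flagged. The substitution table and function names are assumptions, and the amount fix should only be applied to fields known to be numeric.

```python
from datetime import date

# Letters often misread for digits; an assumed, non-exhaustive table.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def fix_amount(raw: str) -> str:
    """Replace look-alike letters with digits in a numeric field."""
    return raw.translate(OCR_DIGIT_FIXES)


def date_candidates(raw: str) -> set:
    """Return every valid reading of an NN/NN/YY date (day-first and month-first)."""
    first, second, year = (int(part) for part in raw.split("/"))
    if year < 100:
        year += 2000
    readings = set()
    for day, month in ((first, second), (second, first)):
        try:
            readings.add(date(year, month, day))
        except ValueError:
            pass  # not a real calendar date under this reading
    return readings


print(fix_amount("$1,OOO"))
print(len(date_candidates("01/02/24")) > 1)  # True means ambiguous: send to review
```

A date with more than one valid reading cannot be resolved from the string alone; it needs the document's locale or a human check.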

Verification Workflow

Confidence Score > 95%:
  → Auto-approve
  → Log for audit

Confidence Score 80-95%:
  → Flag for quick review
  → Human confirms/corrects

Confidence Score < 80%:
  → Manual processing required
  → Train model with corrections
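These thresholds translate directly into a routing function. The sketch below assumes the confidence score is a percentage from 0 to 100 and treats exactly 95 as part of the review band, matching "> 95%" above; the queue names are placeholders.

```python
def route(confidence: float) -> str:
    """Map a confidence score (0-100) to a processing queue."""
    if confidence > 95:
        return "auto_approve"       # still logged for audit
    if confidence >= 80:
        return "quick_review"       # human confirms or corrects
    return "manual_processing"      # corrections feed back into training


for value in (97, 95, 80, 62.5):
    print(value, route(value))
```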

Best Practice: Always include a human review step for important documents (contracts, financial records). AI is an assistant, not a replacement.

🔐 Security Considerations

Data Privacy

Questions to ask vendors:
✅ Data encrypted in transit & at rest?
✅ Where is data processed/stored?
✅ How long is data retained?
✅ Can data be deleted on request?
✅ SOC 2 / GDPR compliant?
✅ On-premise option available?

Sensitive Documents

For highly sensitive docs (medical, legal, financial):

Option 1: On-premise or private-cloud AI
  • Azure AI (private deployment)
  • Google Doc AI (VPC)

Option 2: Redaction first
  • Mask PII before processing
  • Process → Restore PII after

Option 3: Human-only processing
  • Some docs should never use cloud AI
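Option 2 can be sketched as a mask-and-restore round trip. The example below handles only email addresses, using a deliberately simple pattern; real redaction needs to cover names, ID numbers, phone numbers, and more, usually with a dedicated tool. The token format and function names are assumptions.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text: str):
    """Replace each email with a placeholder token; return (masked_text, mapping)."""
    mapping = {}

    def replace(match):
        token = f"[PII_{len(mapping) + 1}]"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(replace, text), mapping


def restore(text: str, mapping: dict) -> str:
    """Put the original values back in place of their tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


source = "Contact lan@example.com or minh@example.org today"
masked, pii = redact(source)
print(masked)                 # this is what would be sent to the cloud service
print(restore(masked, pii))   # restored locally after processing
```

The mapping stays on your side, so the external service only ever sees the placeholder tokens.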

🎯 Practice exercises

Task 1: Invoice Extraction (30 minutes)

1. Sign up for the Docsumo free trial
2. Upload 3 sample invoices
3. Review extraction accuracy
4. Export to Google Sheets
5. Note any errors for training

Task 2: Document Q&A (20 minutes)

1. Upload a contract/report to Claude
2. Ask 5 different questions:
   - Summary
   - Key dates
   - Monetary values
   - Obligations
   - Risks
3. Verify accuracy against the document

Task 3: Build Automation (30 minutes)

1. Create a Make.com scenario:
   - Trigger: Email with attachment
   - Extract: Key data from PDF
   - Store: Google Sheets
   - Notify: Slack message
2. Test with a sample document

📚 Summary

| Concept | Key Takeaway |
| --- | --- |
| Tools | Match tool to document type & volume |
| Extraction | AI handles 90%+, human verifies critical |
| Automation | Connect extraction to business workflows |
| Security | Consider data sensitivity before using cloud AI |

Next: Lesson 06 - Contract Analysis: a deep dive into analyzing contracts with AI!