🔗 n8n Vector Store Nodes
🎯 Learning Objectives
After this lesson, you will:
✅ Understand the Vector Store Workflow Pattern in n8n
✅ Know how to build an Insert Documents Workflow
✅ Have a solid grasp of the Search Workflow and context building
✅ Build a RAG Agent with a Vector Store Retriever
✅ Implement a document update and monitoring strategy
n8n provides native Vector Store nodes for inserting, searching, and managing vector data. This lesson covers how to use them in detail.
🔄 Vector Store Workflow Pattern
Checkpoint
Describe the two main flows (Indexing and Query) in the Vector Store Workflow Pattern.
📥 Insert Documents Workflow
Text Splitter Configuration
    // Recursive Character Text Splitter
    // Configuration in n8n:
    // - Chunk Size: 1000 (characters)
    // - Chunk Overlap: 200 (characters)
    // - Separators: ["\n\n", "\n", " ", ""]

    // Best practices:
    // Chunk size 500-1500 chars for most docs
    // Overlap 10-20% of chunk size
    // Larger chunks = more context, less precision
    // Smaller chunks = more precision, less context

Metadata Enrichment
    // Code node: Add metadata before indexing
    const chunks = $input.all();

    return chunks.map((chunk, index) => ({
      json: {
        content: chunk.json.text,
        metadata: {
          source: chunk.json.source || "unknown",
          page: chunk.json.page || 0,
          chunkIndex: index,
          documentTitle: chunk.json.title || "",
          indexedAt: new Date().toISOString()
        }
      }
    }));

Checkpoint
How should chunk size and overlap be configured for the Text Splitter? What should the metadata include?
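To make the interaction between chunk size and overlap concrete, here is a minimal sketch of a fixed-size splitter in plain JavaScript. It is a simplification, not n8n's Recursive Character Text Splitter: it ignores the separator list and cuts purely by character position. The function name `splitWithOverlap` and the sample text are hypothetical.

```javascript
// Simplified fixed-size splitter (an illustration only). The recursive
// splitter additionally tries separators before cutting mid-word.
function splitWithOverlap(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// Hypothetical sample: 2,500 characters of filler text.
const sampleChunks = splitWithOverlap("a".repeat(2500));
console.log(sampleChunks.map((c) => c.length)); // [ 1000, 1000, 900 ]
```

With a chunk size of 1000 and an overlap of 200, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.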
🔎 Search Workflow
Search Configuration
    // Vector Store Search node
    // Configuration:
    // - Top K: 5 (number of results)
    // - Score Threshold: 0.7 (minimum similarity)
    // - Filter: metadata-based filtering

    // Search with metadata filter
    // Filter by source document:
    // metadata.source == "company-handbook"

    // Filter by date:
    // metadata.indexedAt > "2025-01-01"

Context Building
    // Code node: Build context from search results
    const results = $input.all();

    const context = results.map((r, i) => {
      // Check the type so that a score of exactly 0 is still printed
      const sim = typeof r.json.score === "number" ? r.json.score.toFixed(2) : "N/A";
      return `[Source ${i + 1}] (similarity: ${sim})
    ${r.json.document.pageContent}
    ---`;
    }).join("\n\n");

    // Assumes the user's question is available as `query` on the current item
    const prompt = `
    Based on the following context, answer the user's question.
    If the context doesn't contain the answer, say "I don't have enough information."

    Context:
    ${context}

    Question: ${$json.query}

    Answer:`;

    return { json: { prompt, resultCount: results.length } };

Checkpoint
Explain how to build context from search results. What values should Top K and Score Threshold be set to?
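The Top K, Score Threshold, and metadata filter settings can be pictured as three steps applied to scored candidates. The sketch below uses an in-memory array and cosine similarity as a stand-in; it is not how any particular vector store implements search. The names `cosineSimilarity`, `searchVectors`, and the 2-dimensional sample vectors are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Apply metadata filter, then score threshold, then keep the top K by score.
function searchVectors(queryVector, entries, options = {}) {
  const { topK = 5, scoreThreshold = 0.7, filter = () => true } = options;
  return entries
    .filter((e) => filter(e.metadata))
    .map((e) => ({ ...e, score: cosineSimilarity(queryVector, e.vector) }))
    .filter((e) => e.score >= scoreThreshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical entries with toy 2-dimensional embeddings.
const sampleEntries = [
  { id: "a", vector: [1, 0], metadata: { source: "company-handbook" } },
  { id: "b", vector: [0.9, 0.1], metadata: { source: "company-handbook" } },
  { id: "c", vector: [0, 1], metadata: { source: "company-handbook" } },
  { id: "d", vector: [1, 0.05], metadata: { source: "blog" } },
];
const hits = searchVectors([1, 0], sampleEntries, {
  topK: 5,
  scoreThreshold: 0.7,
  filter: (m) => m.source === "company-handbook",
});
console.log(hits.map((h) => h.id)); // [ 'a', 'b' ]
```

Entry `c` is dropped by the threshold and entry `d` by the metadata filter, so fewer than Top K results come back. A workflow should handle that case, including an empty result list.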
🤖 RAG Agent with Vector Store
    // AI Agent node configuration:
    // - Agent Type: Tools Agent
    // - LLM: OpenAI GPT-4o-mini
    // - Tools: Vector Store Retriever
    // - System Message:

    const systemMessage = `
    You are a helpful assistant that answers questions
    based on the company knowledge base.

    Rules:
    1. Only answer based on the retrieved context
    2. If you are not sure, say so
    3. Cite the source when possible
    4. Be concise but complete
    `;

Checkpoint
Describe how to configure the AI Agent node with the Vector Store Retriever tool. What does the system message need to contain?
🔄 Document Update Strategy
    // Workflow: Update documents when source changes
    // Google Drive Trigger (on file change) → Re-index

    // Code node: Upsert logic
    const documentId = $json.fileId;

    // Step 1: Delete old chunks for this document
    // Vector Store: Delete where metadata.sourceId == documentId

    // Step 2: Re-process and insert new chunks
    // Load → Split → Embed → Insert with metadata.sourceId = documentId

Checkpoint
Explain the process for updating documents when a source file changes.
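The delete-then-insert steps can be sketched against an in-memory array standing in for the vector store. This is only an illustration of the logic: a real workflow would call the store's delete and insert operations and would embed the new chunks first. The function `reindexDocument` and the sample data are hypothetical.

```javascript
// In-memory stand-in for a vector store; embeddings are omitted for brevity.
function reindexDocument(store, documentId, newChunks) {
  // Step 1: drop every chunk that belongs to the changed document.
  const kept = store.filter((c) => c.metadata.sourceId !== documentId);
  // Step 2: insert the freshly split chunks, tagged with the same sourceId.
  const inserted = newChunks.map((text, chunkIndex) => ({
    content: text,
    metadata: { sourceId: documentId, chunkIndex },
  }));
  return [...kept, ...inserted];
}

// Hypothetical store contents before the file "file-1" changes.
const storeBefore = [
  { content: "old intro", metadata: { sourceId: "file-1", chunkIndex: 0 } },
  { content: "old body", metadata: { sourceId: "file-1", chunkIndex: 1 } },
  { content: "other doc", metadata: { sourceId: "file-2", chunkIndex: 0 } },
];
const storeAfter = reindexDocument(storeBefore, "file-1", ["new intro"]);
console.log(storeAfter.map((c) => c.content)); // [ 'other doc', 'new intro' ]
```

Deleting by `sourceId` before inserting matters because the new version of a file may split into fewer chunks than the old one. Overwriting chunk by chunk would leave stale trailing chunks behind.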
📊 Monitoring Vector Store
    // Code node: Track indexing metrics
    const metrics = {
      timestamp: new Date().toISOString(),
      operation: $json.operation, // insert, search, delete
      documentCount: $json.count,
      processingTime: $json.duration,
      vectorCount: $json.totalVectors
    };

    // Save to Google Sheets for tracking
    return { json: metrics };

- Batch insert: Group documents, don't insert one-by-one
- Namespace: Use namespaces/collections to organize data
- Metadata: Always attach metadata so you can filter on it later
- Cleanup: Regularly delete outdated chunks
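As an illustration of the batch insert tip, the sketch below groups items into fixed-size batches before they would be sent to the vector store. The helper name `toBatches` and the batch size of 100 are assumptions for the example. Inside n8n, a built-in batching node may cover the same need without custom code.

```javascript
// Group an array into consecutive batches of at most `batchSize` items.
function toBatches(items, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical input: 250 chunks waiting to be indexed.
const pendingChunks = Array.from({ length: 250 }, (_, i) => ({ id: i }));
const batches = toBatches(pendingChunks, 100);
console.log(batches.map((b) => b.length)); // [ 100, 100, 50 ]
```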
Checkpoint
Which metrics should you track to monitor vector store operations?
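Once metric rows like the ones above have been collected, they can be summarized per operation. The following is a minimal sketch, assuming each row carries `operation`, `documentCount`, and `processingTime`, and that `processingTime` is in milliseconds (the unit is an assumption). The function `summarizeMetrics` and the sample rows are hypothetical.

```javascript
// Aggregate metric rows by operation: run count, documents, average time.
function summarizeMetrics(rows) {
  const summary = {};
  for (const row of rows) {
    if (!summary[row.operation]) {
      summary[row.operation] = { runs: 0, documents: 0, totalTime: 0 };
    }
    const s = summary[row.operation];
    s.runs += 1;
    s.documents += row.documentCount;
    s.totalTime += row.processingTime;
  }
  for (const s of Object.values(summary)) {
    s.avgTime = s.totalTime / s.runs; // mean processing time per run
  }
  return summary;
}

// Hypothetical rows, e.g. read back from the tracking sheet.
const metricRows = [
  { operation: "insert", documentCount: 10, processingTime: 1200 },
  { operation: "insert", documentCount: 20, processingTime: 1800 },
  { operation: "search", documentCount: 0, processingTime: 90 },
];
const metricSummary = summarizeMetrics(metricRows);
console.log(metricSummary.insert.avgTime); // 1500
```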
📝 Practice Exercises
- Build an insert workflow: load 10 PDFs, split them, and index them into the Vector Store
- Build a search workflow: run a query and return the top 5 results
- Create a RAG Agent with a retriever tool
- Implement a document update workflow that runs when a file changes
Checkpoint
List the 4 exercises to complete in this lesson.
🚀 Next Lesson
Text Splitting Strategies: strategies for splitting text effectively in RAG systems.
