n8n Vector Store Nodes

A detailed guide to using the Vector Store nodes in n8n

0

🎯 Learning Objectives

Avg. 5 min

After this lesson, you will:

✅ Understand the Vector Store Workflow Pattern in n8n

✅ Know how to build an Insert Documents Workflow

✅ Be comfortable with the Search Workflow and context building

✅ Build a RAG Agent with a Vector Store Retriever

✅ Implement a document update and monitoring strategy

n8n provides native Vector Store nodes for inserting, searching, and managing vector data. This lesson covers how to use them in detail.

1

🔄 Vector Store Workflow Pattern

Avg. 5 min
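
As a minimal sketch of the two flows in this pattern, assume Indexing means Load → Split → Embed → Insert and Query means embedding the question, running a similarity search, and returning the top matches. Everything below is hypothetical and in-memory: `toyEmbed` is a letter-frequency stand-in, not a real embedding model, and `store` is a plain array rather than an n8n Vector Store node.

```javascript
// Toy stand-in for an embedding model: a 26-dimension letter-frequency vector.
// A real workflow would use an embeddings node or API instead.
function toyEmbed(text) {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) vec[i] += 1;
  }
  return vec;
}

// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Hypothetical in-memory "vector store".
const store = [];

// Indexing flow: Load -> Split -> Embed -> Insert
function indexDocument(source, text) {
  const chunks = text.split(/\n\n+/).filter((c) => c.trim() !== "");
  for (const chunk of chunks) {
    store.push({ content: chunk, vector: toyEmbed(chunk), metadata: { source } });
  }
  return chunks.length;
}

// Query flow: Embed the question -> Search -> return the top-K chunks
function search(question, topK = 2) {
  const q = toyEmbed(question);
  return store
    .map((entry) => ({ ...entry, score: cosine(q, entry.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

indexDocument(
  "handbook",
  "Vacation policy: twenty days per year.\n\nExpense reports are due monthly."
);
const hits = search("vacation policy", 1);
console.log(hits[0].content);
```

The indexing side runs whenever documents change; the query side runs on every user question and reads from the same store.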

Checkpoint

Describe the two main flows (Indexing and Query) in the Vector Store Workflow Pattern.

2

📥 Insert Documents Workflow

Avg. 5 min

Text Splitter Configuration

JavaScript
// Recursive Character Text Splitter
// Configuration in n8n:
// - Chunk Size: 1000 (characters)
// - Chunk Overlap: 200 (characters)
// - Separators: ["\n\n", "\n", " ", ""]

// Best practices:
// Chunk size 500-1500 chars for most docs
// Overlap 10-20% of chunk size
// Larger chunks = more context, less precision
// Smaller chunks = more precision, less context
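
To make the chunk size and overlap settings concrete, here is a minimal sketch of a fixed-size splitter with overlap. It is a simplification: it is not the Recursive Character Text Splitter itself, since it ignores separators and cuts purely by character count. The function name `splitWithOverlap` is hypothetical.

```javascript
// Simplified fixed-size splitter with overlap (not separator-aware).
function splitWithOverlap(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

const sample = "x".repeat(2600);
const chunks = splitWithOverlap(sample, 1000, 200);
console.log(chunks.map((c) => c.length)); // [ 1000, 1000, 1000 ]
```

With a chunk size of 1000 and an overlap of 200, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.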

Metadata Enrichment

JavaScript
// Code node: Add metadata before indexing
const chunks = $input.all();

return chunks.map((chunk, index) => ({
  json: {
    content: chunk.json.text,
    metadata: {
      source: chunk.json.source || "unknown",
      page: chunk.json.page || 0,
      chunkIndex: index,
      documentTitle: chunk.json.title || "",
      indexedAt: new Date().toISOString()
    }
  }
}));

Checkpoint

How should the Text Splitter's chunk size and overlap be configured? What should the metadata include?

3

🔎 Search Workflow

Avg. 5 min

Search Configuration

JavaScript
// Vector Store Search node
// Configuration:
// - Top K: 5 (number of results)
// - Score Threshold: 0.7 (minimum similarity)
// - Filter: metadata-based filtering

// Search with metadata filter
// Filter by source document:
// metadata.source == "company-handbook"

// Filter by date:
// metadata.indexedAt > "2025-01-01"
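
The three settings above can be illustrated with a small sketch that applies them to an already-scored result list. This is only an approximation of what a vector store does: real stores usually apply metadata filters during the search itself, not afterwards. The `rawResults` data and the `filterResults` helper are hypothetical.

```javascript
// Hypothetical scored results, shaped like the items used in this lesson.
const rawResults = [
  { score: 0.91, document: { pageContent: "A", metadata: { source: "company-handbook", indexedAt: "2025-03-01T00:00:00.000Z" } } },
  { score: 0.65, document: { pageContent: "B", metadata: { source: "company-handbook", indexedAt: "2025-02-01T00:00:00.000Z" } } },
  { score: 0.80, document: { pageContent: "C", metadata: { source: "blog", indexedAt: "2025-02-10T00:00:00.000Z" } } },
  { score: 0.75, document: { pageContent: "D", metadata: { source: "company-handbook", indexedAt: "2024-11-20T00:00:00.000Z" } } },
  { score: 0.72, document: { pageContent: "E", metadata: { source: "company-handbook", indexedAt: "2025-01-15T00:00:00.000Z" } } },
];

// Apply score threshold, metadata filters, then keep the top K by score.
function filterResults(results, { topK = 5, scoreThreshold = 0.7, source, indexedAfter } = {}) {
  return results
    .filter((r) => r.score >= scoreThreshold)
    .filter((r) => !source || r.document.metadata.source === source)
    // ISO 8601 UTC timestamps compare correctly as plain strings.
    .filter((r) => !indexedAfter || r.document.metadata.indexedAt > indexedAfter)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const filtered = filterResults(rawResults, {
  topK: 5,
  scoreThreshold: 0.7,
  source: "company-handbook",
  indexedAfter: "2025-01-01",
});
console.log(filtered.map((r) => r.document.pageContent)); // [ 'A', 'E' ]
```

Result B is dropped by the score threshold, C by the source filter, and D by the date filter.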

Context Building

JavaScript
// Code node (Run Once for All Items): Build context from search results
const results = $input.all();

const context = results.map((r, i) => {
  // Check the type so a score of 0 is still printed
  const sim = typeof r.json.score === "number" ? r.json.score.toFixed(2) : "N/A";
  return `[Source ${i + 1}] (similarity: ${sim})
${r.json.document.pageContent}
---`;
}).join("\n\n");

// Assumes the incoming items carry the user's question in a `query` field
const query = $input.first().json.query;

const prompt = `
Based on the following context, answer the user's question.
If the context doesn't contain the answer, say "I don't have enough information."

Context:
${context}

Question: ${query}

Answer:`;

return [{ json: { prompt, resultCount: results.length } }];

Checkpoint

How do you build context from search results? What values should Top K and Score Threshold be set to?

4

🤖 RAG Agent with Vector Store

Avg. 5 min
JavaScript
// AI Agent node configuration:
// - Agent Type: Tools Agent
// - LLM: OpenAI GPT-4o-mini
// - Tools: Vector Store Retriever
// - System Message:

const systemMessage = `
You are a helpful assistant that answers questions
based on the company knowledge base.

Rules:
1. Only answer based on the retrieved context
2. If you are not sure, say so
3. Cite the source when possible
4. Be concise but complete
`;

Checkpoint

Describe how to configure the AI Agent node with the Vector Store Retriever tool. What does the system message need?

5

🔄 Document Update Strategy

Avg. 5 min
JavaScript
// Workflow: Update documents when source changes
// Google Drive Trigger (on file change) → Re-index

// Code node: Upsert logic
const documentId = $json.fileId;

// Step 1: Delete old chunks for this document
// Vector Store: Delete where metadata.sourceId == documentId

// Step 2: Re-process and insert new chunks
// Load → Split → Embed → Insert with metadata.sourceId = documentId
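
The two steps above can be sketched as a delete-then-insert function. This is a minimal illustration under stated assumptions: `vectorStore` is a hypothetical in-memory array standing in for a real vector store, `reindexDocument` is a made-up helper, and embedding is omitted for brevity.

```javascript
// Hypothetical in-memory store; each entry mimics one indexed chunk.
let vectorStore = [
  { content: "old chunk 1", metadata: { sourceId: "file-123" } },
  { content: "old chunk 2", metadata: { sourceId: "file-123" } },
  { content: "other doc", metadata: { sourceId: "file-456" } },
];

function reindexDocument(store, documentId, newChunks) {
  // Step 1: drop every chunk that belongs to this document
  const kept = store.filter((entry) => entry.metadata.sourceId !== documentId);

  // Step 2: insert the new chunks, tagged with the same sourceId
  const inserted = newChunks.map((content, chunkIndex) => ({
    content,
    metadata: { sourceId: documentId, chunkIndex, indexedAt: new Date().toISOString() },
  }));

  return [...kept, ...inserted];
}

vectorStore = reindexDocument(vectorStore, "file-123", ["new chunk"]);
console.log(vectorStore.map((entry) => entry.content)); // [ 'other doc', 'new chunk' ]
```

Tagging every chunk with a stable `sourceId` at insert time is what makes the delete step possible later; without it there is no reliable way to find a document's old chunks.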

Checkpoint

Explain the process for updating documents when the source file changes.

6

📊 Monitoring Vector Store

Avg. 5 min
JavaScript
// Code node: Track indexing metrics
const metrics = {
  timestamp: new Date().toISOString(),
  operation: $json.operation, // insert, search, delete
  documentCount: $json.count,
  processingTime: $json.duration,
  vectorCount: $json.totalVectors
};

// Save to Google Sheets for tracking
return { json: metrics };
Performance Tips
  • Batch insert: Group documents, don't insert one-by-one
  • Namespace: Use namespaces/collections to organize data
  • Metadata: Always attach metadata so you can filter later
  • Cleanup: Regularly delete outdated chunks
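
The batch-insert tip above can be sketched as a small helper that groups chunks before sending them to the store. Both `toBatches` and `insertAll` are hypothetical names, and `insertBatch` stands in for whatever call actually writes to your vector store; the batch size of 100 is an arbitrary default, not an n8n setting.

```javascript
// Split an array into consecutive groups of at most batchSize items.
function toBatches(items, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Insert all chunks batch by batch; insertBatch is a caller-supplied async function.
async function insertAll(chunks, insertBatch, batchSize = 100) {
  let inserted = 0;
  for (const batch of toBatches(chunks, batchSize)) {
    await insertBatch(batch);
    inserted += batch.length;
  }
  return inserted;
}

console.log(toBatches([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```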

Checkpoint

Which metrics should you track to monitor vector store operations?

7

📝 Practice Exercises

Avg. 5 min
Exercises
  1. Build insert workflow: Load 10 PDFs, split, index to Vector Store
  2. Build search workflow: Query and return top 5 results
  3. Create a RAG Agent with the retriever tool
  4. Implement a document update workflow that runs when a file changes

Checkpoint

List the 4 exercises to complete in this lesson.

🚀 Next Lesson

Text Splitting Strategies → Effective strategies for splitting text in RAG systems.