🔗 n8n Vector Store Nodes

🎯 Learning Objectives

Avg. 5 min

After this lesson, you will:

✅ Understand the Vector Store Workflow Pattern in n8n

✅ Know how to build an Insert Documents Workflow

✅ Be comfortable with the Search Workflow and context building

✅ Build a RAG Agent with a Vector Store Retriever

✅ Implement a document update and monitoring strategy

n8n provides native Vector Store nodes for inserting, searching, and managing vector data. This lesson covers how to use them in detail.

Task 0

🔄 Vector Store Workflow Pattern

Avg. 5 min

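The pattern has two flows: an indexing flow that embeds documents and stores the vectors, and a query flow that embeds a question and looks up the most similar stored vectors. The sketch below is a minimal in-memory illustration of that idea, not how the n8n nodes are implemented. `toyEmbed`, `indexDocuments`, and `queryStore` are hypothetical names, the letter-count "embedding" stands in for a real embedding model, and the splitting step is omitted for brevity.

```javascript
// Toy stand-in for an embedding model (assumption): counts letters a-z.
// A real workflow would use an embeddings node instead.
function toyEmbed(text) {
  const vec = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) vec[i] += 1;
  }
  return vec;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Indexing flow: documents -> embed -> store
function indexDocuments(store, docs) {
  for (const doc of docs) {
    store.push({ text: doc.text, metadata: doc.metadata, vector: toyEmbed(doc.text) });
  }
  return store.length;
}

// Query flow: question -> embed -> similarity search -> top K
function queryStore(store, question, topK) {
  const q = toyEmbed(question);
  return store
    .map((e) => ({ text: e.text, metadata: e.metadata, score: cosine(q, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const store = [];
indexDocuments(store, [
  { text: "vacation policy", metadata: { source: "handbook" } },
  { text: "zzz qqq", metadata: { source: "noise" } },
]);
const hits = queryStore(store, "vacation", 1);
console.log(hits[0].metadata.source);
```

Both flows must use the same embedding function; otherwise the query vector and the stored vectors are not comparable.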

Checkpoint

Describe the two main flows (Indexing and Query) in the Vector Store Workflow Pattern.

Task 1

📥 Insert Documents Workflow

Avg. 5 min


Text Splitter Configuration

JavaScript

// Recursive Character Text Splitter
// Configuration in n8n:
// - Chunk Size: 1000 (characters)
// - Chunk Overlap: 200 (characters)
// - Separators: ["\n\n", "\n", " ", ""]

// Best practices:
// Chunk size 500-1500 chars for most docs
// Overlap 10-20% of chunk size
// Larger chunks = more context, less precision
// Smaller chunks = more precision, less context
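To see how chunk size and overlap interact, here is a simplified fixed-size splitter. It is only a sketch: it cuts purely by character position and ignores the separator list, so it is not the recursive splitter itself. `splitWithOverlap` is a hypothetical helper name.

```javascript
// Simplified illustration (assumption): fixed-size windows, no separator handling.
function splitWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// 2500 characters of repeating a-z so overlapping regions can be compared
const sample = Array.from({ length: 2500 }, (_, i) =>
  String.fromCharCode(97 + (i % 26))
).join("");

const chunks = splitWithOverlap(sample, 1000, 200);
console.log(chunks.map((c) => c.length));
```

With a chunk size of 1000 and an overlap of 200, each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next.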

Metadata Enrichment

JavaScript

// Code node: Add metadata before indexing
const chunks = $input.all();

return chunks.map((chunk, index) => ({
  json: {
    content: chunk.json.text,
    metadata: {
      source: chunk.json.source || "unknown",
      page: chunk.json.page || 0,
      chunkIndex: index,
      documentTitle: chunk.json.title || "",
      indexedAt: new Date().toISOString()
    }
  }
}));

Checkpoint

How should the Text Splitter's chunk size and overlap be configured? What should the metadata include?

Task 2

🔎 Search Workflow

Avg. 5 min


Search Configuration

JavaScript

// Vector Store Search node
// Configuration:
// - Top K: 5 (number of results)
// - Score Threshold: 0.7 (minimum similarity)
// - Filter: metadata-based filtering

// Search with metadata filter
// Filter by source document:
// metadata.source == "company-handbook"

// Filter by date:
// metadata.indexedAt > "2025-01-01"
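The three settings above combine as follows: discard results below the score threshold, keep only those matching the metadata filter, then return at most Top K of the best remaining results. The sketch below illustrates those semantics on already-scored results. `selectResults` and the item shape are hypothetical, and it assumes a higher score means more similar; where each step actually runs depends on the vector store backend.

```javascript
// Illustrative post-processing of scored results (assumed shape:
// { text, score, metadata: { source } }); not the node's internal logic.
function selectResults(scored, { topK = 5, scoreThreshold = 0.7, source = null } = {}) {
  return scored
    .filter((r) => r.score >= scoreThreshold) // minimum similarity
    .filter((r) => source === null || r.metadata.source === source) // metadata filter
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, topK); // cap the number of results
}

const scored = [
  { text: "A", score: 0.91, metadata: { source: "company-handbook" } },
  { text: "B", score: 0.65, metadata: { source: "company-handbook" } },
  { text: "C", score: 0.8, metadata: { source: "faq" } },
  { text: "D", score: 0.75, metadata: { source: "company-handbook" } },
];

const selected = selectResults(scored, {
  topK: 5,
  scoreThreshold: 0.7,
  source: "company-handbook",
});
console.log(selected.map((r) => r.text));
```

Here B is dropped by the threshold and C by the source filter, so fewer than Top K results come back. A search can legitimately return fewer than K items, or none.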

Context Building

JavaScript

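// Code node (Run Once for All Items): Build context from search results
const results = $input.all();

// Assumption: the user's question travels on the incoming items as `query`.
// If it lives on an earlier node, read it from that node instead.
const query = $input.first().json.query;

const context = results.map((r, i) => {
  // A score of 0 is still a valid number, so check the type rather than truthiness
  const sim = typeof r.json.score === "number" ? r.json.score.toFixed(2) : "N/A";
  return `[Source ${i + 1}] (similarity: ${sim})
${r.json.document.pageContent}
---`;
}).join("\n\n");

const prompt = `
Based on the following context, answer the user's question.
If the context doesn't contain the answer, say "I don't have enough information."

Context:
${context}

Question: ${query}

Answer:`;

// Return a single item carrying the prompt
return [{ json: { prompt, resultCount: results.length } }];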

Checkpoint

How do you build context from search results? What values should Top K and Score Threshold be set to?

Task 3

🤖 RAG Agent with Vector Store

Avg. 5 min


JavaScript

// AI Agent node configuration:
// - Agent Type: Tools Agent
// - LLM: OpenAI GPT-4o-mini
// - Tools: Vector Store Retriever
// - System Message:

const systemMessage = `
You are a helpful assistant that answers questions
based on the company knowledge base.

Rules:
1. Only answer based on the retrieved context
2. If you are not sure, say so
3. Cite the source when possible
4. Be concise but complete
`;

Checkpoint

Describe how to configure the AI Agent node with the Vector Store Retriever tool. What does the system message need?

Task 4

🔄 Document Update Strategy

Avg. 5 min

JavaScript

// Workflow: Update documents when source changes
// Google Drive Trigger (on file change) → Re-index

// Code node: Upsert logic
const documentId = $json.fileId;

// Step 1: Delete old chunks for this document
// Vector Store: Delete where metadata.sourceId == documentId

// Step 2: Re-process and insert new chunks
// Load → Split → Embed → Insert with metadata.sourceId = documentId
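The delete-then-insert sequence can be sketched against a plain in-memory array to make the logic concrete. This is an illustration only: `upsertDocument` is a hypothetical helper, embedding is left out, and a real vector store performs the delete through its own filter or ID-based operation.

```javascript
// Illustrative upsert (assumption): the "store" is a plain array of
// { text, metadata: { sourceId, chunkIndex } } entries.
function upsertDocument(store, documentId, newChunks) {
  // Step 1: drop every chunk that belongs to this document
  const kept = store.filter((e) => e.metadata.sourceId !== documentId);
  const deleted = store.length - kept.length;

  // Step 2: insert the freshly split chunks, tagged with the same sourceId
  const inserted = newChunks.map((text, chunkIndex) => ({
    text,
    metadata: { sourceId: documentId, chunkIndex },
  }));

  return { store: kept.concat(inserted), deleted, inserted: inserted.length };
}

const before = [
  { text: "old intro", metadata: { sourceId: "file-1", chunkIndex: 0 } },
  { text: "old body", metadata: { sourceId: "file-1", chunkIndex: 1 } },
  { text: "other doc", metadata: { sourceId: "file-2", chunkIndex: 0 } },
];

const result = upsertDocument(before, "file-1", ["new intro"]);
console.log(result.deleted, result.inserted, result.store.length);
```

Tagging every chunk with `sourceId` at insert time is what makes step 1 possible. Without it, stale chunks from the old version of a file cannot be identified and will keep appearing in search results.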

Checkpoint

Explain the process for updating documents when the source file changes.

Task 5

📊 Monitoring Vector Store

Avg. 5 min

JavaScript

// Code node: Track indexing metrics
const metrics = {
  timestamp: new Date().toISOString(),
  operation: $json.operation, // insert, search, delete
  documentCount: $json.count,
  processingTime: $json.duration,
  vectorCount: $json.totalVectors
};

// Save to Google Sheets for tracking
return { json: metrics };
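The metrics object above expects values such as `duration` and `count` to be present on the incoming item already. One possible way to produce them is to wrap the operation in a small timing helper, sketched below. `withMetrics` is a hypothetical name, the field names mirror the object above, and the injectable clock exists only to make the helper easy to test.

```javascript
// Illustrative timing wrapper (assumption): runs a synchronous operation
// and reports how long it took and how many items it returned.
function withMetrics(operation, fn, now = Date.now) {
  const startedAt = now();
  const output = fn();
  const duration = now() - startedAt; // milliseconds
  return {
    output,
    metrics: {
      timestamp: new Date(startedAt).toISOString(),
      operation, // insert, search, delete
      documentCount: Array.isArray(output) ? output.length : 0,
      processingTime: duration,
    },
  };
}

const run = withMetrics("insert", () => ["chunk-1", "chunk-2", "chunk-3"]);
console.log(run.metrics.operation, run.metrics.documentCount);
```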

Performance Tips

Batch insert: Group documents into batches instead of inserting them one by one
Namespace: Use namespaces/collections to organize data
Metadata: Always attach metadata so you can filter on it later
Cleanup: Regularly delete outdated chunks
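The batch insert tip comes down to grouping items before handing them to the insert step. A minimal sketch of that grouping follows; `toBatches` is a hypothetical helper, and the batch size of 4 is an arbitrary example value rather than a recommendation.

```javascript
// Split an array into consecutive batches of at most batchSize items.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Ten placeholder documents grouped into batches of four
const docs = Array.from({ length: 10 }, (_, i) => ({ id: `doc-${i + 1}` }));
const batches = toBatches(docs, 4);
console.log(batches.map((b) => b.length));
```

Each batch can then go through a single embed-and-insert call, which reduces the number of round trips compared with inserting documents individually.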

Checkpoint

Which metrics should you track to monitor vector store operations?

Task 6

📝 Practice Exercises

Avg. 5 min

Exercises

Build an insert workflow: Load 10 PDFs, split them, and index them into the Vector Store
Build a search workflow: Run a query and return the top 5 results
Create a RAG Agent with the retriever tool
Implement a document update workflow that runs when a file changes

Checkpoint

List the 4 exercises to complete in this lesson.

Task 7

🚀 Next Lesson

Text Splitting Strategies → Effective text-splitting strategies for RAG systems.
