Theory
35 minutes
Lesson 1/3

RAG Fundamentals

Learn what RAG (Retrieval-Augmented Generation) is and how to implement it in n8n

📚 RAG Fundamentals

RAG lets an AI answer questions using your own data: documents, knowledge bases, databases.

What is RAG?

RAG Definition

Retrieval-Augmented Generation combines two steps:

  • Retrieval: find relevant information in your knowledge base
  • Generation: use an LLM to generate a response grounded in that information
Diagram
graph LR
    Q[User Query] --> E[Embed Query]
    E --> S[Search Vector DB]
    S --> R[Relevant Chunks]
    R --> C[Context + Query]
    C --> L[LLM]
    L --> A[Answer]

Why do you need RAG?

LLM Limitations

| Problem | RAG Solution |
| --- | --- |
| Knowledge cutoff | Real-time data access |
| Hallucinations | Grounded in documents |
| No private data | Access your docs |
| Generic answers | Specific context |

Use Cases

  • Internal wiki chatbot: Answer questions about company docs
  • Customer support: Product documentation Q&A
  • Legal/compliance: Search contracts, policies
  • Research: Query academic papers

Core Concepts

1. Embeddings

Convert text into vectors (lists of numbers):

Text
1"Machine learning is amazing"
2 → [0.12, -0.34, 0.78, ...] (1536 dimensions)

Similar texts → similar vectors: this is what makes semantic search possible.
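
To make "similar vectors" concrete, here is a minimal cosine-similarity sketch in plain JavaScript (no library assumed); cosine similarity is the standard way to compare two embeddings:

JavaScript
// Cosine similarity: 1 = same direction, 0 = unrelated, -1 = opposite
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional example (real embeddings have e.g. 1536 dimensions)
cosineSimilarity([0.1, 0.9, 0.2], [0.2, 0.8, 0.1]); // ≈ 0.99 → very similar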

2. Vector Database

Store and search embeddings:

| Database | Type | Features |
| --- | --- | --- |
| Pinecone | Cloud | Managed, scalable |
| Supabase Vector | Cloud | PostgreSQL-based |
| Qdrant | Self-hosted | Fast, efficient |
| ChromaDB | Local | Easy setup |

3. Chunking

Split documents into smaller pieces:

Text
Long Document (10,000 words)
 → Chunk 1 (500 words)
 → Chunk 2 (500 words)
 → ...
 → Chunk 20 (500 words)
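
A minimal sketch of fixed-size chunking with overlap in plain JavaScript (n8n's Text Splitter nodes do this for you; this only illustrates the mechanics):

JavaScript
// Split text into chunks of `size` characters; each chunk shares `overlap`
// characters with the previous one so context isn't cut off at a boundary.
function chunkText(text, size = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}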

4. Similarity Search

Find most relevant chunks:

Text
Query: "What's the return policy?"

Results (by similarity):
1. "Returns accepted within 30 days..." (0.92)
2. "For refunds, customers must..." (0.87)
3. "Shipping policy states..." (0.65)

RAG in n8n

Available Nodes

  • Document Loaders: PDF, web, text files
  • Text Splitters: Chunk documents
  • Embeddings: OpenAI, Hugging Face
  • Vector Stores: Pinecone, Supabase, Qdrant
  • Retrievers: Search vector stores

Basic RAG Workflow

Diagram
graph TD
    subgraph "Indexing (One-time)"
        D[Documents] --> L[Load]
        L --> S[Split]
        S --> E[Embed]
        E --> V[(Vector Store)]
    end
    
    subgraph "Query (Each request)"
        Q[Query] --> QE[Embed Query]
        QE --> R[Retrieve]
        R --> |Top K chunks| C[Combine]
        C --> LLM[Generate]
        LLM --> A[Answer]
    end
    
    V --> R

Implementation

Step 1: Document Indexing

Workflow: Index Documents

Text
File Trigger (new file)
  ↓
Document Loader (PDF/Text)
  ↓
Text Splitter
  ↓
Embeddings (OpenAI)
  ↓
Vector Store (Insert)

Text Splitter Config:

JavaScript
Chunk Size: 500    // characters
Chunk Overlap: 50  // shared characters between adjacent chunks
Separator: "\n\n"  // split on paragraphs

Embeddings Node:

JavaScript
Model: text-embedding-3-small // cheaper
// or text-embedding-3-large  // better quality
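
If you need to generate embeddings yourself (e.g. in a Code node), a minimal sketch with the official openai npm package, assuming OPENAI_API_KEY is set in the environment:

JavaScript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// The API returns one embedding per input text
const res = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Machine learning is amazing",
});

const vector = res.data[0].embedding; // array of 1536 numbers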

Step 2: Query Pipeline

Workflow: Answer Questions

Text
Webhook (question)
  ↓
Embeddings (query)
  ↓
Vector Store (search)
  ↓
Format Context
  ↓
OpenAI (generate)
  ↓
Return Answer

Vector Store Search:

JavaScript
Top K: 5                  // retrieve the 5 most similar chunks
Similarity Threshold: 0.7 // ignore weaker matches

Format Context:

JavaScript
const chunks = $input.all();

// Join the retrieved chunks into one context string,
// separated so the LLM can tell them apart
const context = chunks
  .map(c => c.json.text)
  .join('\n\n---\n\n');

return [{
  json: {
    context,
    question: $json.question
  }
}];

OpenAI Prompt:

JavaScript
System: `You are a helpful assistant. Answer questions based ONLY on the provided context. If the answer is not in the context, say "I don't have that information."

Context:
${context}
`

User: `Question: ${question}`
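
In code form, this is roughly the request the OpenAI node builds (a sketch using the openai npm package; the model name is only an example):

JavaScript
const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini", // example model
  messages: [
    {
      role: "system",
      content: `You are a helpful assistant. Answer questions based ONLY on the provided context. If the answer is not in the context, say "I don't have that information."\n\nContext:\n${context}`,
    },
    { role: "user", content: `Question: ${question}` },
  ],
});

const answer = completion.choices[0].message.content;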

Advanced RAG Techniques

1. Metadata Filtering

Add metadata to chunks:

JavaScript
{
  text: "Return policy content...",
  metadata: {
    source: "policy.pdf",
    category: "returns",
    date: "2024-01-15"
  }
}

// Query with a metadata filter
Filter: { category: "returns" }
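
For example, Pinecone accepts a metadata filter alongside the query vector, so only matching chunks are searched (a sketch using the index handle from the Pinecone section below; the filter syntax follows Pinecone's documented operators):

JavaScript
// Only chunks whose metadata.category equals "returns" are considered
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  filter: { category: { $eq: "returns" } },
  includeMetadata: true,
});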

2. Hybrid Search

Combine vector search + keyword search:

JavaScript
// Vector search: semantic similarity
// Keyword search: exact matches

// Combine results with weighted scoring (see the sketch below)
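
A minimal sketch of weighted score fusion, assuming both searches return scores already normalized to the 0..1 range (the 0.7 weighting is a tunable assumption, not a standard):

JavaScript
// results: [{ id, vectorScore, keywordScore }], scores normalized to 0..1
function hybridRank(results, vectorWeight = 0.7) {
  return results
    .map(r => ({
      ...r,
      score: vectorWeight * r.vectorScore + (1 - vectorWeight) * r.keywordScore,
    }))
    .sort((a, b) => b.score - a.score);
}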

3. Re-ranking

Use another model to re-rank results:

JavaScript
// After vector search
Chunks: [A, B, C, D, E]

// Re-rank with a cross-encoder
Re-ranked: [C, A, E, B, D] // C is most relevant
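
A sketch of that step, assuming a hypothetical scoreRelevance(query, text) helper backed by a cross-encoder model or a reranking API such as Cohere Rerank (the helper name is illustrative, not a real library call):

JavaScript
// Re-score retrieved chunks with a slower but more accurate model,
// then keep only the best ones for the prompt.
async function rerank(query, chunks, keep = 3) {
  const scored = await Promise.all(
    chunks.map(async c => ({
      ...c,
      score: await scoreRelevance(query, c.text), // hypothetical cross-encoder call
    }))
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, keep);
}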

n8n Vector Store Nodes

Pinecone

JavaScript
// Setup (node parameters)
Index Name: my-knowledge-base
Namespace: company-docs
Dimensions: 1536 // must match the embedding model

// In code (@pinecone-database/pinecone SDK), upsert/query go through an index handle
const index = pinecone.index("my-knowledge-base");

// Insert
await index.upsert([{
  id: "doc-123",
  values: embedding, // the chunk's embedding vector
  metadata: { source: "policy.pdf" }
}]);

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  includeMetadata: true
});

Supabase Vector

JavaScript
// Uses the pgvector extension
// Integrates with Supabase Auth/RLS

// Insert
await supabase
  .from('documents')
  .insert({
    content: chunk.text,
    embedding: chunk.vector,
    metadata: chunk.metadata
  });

// Query (match_documents is a SQL function you define, following the Supabase docs pattern)
const { data } = await supabase.rpc('match_documents', {
  query_embedding: queryVector,
  match_count: 5
});

Best Practices

RAG Tips
  1. Chunk size matters - Too small = no context, too large = noise
  2. Overlap chunks - Preserve context across splits
  3. Include sources - Show where info came from
  4. Test queries - Evaluate retrieval quality
  5. Update regularly - Keep knowledge base current
  6. Handle no-results - Fall back gracefully (see the sketch below)
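
A minimal no-results fallback for the query pipeline (the threshold matches the 0.7 used earlier; the reply text is only an example):

JavaScript
// If nothing scored above the similarity threshold, don't let the LLM guess
if (relevantChunks.length === 0) {
  return [{
    json: {
      answer: "I don't have that information. Try rephrasing your question.", // example fallback
    }
  }];
}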

Evaluation

Metrics

  • Retrieval: Are the correct chunks retrieved?
  • Generation: Is the answer accurate and helpful?
  • Relevance: Does the answer address the question?
JavaScript
// Simple smoke-test checks
const isRelevant = answer.includes(expectedInfo);
const isAccurate = !answer.includes("I don't have that information"); // the model didn't refuse
const hasSource = answer.includes("According to");

Practice Exercise

Hands-on Exercise

Build Company Wiki Chatbot:

  1. Collect 5-10 documents (policies, FAQs)
  2. Create indexing workflow
  3. Create query workflow
  4. Test with various questions
  5. Evaluate answer quality

Target: a chatbot that answers questions accurately from your documents


Next

Next lesson: Vector Database Setup - a deep dive into Pinecone/Supabase.