
Query Pipeline

Build a complete query pipeline for RAG in n8n

🎯 Lesson objectives

After this lesson, you will be able to:

✅ Understand the architecture of a complete query pipeline

✅ Pre-process and enhance queries

✅ Handle retrieval, re-ranking, and context building

✅ Implement answer generation with proper prompting

✅ Handle edge cases: no results, low confidence

The query pipeline is where the user's question is processed, relevant context is retrieved, and an answer is generated. This lesson covers the full pipeline, from query to response.

🏗️ Pipeline Architecture

[Diagram: Query → Pre-processing → Retrieval → Re-ranking → Context Building → Answer Generation → Post-processing → Final Answer]

Checkpoint

How many main steps does the query pipeline have? List them from Query to Final Answer.

🔧 Step 1: Query Pre-processing

JavaScript
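// Code node (Run Once for Each Item): clean and enhance the query
function preprocessQuery(query) {
  // Clean: trim and collapse repeated whitespace
  const clean = String(query ?? "").trim().replace(/\s+/g, " ");

  // Expand abbreviations
  const expansions = {
    API: "Application Programming Interface",
    RAG: "Retrieval Augmented Generation",
    LLM: "Large Language Model",
  };

  // Add context for better retrieval: append the long form after each
  // whole-word abbreviation, e.g. "RAG" -> "RAG (Retrieval Augmented Generation)"
  let searchQuery = clean;
  for (const [abbr, full] of Object.entries(expansions)) {
    searchQuery = searchQuery.replace(new RegExp(`\\b${abbr}\\b`, "g"), `${abbr} (${full})`);
  }

  return {
    originalQuery: query,
    cleanQuery: clean,
    searchQuery, // Can also add synonyms or rephrase
  };
}

return { json: preprocessQuery($json.query) };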

Query Enhancement with AI

JavaScript
// OpenAI node: Enhance query for better retrieval
const enhancePrompt = `
Rephrase this query to improve search results.
Generate 3 alternative phrasings that capture the same intent.

Original query: "${$json.query}"

Return JSON:
{
  "queries": ["original", "rephrasing1", "rephrasing2", "rephrasing3"]
}`;
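
The model's reply still has to be turned into a list of search queries before retrieval. The sketch below is one possible way to do that, assuming the reply is a string that should contain JSON in the shape requested by the prompt. `parseEnhancedQueries` is a hypothetical helper name, not part of n8n or the OpenAI node. If parsing fails, it falls back to the original query so the pipeline can still run.

```javascript
// Hypothetical helper: turn the model's reply into a clean list of search queries.
function parseEnhancedQueries(rawReply, originalQuery) {
  let queries = [];
  try {
    // Models sometimes wrap JSON in prose or code fences, so grab the outermost {...}.
    const match = String(rawReply).match(/\{[\s\S]*\}/);
    const parsed = match ? JSON.parse(match[0]) : {};
    if (Array.isArray(parsed.queries)) queries = parsed.queries;
  } catch (err) {
    // Invalid JSON: fall through and use only the original query.
  }

  // Always search with the original query, then add non-empty, unique rephrasings.
  const seen = new Set();
  const result = [];
  for (const q of [originalQuery, ...queries]) {
    if (typeof q !== "string") continue;
    const text = q.trim();
    const key = text.toLowerCase();
    if (text && !seen.has(key)) {
      seen.add(key);
      result.push(text);
    }
  }
  return result;
}

// Example with a made-up reply
const reply =
  'Here you go: {"queries": ["What is RAG?", "Explain retrieval augmented generation", "what is rag?"]}';
console.log(parseEnhancedQueries(reply, "What is RAG?"));
// [ 'What is RAG?', 'Explain retrieval augmented generation' ]
```

In an n8n Code node, the returned array could be mapped to one item per query so that the vector search runs once for each phrasing.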

Checkpoint

What does query pre-processing involve? Why is AI-based query enhancement needed?

🔎 Step 2: Retrieve Relevant Chunks

JavaScript
// Vector Store Search node
// Top K: 10 (retrieve more, filter later)
// Score Threshold: 0.6

// With multi-query: search with each rephrased query
// then deduplicate results
JavaScript
// Code node (Run Once for All Items): deduplicate results from multi-query
// Sort by score descending first, so the best-scoring copy of a duplicate is kept
const allResults = [...$input.all()].sort(
  (a, b) => (b.json.score || 0) - (a.json.score || 0)
);
const seen = new Set();
const unique = [];

for (const result of allResults) {
  const content = result.json.document?.pageContent || "";
  const key = content.substring(0, 100); // Simple dedup: first 100 characters

  if (!seen.has(key)) {
    seen.add(key);
    unique.push(result);
  }
}

// Keep top 5
return unique.slice(0, 5);

Checkpoint

Why is deduplication needed with multi-query? What should the initial Top K be?

📊 Step 3: Re-ranking

JavaScript
// OpenAI node: Re-rank results by relevance
const rerankPrompt = `
Given the question and search results, rank the results by relevance.

Question: "${$json.query}"

Results:
${$json.results.map((r, i) => `[${i}] ${r.content}`).join('\n\n')}

Return JSON array of indices ordered by relevance: [most_relevant_index, ...]
Only include indices of chunks that actually help answer the question.`;
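
The index array returned by the model should be validated before it is used to reorder the chunks. The following is a minimal sketch of that step. `applyRanking` is a hypothetical helper name, and it assumes the reply has already been parsed into a JavaScript value. Out-of-range, duplicate, or non-integer indices are ignored. If the reply is not an array at all, the original vector-search order is kept.

```javascript
// Hypothetical helper: reorder results using the index array returned by the model.
function applyRanking(results, rankedIndices) {
  // Unusable reply (e.g. parsing failed): keep the vector-search order.
  if (!Array.isArray(rankedIndices)) return results.slice();

  const used = new Set();
  const ranked = [];
  for (const idx of rankedIndices) {
    // Skip anything that is not a valid, unused position in `results`.
    const valid = Number.isInteger(idx) && idx >= 0 && idx < results.length;
    if (!valid || used.has(idx)) continue;
    used.add(idx);
    ranked.push(results[idx]);
  }
  // An empty result means the model judged no chunk helpful.
  return ranked;
}

// Example with made-up chunks
const chunks = [{ content: "A" }, { content: "B" }, { content: "C" }];
console.log(applyRanking(chunks, [2, 0, 7, 2]).map((r) => r.content));
// [ 'C', 'A' ]
```

Returning an empty array for an empty ranking keeps the prompt's instruction intact: chunks the model did not list are treated as unhelpful and dropped.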

Checkpoint

What is re-ranking? Why do vector search results need to be re-ranked?

🧠 Step 4: Build Context

JavaScript
// Code node: Build optimized context
const query = $json.query;
const topResults = ($json.rankedResults || []).slice(0, 5);

// Label each chunk so the model can cite it as [Source N]
const context = topResults.map((r, i) => {
  const source = r.metadata?.source || "unknown";
  return `--- Source ${i + 1}: ${source} ---
${r.content}`;
}).join('\n\n');

const systemPrompt = `You are a helpful assistant that answers questions
based on the provided context.

Rules:
1. Only use information from the context below
2. If the answer is not in the context, say "I could not find this information in the knowledge base"
3. Cite sources when possible: [Source 1], [Source 2]
4. Be concise but complete
5. Use Vietnamese for the response`;

const userPrompt = `Context:
${context}

Question: ${query}

Answer:`;

// `sources` is passed along so that post-processing can list them
return { json: { systemPrompt, userPrompt, sources: topResults, sourceCount: topResults.length } };

Checkpoint

Which rules should the context prompt include? Why cite sources?

⚡ Step 5: Generate Answer

JavaScript
// OpenAI Chat node
// System Message: {{ $json.systemPrompt }}
// User Message: {{ $json.userPrompt }}
// Model: gpt-4o-mini
// Temperature: 0.3 (lower for factual answers)
// Max Tokens: 500
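
If the model is called directly (for example from an HTTP Request node) rather than through the OpenAI Chat node, the settings above correspond roughly to a Chat Completions request body like the one sketched here. `buildChatRequest` is a hypothetical helper name, and the exact fields the n8n node sends internally are not confirmed by this lesson.

```javascript
// Hypothetical helper: build a Chat Completions request body from the two prompts.
function buildChatRequest(systemPrompt, userPrompt) {
  return {
    model: "gpt-4o-mini",
    temperature: 0.3, // lower for factual answers
    max_tokens: 500,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userPrompt },
    ],
  };
}

// Example with placeholder prompts
const body = buildChatRequest(
  "Answer only from the context.",
  "Context:\n...\n\nQuestion: What is RAG?\n\nAnswer:"
);
console.log(JSON.stringify(body, null, 2));
```

Keeping the rules in the system message and the context plus question in the user message mirrors the split used in Step 4.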

Checkpoint

What temperature should you set for factual Q&A? Why?

📋 Step 6: Post-processing

JavaScript
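// Code node: Format response
// Assumes `sources` from the context-building step has been merged into this
// item (for example with a Merge node) alongside the model's reply.
const answer = $json.message?.content ?? "";
const sources = $json.sources ?? [];

const response = {
  answer,
  sources: sources.map(s => ({
    title: s.metadata?.source ?? "unknown",
    relevance: s.score
  })),
  // Use the top source's score as a rough confidence signal
  confidence: sources[0]?.score || 0,
  timestamp: new Date().toISOString()
};

return { json: response };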
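
Because the system prompt asks the model to cite `[Source N]`, post-processing can also check which sources were actually cited. The sketch below shows one way to do this, assuming the citation format from the Step 4 prompt. `extractCitations` is a hypothetical helper name.

```javascript
// Hypothetical helper: find which [Source N] markers the answer actually uses.
function extractCitations(answer, sourceCount) {
  const cited = new Set();
  const invalid = new Set();
  for (const match of String(answer).matchAll(/\[Source (\d+)\]/g)) {
    const n = Number(match[1]);
    // Source numbers are 1-based and must refer to a chunk that was in the context.
    if (n >= 1 && n <= sourceCount) cited.add(n);
    else invalid.add(n);
  }
  const ascending = (a, b) => a - b;
  return {
    cited: [...cited].sort(ascending),
    invalid: [...invalid].sort(ascending),
  };
}

// Example with a made-up answer and 3 sources in the context
const sampleAnswer =
  "n8n supports vector stores [Source 2]. See also [Source 1] and [Source 4].";
console.log(extractCitations(sampleAnswer, 3));
// { cited: [ 1, 2 ], invalid: [ 4 ] }
```

A non-empty `invalid` list suggests the model cited a source that was never provided, which is a useful signal to log or to lower the reported confidence.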

Checkpoint

Which fields should the response contain after post-processing?

🤖 Full Pipeline in n8n

[Diagram: full pipeline workflow in n8n, from Webhook to Respond to Webhook]

Checkpoint

Describe the 7 nodes in the full pipeline workflow, from Webhook to Respond to Webhook.

⚠️ Handling No Results

JavaScript
// Code node: Check if results are relevant
const results = $json.searchResults || [];
const threshold = 0.7;

// Keep only results at or above the relevance threshold
const relevant = results.filter(r => (r.score || 0) >= threshold);

if (relevant.length === 0) {
  return {
    json: {
      answer: "I could not find relevant information for your question.",
      fallback: true,
      suggestion: "Try rephrasing or ask a more specific question."
    }
  };
}

return { json: { results: relevant, fallback: false } };
Quality Tips
  • Low temperature: use 0.1-0.3 for factual Q&A
  • Top K tuning: start with 5 and increase it if context is missing
  • Score threshold: 0.7 is a good starting point
  • Cite sources: always reference sources in the answer
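
The no-result check covers the case where nothing passes the threshold. The low-confidence case, where results pass but only barely, can be handled in a similar way. The sketch below is one possible approach. The `0.85` cut-off, the function names, and the caveat wording are all assumptions for illustration and should be tuned against your own data.

```javascript
// Assumed thresholds: 0.7 matches the starting point in the tips, 0.85 is hypothetical.
const NO_ANSWER_BELOW = 0.7;
const CONFIDENT_FROM = 0.85;

// Hypothetical helper: classify confidence from the best similarity score.
function classifyConfidence(results) {
  const topScore = results.reduce((max, r) => Math.max(max, r.score || 0), 0);
  if (topScore < NO_ANSWER_BELOW) return { level: "none", topScore };
  if (topScore < CONFIDENT_FROM) return { level: "low", topScore };
  return { level: "high", topScore };
}

// Hypothetical helper: add a caveat to answers built on weak matches.
function addCaveat(answer, level) {
  if (level !== "low") return answer;
  return `${answer}\n\n(Note: the matching sources were only loosely related. Please verify.)`;
}

// Example with made-up scores
const { level } = classifyConfidence([{ score: 0.74 }, { score: 0.71 }]);
console.log(level); // low
console.log(addCaveat("RAG combines retrieval with generation.", level));
```

With these bands, "none" maps to the fallback response above, "low" still answers but flags uncertainty, and "high" answers normally.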

Checkpoint

When no relevant results are found, how should the system respond?

📝 Practice exercises
Exercises
  1. Build the complete query pipeline with all 6 steps
  2. Implement multi-query enhancement
  3. Add no-result handling with fallback messages
  4. Test with 10 queries and measure answer quality
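
For exercise 4, answer quality needs some measurable definition. One simple option, sketched below, is keyword recall: for each test query, list the keywords a good answer should contain and check how many appear. This is only a rough proxy chosen for illustration, not a metric prescribed by the lesson. The helper names and the `0.5` pass mark are assumptions.

```javascript
// Hypothetical metric: fraction of expected keywords that appear in the answer.
function keywordRecall(answer, expectedKeywords) {
  if (expectedKeywords.length === 0) return 1;
  const text = answer.toLowerCase();
  const hits = expectedKeywords.filter((k) => text.includes(k.toLowerCase()));
  return hits.length / expectedKeywords.length;
}

// Hypothetical helper: summarize a batch of test cases.
function summarize(testCases, passMark = 0.5) {
  const rows = testCases.map((t) => ({
    query: t.query,
    recall: keywordRecall(t.answer, t.expectedKeywords),
  }));
  const passed = rows.filter((r) => r.recall >= passMark).length;
  const meanRecall = rows.reduce((sum, r) => sum + r.recall, 0) / (rows.length || 1);
  return { rows, passed, total: rows.length, meanRecall };
}

// Example with two made-up cases; a real run would use 10 queries and pipeline answers.
const cases = [
  {
    query: "What is RAG?",
    answer: "RAG combines retrieval with generation.",
    expectedKeywords: ["retrieval", "generation"],
  },
  {
    query: "What is n8n?",
    answer: "It is a tool.",
    expectedKeywords: ["workflow", "automation"],
  },
];
const report = summarize(cases);
console.log(report.passed, report.total, report.meanRecall); // 1 2 0.5
```

Keyword recall will miss paraphrases, so it works best alongside a manual read of the failing answers.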

Checkpoint

List the 4 exercises to complete. Which one is the most challenging?

🚀 Next lesson

Context Management: managing the context window, conversation history, and memory.