🎯 Lesson Objectives
User queries are often short, ambiguous, or phrased differently from the document language. Query enhancement significantly improves retrieval quality.
After this lesson, you will know:
✅ HyDE (Hypothetical Document Embeddings) ✅ Multi-Query expansion ✅ Step-Back Prompting ✅ Query routing & decomposition
🔍 The Query Problem
```
User query:       "lương tối thiểu"
Document content: "Mức lương tối thiểu vùng theo Nghị định 38/2022/NĐ-CP
                   quy định mức lương tối thiểu tháng và mức lương tối thiểu
                   giờ đối với người lao động..."

Problem:  Short query may not match well with detailed document text
Solution: Enhance the query before searching
```

Enhancement Strategies Overview
| Strategy | What it does | When to use |
|---|---|---|
| HyDE | Generate hypothetical answer, search with it | General queries |
| Multi-Query | Generate multiple query variants | Ambiguous queries |
| Step-Back | Ask broader question first | Specific questions |
| Decomposition | Break complex query into sub-queries | Multi-part questions |
| Query Rewriting | Rephrase for better matching | Conversational queries |
Checkpoint
Do you understand why query enhancement matters for RAG retrieval?
📐 HyDE (Hypothetical Document Embeddings)
Concept
```
Traditional: query → embed → search
HyDE:        query → LLM generates hypothetical answer → embed answer → search

Why it works: the hypothetical answer is closer in embedding space
to the actual document than the short query is
```

Implementation
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Step 1: Generate a hypothetical document
hyde_prompt = ChatPromptTemplate.from_template(
    """Given the question below, write a detailed paragraph that would
answer this question. Write as if it's from an official document.

Question: {question}

Hypothetical answer:"""
)

hyde_chain = hyde_prompt | llm | StrOutputParser()

# Step 2: Use the hypothetical answer for the search
def hyde_search(query, vectorstore, k=5):
    # Generate a hypothetical answer
    hypothetical_doc = hyde_chain.invoke({"question": query})
    print(f"HyDE document: {hypothetical_doc[:200]}...")

    # Search with the hypothetical document's embedding
    results = vectorstore.similarity_search(hypothetical_doc, k=k)
    return results

# Usage
results = hyde_search("lương tối thiểu 2024", vectorstore)
```

Checkpoint
Do you understand how HyDE generates a hypothetical answer and searches with it instead of the original query?
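One refinement from the original HyDE paper: sample several hypothetical documents and average their embeddings, which smooths out the quirks of any single generation. The sketch below is a minimal, assumption-laden version: `generate_fn` and `embed_fn` are placeholder callables (in the setup above, roughly `hyde_chain.invoke` and `embeddings.embed_query`), and the resulting vector would be passed to the vector store's `similarity_search_by_vector` method.

```python
def hyde_search_averaged(query, generate_fn, embed_fn, n_samples=3):
    """Average the embeddings of several hypothetical answers (HyDE-paper style).

    generate_fn(query) -> str   : produces one hypothetical answer
    embed_fn(text)     -> list  : embeds a text into a vector
    Returns the averaged vector, suitable for
    vectorstore.similarity_search_by_vector(...).
    """
    docs = [generate_fn(query) for _ in range(n_samples)]
    vectors = [embed_fn(d) for d in docs]
    dim = len(vectors[0])
    # Component-wise mean of the hypothetical-document vectors
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Averaging costs `n_samples` LLM calls per query, so it trades latency for robustness.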
📐 Multi-Query Expansion
Concept
```
Original query: "RAG performance optimization"
        ↓
LLM generates variants:
        ↓
Query 1: "How to improve RAG retrieval accuracy?"
Query 2: "Techniques for faster RAG response time"
Query 3: "RAG evaluation metrics and benchmarks"
        ↓
Search each → Merge & deduplicate results
```

Implementation
```python
from langchain.retrievers import MultiQueryRetriever

# Auto-generate query variants
multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3),
)

# Search with multiple queries automatically
results = multiquery_retriever.invoke("RAG performance optimization")
print(f"Retrieved {len(results)} unique documents")
```

Custom Multi-Query
```python
from langchain_core.prompts import ChatPromptTemplate

multi_query_prompt = ChatPromptTemplate.from_template(
    """You are an AI language model assistant. Your task is to generate 3
different versions of the given user question to retrieve relevant documents.

Provide these alternative questions separated by newlines.

Original question: {question}

Alternative questions:"""
)

def custom_multi_query(query, vectorstore, k=3):
    # Generate alternative queries
    response = (multi_query_prompt | llm | StrOutputParser()).invoke({"question": query})
    alt_queries = [q.strip() for q in response.strip().split("\n") if q.strip()]

    all_queries = [query] + alt_queries
    print(f"Searching with {len(all_queries)} queries:")
    for q in all_queries:
        print(f"  - {q}")

    # Search each query, deduplicating by content
    all_docs = []
    seen_contents = set()
    for q in all_queries:
        docs = vectorstore.similarity_search(q, k=k)
        for doc in docs:
            content_hash = hash(doc.page_content)
            if content_hash not in seen_contents:
                seen_contents.add(content_hash)
                all_docs.append(doc)

    print(f"Total unique documents: {len(all_docs)}")
    return all_docs
```

Checkpoint
Do you understand how Multi-Query generates query variants to increase recall?
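The `custom_multi_query` function above keeps the first occurrence of each document, which implicitly favors whichever query ran first. A common alternative is Reciprocal Rank Fusion (RRF), which scores a document by its rank in every result list, so documents ranked highly by several query variants rise to the top. A minimal sketch (the constant `k=60` is the value conventionally used for RRF; the function works on plain content strings as an illustration):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists into one fused ranking.

    result_lists: list of lists of page_content strings, each in rank order.
    Returns the contents sorted by RRF score, highest first.
    """
    scores = {}
    for results in result_lists:
        for rank, content in enumerate(results):
            # 1/(k + rank) gives diminishing credit to lower-ranked hits
            scores[content] = scores.get(content, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

To plug it in, pass `[[d.page_content for d in vectorstore.similarity_search(q, k=k)] for q in all_queries]` and map the fused contents back to documents.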
📐 Step-Back Prompting
Concept
```
Original:  "What is the chunking strategy for Vietnamese legal documents?"
        ↓
Step-back: "What are the best practices for document chunking in RAG?"
        ↓
Search with both queries → combine results

Why: the step-back question retrieves broader, foundational context
```

Implementation
```python
stepback_prompt = ChatPromptTemplate.from_template(
    """You are an expert at generating step-back questions.
Given a specific question, generate a more general question that
would help retrieve background information.

Original: {question}
Step-back question:"""
)

def stepback_search(query, vectorstore, k=3):
    # Generate the step-back question
    stepback_q = (stepback_prompt | llm | StrOutputParser()).invoke({"question": query})
    print(f"Step-back: {stepback_q}")

    # Search with both questions
    original_docs = vectorstore.similarity_search(query, k=k)
    stepback_docs = vectorstore.similarity_search(stepback_q, k=k)

    # Combine and deduplicate by content
    all_docs = original_docs + stepback_docs
    unique_docs = list({doc.page_content: doc for doc in all_docs}.values())

    return unique_docs
```

Checkpoint
Do you understand how Step-Back Prompting retrieves broader context before the specific search?
📐 Query Decomposition
Break Complex Queries
```python
decomposition_prompt = ChatPromptTemplate.from_template(
    """Break down this complex question into 2-4 simpler sub-questions
that can be answered independently.

Complex question: {question}

Sub-questions (one per line):"""
)

def decompose_and_search(query, vectorstore, k=3):
    # Decompose into sub-questions
    response = (decomposition_prompt | llm | StrOutputParser()).invoke({"question": query})
    sub_queries = [q.strip().lstrip("0123456789.-) ") for q in response.strip().split("\n") if q.strip()]

    print(f"Decomposed into {len(sub_queries)} sub-queries:")

    all_context = []
    for i, sub_q in enumerate(sub_queries):
        print(f"  {i+1}. {sub_q}")
        docs = vectorstore.similarity_search(sub_q, k=k)
        context = "\n".join([d.page_content for d in docs])
        all_context.append(f"Sub-question: {sub_q}\nContext: {context}")

    return "\n\n---\n\n".join(all_context)

# Usage
complex_query = "So sánh lương tối thiểu vùng 1 và vùng 4, và cách tính bảo hiểm dựa trên lương"
context = decompose_and_search(complex_query, vectorstore)
```

Checkpoint
Do you understand how to split a complex query into sub-queries and search each part separately?
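One caveat with the `lstrip("0123456789.-) ")` call in `decompose_and_search` above: `lstrip` removes any leading run of those characters, so a sub-question that genuinely starts with a number (e.g. "2024 minimum wage...") would lose its digits. A regex-based parser only strips a marker when one is actually present; `parse_sub_questions` is a hypothetical helper sketching that approach:

```python
import re

def parse_sub_questions(response):
    """Extract sub-questions from an LLM's line-separated answer.

    Strips list markers such as "1.", "2)", "-", or "*" only when they
    appear as markers, leaving questions that start with a real number intact.
    """
    sub_queries = []
    for line in response.strip().split("\n"):
        # Remove one leading "1." / "2)" / "-" / "*" marker, if present
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if cleaned:
            sub_queries.append(cleaned)
    return sub_queries
```

It drops in as a replacement for the list comprehension that builds `sub_queries`.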
🛠️ Query Router
Route to Appropriate Source
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

router_prompt = ChatPromptTemplate.from_template(
    """Given the user question, classify which knowledge base to search.

Available knowledge bases:
- "hr_policy": HR policies, leave, benefits, workplace rules
- "technical": Technical documentation, APIs, system guides
- "legal": Legal regulations, compliance, contracts
- "general": General company information

Question: {question}

Answer with just the knowledge base name:"""
)

def route_query(query, vectorstores):
    """Route a query to the appropriate vector store."""
    kb_name = (router_prompt | llm | StrOutputParser()).invoke({"question": query})
    kb_name = kb_name.strip().lower()

    if kb_name in vectorstores:
        print(f"Routing to: {kb_name}")
        return vectorstores[kb_name].similarity_search(query, k=5)
    else:
        print(f"Unknown KB '{kb_name}', searching all")
        all_results = []
        for vs in vectorstores.values():
            all_results.extend(vs.similarity_search(query, k=2))
        return all_results

# Usage
vectorstores = {
    "hr_policy": hr_vectorstore,
    "technical": tech_vectorstore,
    "legal": legal_vectorstore,
}
docs = route_query("Chính sách nghỉ phép năm", vectorstores)
```

Checkpoint
Do you understand how to route queries to the appropriate knowledge base?
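Despite the "Answer with just the knowledge base name" instruction, LLMs sometimes return extras like `"hr_policy".` or `I would search legal.`, which would miss the exact dictionary lookup in `route_query` and fall through to searching everything. Normalizing the raw answer first makes routing more forgiving; `normalize_route` is a hypothetical helper sketching one way to do it:

```python
def normalize_route(raw, allowed, default="general"):
    """Map a raw LLM routing answer onto a known knowledge-base name.

    raw     : the model's text answer
    allowed : iterable of valid knowledge-base names (lowercase)
    default : returned when no allowed name appears in the answer
    """
    text = raw.strip().lower()
    # Exact match after trimming surrounding quotes and punctuation
    cleaned = text.strip("\"'`. ")
    if cleaned in allowed:
        return cleaned
    # Otherwise accept any allowed name mentioned anywhere in the answer
    for name in allowed:
        if name in text:
            return name
    return default
```

In `route_query`, replace `kb_name.strip().lower()` with `normalize_route(kb_name, vectorstores.keys())`.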
💻 Combining Strategies
```python
class EnhancedRetriever:
    """Combine multiple query enhancement strategies."""

    def __init__(self, vectorstore, llm):
        self.vectorstore = vectorstore
        self.llm = llm

    def retrieve(self, query, strategy="auto", k=5):
        if strategy == "auto":
            strategy = self._detect_strategy(query)

        print(f"Using strategy: {strategy}")

        if strategy == "hyde":
            return self._hyde_search(query, k)
        elif strategy == "multi_query":
            return self._multi_query_search(query, k)
        elif strategy == "stepback":
            return self._stepback_search(query, k)
        elif strategy == "decompose":
            return self._decompose_search(query, k)
        else:
            return self.vectorstore.similarity_search(query, k=k)

    def _detect_strategy(self, query):
        """Auto-detect the best strategy based on the query."""
        word_count = len(query.split())

        if " và " in query or " so sánh " in query.lower():
            return "decompose"
        elif word_count <= 3:
            return "hyde"
        elif "?" in query:
            return "multi_query"
        else:
            return "stepback"

# Usage
retriever = EnhancedRetriever(vectorstore, llm)
docs = retriever.retrieve("lương tối thiểu", strategy="auto")
```

Checkpoint
Do you understand how to combine strategies and auto-select a query enhancement strategy?
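The detection heuristic is pure string logic, so it is easy to unit-test in isolation before wiring it into the retriever. A standalone copy of the same rules, shown with the kind of queries each branch is meant to catch (the comparison keywords and the 3-word threshold come straight from the class above and are tunable):

```python
def detect_strategy(query):
    """Pick an enhancement strategy from surface features of the query.

    Mirrors EnhancedRetriever._detect_strategy: conjunction/comparison
    words -> decompose; very short -> hyde; question mark -> multi_query;
    otherwise -> stepback.
    """
    word_count = len(query.split())
    if " và " in query or " so sánh " in query.lower():
        return "decompose"
    elif word_count <= 3:
        return "hyde"
    elif "?" in query:
        return "multi_query"
    else:
        return "stepback"
```

Note one limitation to be aware of: the space-padded `" so sánh "` pattern misses queries that begin with "So sánh", so such queries are caught only if they also contain " và ".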
🎯 Summary
📝 Quiz
1. What does the HyDE technique do?
   - Creates a new database
   - Has the LLM generate a hypothetical answer, then searches with that answer's embedding
   - Hides the query from the user
   - Generates fake documents
2. When is Multi-Query a good fit?
   - When the query is ambiguous and can be phrased in many ways
   - When the query is already very specific
   - It should never be used
   - Only for English
3. What does query decomposition help with?
   - Splitting a complex question into sub-questions and searching each part separately
   - Deleting words from the query
   - Making the query shorter
   - Translating the query
Key Takeaways
- HyDE — best for short or vague queries
- Multi-Query — increases recall with query variants
- Step-Back — retrieves broader context first
- Decomposition — handles complex multi-part questions
- Auto-routing — chooses a strategy based on query type
Self-Check Questions
- How does HyDE (Hypothetical Document Embeddings) work, and which kinds of queries does it suit?
- How does the Multi-Query technique generate query variants to increase recall?
- How does Step-Back Prompting differ from Query Decomposition, and when should each be used?
- How can you automatically choose a suitable strategy based on the query's characteristics (auto-routing)?
🎉 Great job! You have completed the Query Enhancement lesson!
Next up: Hybrid Search & Reranking in the next lesson!
🚀 Next Lesson
Hybrid Search & Reranking — combining keyword + semantic search with a reranking pipeline!
