🚀 Advanced RAG Techniques
Basic RAG is often not enough for production. This lesson covers advanced techniques for improving retrieval and generation quality.
RAG Quality Issues
- Irrelevant retrieval: the wrong documents are retrieved
- Missing context: not enough information is retrieved
- Lost in the middle: the LLM ignores information in the middle of the context
- Hallucination: the model generates information that is not in the documents
Query Enhancement Techniques
1. HyDE (Hypothetical Document Embeddings)
Generate a hypothetical answer and embed it instead of the query:
```mermaid
graph LR
    Q[Query] --> L[LLM]
    L --> H[Hypothetical Answer]
    H --> E[Embed]
    E --> S[Search]
    S --> R[Real Documents]
```

```python
from langchain.chains import HypotheticalDocumentEmbedder
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o")
base_embeddings = OpenAIEmbeddings()

hyde_embeddings = HypotheticalDocumentEmbedder.from_llm(
    llm=llm,
    base_embeddings=base_embeddings,
    prompt_key="web_search"  # or a custom prompt
)

# Embed the query via a hypothetical answer
query = "What is the capital of France?"
# The LLM generates: "The capital of France is Paris, a major European city..."
# That hypothetical answer is embedded and used for the vector search
query_vector = hyde_embeddings.embed_query(query)
result = vectorstore.similarity_search_by_vector(query_vector)
```
2. Multi-Query Retrieval
Generate multiple variations of the query:
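Stripped of the framework, the retrieval side of this pattern is just "retrieve for each variant, then take the de-duplicated union". A minimal sketch with a toy keyword retriever (all names here are hypothetical, for illustration only):

```python
def multi_query_retrieve(variants, retrieve, k=5):
    """Retrieve for each query variant and return a de-duplicated union.

    `retrieve` is any function mapping a query string to a list of doc strings.
    """
    seen, merged = set(), []
    for q in variants:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]

# Toy corpus-backed retriever standing in for a real vector store
corpus = ["ML is learning from data", "Deep learning uses neural nets", "Cooking pasta"]

def toy_retrieve(query):
    words = set(query.lower().replace("?", "").split())
    return [d for d in corpus if words & set(d.lower().split())]

variants = ["What is machine learning?", "How does ML work?", "machine learning basics"]
docs = multi_query_retrieve(variants, toy_retrieve)
```

Each variant can surface documents the others miss; the union keeps recall high without duplicating context.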
```python
from langchain.retrievers import MultiQueryRetriever
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm
)

# Original: "What is machine learning?"
# Generated:
# - "How does machine learning work?"
# - "What are the basics of ML?"
# - "Define machine learning concept"

docs = retriever.get_relevant_documents("What is machine learning?")
```
3. Step-Back Prompting
Abstract the question to retrieve broader background context:
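The pattern pairs two retrieval passes: one for the original question, one for the step-back question, with the results merged. A framework-free sketch, with stub functions standing in for the LLM and the retriever (all names hypothetical):

```python
def step_back_retrieve(question, make_step_back, retrieve, k=6):
    """Retrieve for both the original and the abstracted question, then merge."""
    broad = make_step_back(question)
    merged, seen = [], set()
    # Run the specific question first so concrete evidence ranks ahead of background
    for q in (question, broad):
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]

# Stub "LLM" that abstracts the question (a real system would call a model here)
def stub_step_back(q):
    return "What factors affect Apple's stock price?"

# Stub retriever backed by a tiny lookup table
def stub_retrieve(q):
    index = {
        "Why did Apple stock drop on Jan 15, 2024?": ["News article from Jan 15, 2024"],
        "What factors affect Apple's stock price?": ["Primer on equity valuation"],
    }
    return index.get(q, [])

docs = step_back_retrieve("Why did Apple stock drop on Jan 15, 2024?", stub_step_back, stub_retrieve)
```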
```python
from langchain.prompts import ChatPromptTemplate

step_back_prompt = ChatPromptTemplate.from_template("""
You are an expert at understanding questions and abstracting them.
Given the question, generate a more general question that would help
retrieve relevant background information.

Original question: {question}
Step-back question:
""")

# Example:
# Original: "Why did Apple stock drop on Jan 15, 2024?"
# Step-back: "What factors affect Apple's stock price?"
```
4. Query Decomposition
Break a complex query into sub-queries:
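The orchestration around the prompt is: decompose, answer each sub-question (each with its own retrieval), then synthesize. A sketch with stub functions in place of the LLM calls (hypothetical names and behavior):

```python
def answer_by_decomposition(question, decompose, answer_sub, synthesize):
    """Answer a complex question by answering its sub-questions first."""
    sub_questions = decompose(question)
    sub_answers = [(q, answer_sub(q)) for q in sub_questions]
    return synthesize(question, sub_answers)

# Stubs standing in for the LLM (and per-sub-question retrieval)
decompose = lambda q: [
    "What was Apple's revenue growth in 2023?",
    "What was Microsoft's revenue growth in 2023?",
]
answer_sub = lambda q: f"Answer to: {q}"
synthesize = lambda q, pairs: f"{q} -> based on {len(pairs)} sub-answers"

result = answer_by_decomposition(
    "Compare revenue growth of Apple and Microsoft in 2023",
    decompose, answer_sub, synthesize,
)
```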
```python
decomposition_prompt = ChatPromptTemplate.from_template("""
Break down this complex question into simpler sub-questions:

Question: {question}

Sub-questions:
""")

# Example:
# Original: "Compare revenue growth of Apple and Microsoft in 2023"
# Sub-questions:
# 1. What was Apple's revenue growth in 2023?
# 2. What was Microsoft's revenue growth in 2023?
# 3. How do they compare?
```
Retrieval Enhancement
1. Hybrid Search
Combine keyword search with semantic search:
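Under the hood, the fusion of the two ranked lists can be done with weighted Reciprocal Rank Fusion (the approach LangChain's EnsembleRetriever uses, roughly); the core computation fits in a few lines:

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Weighted Reciprocal Rank Fusion: score(d) = sum_i w_i / (c + rank_i(d))."""
    scores = {}
    for docs, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. a BM25 ranking
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. a vector-store ranking
fused = weighted_rrf([keyword_hits, semantic_hits], weights=[0.3, 0.7])
```

`doc_b` wins because it ranks well in both lists and the semantic list carries more weight; documents that appear in only one list still get a score, just a smaller one.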
```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword search (BM25)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5

# Semantic search
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.3, 0.7]  # 30% keyword, 70% semantic
)

docs = ensemble_retriever.get_relevant_documents(query)
```
2. Reranking
Rerank results with a cross-encoder:
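The retrieve-many-then-rerank shape is independent of the provider: over-fetch candidates, score each (query, document) pair, keep the best. A sketch with a toy word-overlap scorer (a real cross-encoder jointly encodes the pair; names here are hypothetical):

```python
def rerank(query, candidates, score_pair, top_n=5):
    """Score each (query, doc) pair and keep the top_n highest-scoring docs."""
    return sorted(candidates, key=lambda d: score_pair(query, d), reverse=True)[:top_n]

# Toy scorer: fraction of the doc's words shared with the query
def overlap_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(d), 1)

candidates = ["revenue grew 10%", "the weather was nice", "revenue growth drivers"]
top = rerank("revenue growth", candidates, overlap_score, top_n=2)
```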
```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

# Cohere Reranker
compressor = CohereRerank(
    model="rerank-english-v3.0",
    top_n=5
)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)

# Retrieve 20, rerank, return top 5
docs = compression_retriever.get_relevant_documents(query)
```
3. Contextual Compression
Strip irrelevant content from retrieved chunks:
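The essence is extractive: keep only the parts of a chunk that bear on the query. A toy sentence-level filter shows the shape of it (a real compressor delegates this judgment to an LLM rather than keyword overlap):

```python
def compress_chunk(query, chunk):
    """Keep only sentences that share at least one word with the query."""
    q_words = set(query.lower().split())
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    kept = [s for s in sentences if q_words & set(s.lower().split())]
    return ". ".join(kept)

chunk = "Revenue grew 12% in 2023. The office moved to Austin. Growth was driven by services revenue"
compressed = compress_chunk("revenue growth", chunk)
```

The off-topic sentence is dropped, so the context window carries only material the generator can actually use.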
```python
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)

# The LLM extracts only the relevant parts of each chunk
compressed_docs = compression_retriever.get_relevant_documents(query)
```
4. Parent Document Retriever
Retrieve small chunks, return large parents:
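The core bookkeeping is a child-to-parent mapping: small chunks are what the vector search matches, but each hit is resolved to its (de-duplicated) parent document. A sketch of that resolution step (names hypothetical):

```python
def parents_for_hits(child_hits, child_to_parent, parent_store):
    """Map retrieved child-chunk ids to their de-duplicated parent documents."""
    seen, parents = set(), []
    for child_id in child_hits:
        pid = child_to_parent[child_id]
        if pid not in seen:       # two children of the same parent yield it once
            seen.add(pid)
            parents.append(parent_store[pid])
    return parents

parent_store = {"p1": "Full section about revenue...", "p2": "Full section about costs..."}
child_to_parent = {"c1": "p1", "c2": "p1", "c3": "p2"}

# Three child hits, but only two distinct parents come back
docs = parents_for_hits(["c2", "c1", "c3"], child_to_parent, parent_store)
```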
```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for retrieval
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# Large chunks for context
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

store = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Add documents
retriever.add_documents(documents)

# Search: match small chunks, return large parents
docs = retriever.get_relevant_documents(query)
```
Generation Enhancement
1. Long-Context Reordering
Reorder documents so the most important ones are not lost in the middle of the context:
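One way to implement that idea (not necessarily the exact ordering LangChain uses): walk the relevance-sorted list and alternate between the front and the back of the output, so the least relevant documents land in the middle where the LLM pays the least attention:

```python
def reorder_for_long_context(docs_by_relevance):
    """Place the most relevant docs at the start and end; least relevant in the middle."""
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

docs = ["most", "second", "third", "fourth", "least"]
reordered = reorder_for_long_context(docs)
```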
```python
from langchain.document_transformers import LongContextReorder

reordering = LongContextReorder()

# Most relevant → first and last positions
reordered_docs = reordering.transform_documents(docs)
```
2. Self-RAG
The LLM decides for itself whether retrieval is needed:
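Around the prompt, this is a routing step: parse the decision token and branch. A sketch with stub functions in place of the LLM calls (hypothetical names and decision logic):

```python
def self_rag_answer(question, decide, retrieve, generate):
    """Retrieve only when the decision step asks for it."""
    decision = decide(question)
    if "[RETRIEVE]" in decision:
        context = retrieve(question)
        return generate(question, context)
    return generate(question, context=None)

# Stubs standing in for LLM calls; the toy rule retrieves for company-internal questions
decide = lambda q: "[RETRIEVE]" if "our company" in q else "[ANSWER]"
retrieve = lambda q: ["internal report"]
generate = lambda q, context: f"answer (context={'yes' if context else 'no'})"

with_rag = self_rag_answer("What was our company revenue?", decide, retrieve, generate)
without_rag = self_rag_answer("What is 2+2?", decide, retrieve, generate)
```

Skipping retrieval for questions the model can answer directly saves latency and avoids polluting the context with irrelevant documents.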
```python
self_rag_prompt = ChatPromptTemplate.from_template("""
Question: {question}

Decide if you need to retrieve information to answer this question.
If yes, respond with [RETRIEVE]. If you can answer directly, respond with [ANSWER].

Decision:
""")

# Based on the decision:
# - [RETRIEVE]: do retrieval
# - [ANSWER]: answer directly without RAG
```
3. Citation & Verification
Add citations and verify answers:
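Two small helpers cover the mechanical half of this: numbering the context so the model can cite it, and checking that every cited index actually exists (the semantic check of whether the citation supports the claim still needs an LLM). Helper names are hypothetical:

```python
import re

def format_context(docs):
    """Number each doc so the model can cite it as [1], [2], ..."""
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, 1))

def citations_valid(answer, num_docs):
    """Check that the answer cites something, and only real context entries."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= n <= num_docs for n in cited)

docs = ["Revenue grew 12%.", "Growth came from services."]
context = format_context(docs)
ok = citations_valid("Revenue grew 12% [1], driven by services [2].", len(docs))
bad = citations_valid("Revenue grew [7].", len(docs))
```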
```python
citation_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context. Include [1], [2], etc. citations.

Context:
[1] {doc1}
[2] {doc2}
[3] {doc3}

Question: {question}

Answer with citations:
""")

verification_prompt = ChatPromptTemplate.from_template("""
Verify if this answer is fully supported by the context.
If not, identify what parts are unsupported.

Context: {context}
Answer: {answer}

Verification:
""")
```
Complete Advanced RAG Pipeline
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers.document_compressors import CohereRerank
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough

# Initialize
llm = ChatOpenAI(model="gpt-4o")
embeddings = OpenAIEmbeddings()

# 1. Hybrid Retrieval
bm25 = BM25Retriever.from_documents(documents, k=10)
semantic = vectorstore.as_retriever(search_kwargs={"k": 10})
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.3, 0.7])

# 2. Reranking
reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=hybrid
)

# 3. Generation with citations
prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context. Include citations [1], [2], etc.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs_with_citations(docs):
    formatted = []
    for i, doc in enumerate(docs, 1):
        formatted.append(f"[{i}] {doc.page_content}")
    return "\n\n".join(formatted)

# Chain
chain = (
    {"context": retriever | format_docs_with_citations, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# Run
response = chain.invoke("What is the company's revenue growth strategy?")
```
Evaluation with RAGAS
```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_relevancy,
    context_recall
)

# Prepare test data
test_data = {
    "question": ["What is X?", "How does Y work?"],
    "answer": ["X is...", "Y works by..."],
    "contexts": [["Context for Q1"], ["Context for Q2"]],
    "ground_truth": ["X is actually...", "Y works by..."]
}

# Evaluate (RAGAS expects a Hugging Face Dataset)
result = evaluate(
    Dataset.from_dict(test_data),
    metrics=[
        faithfulness,       # Is the answer grounded in the context?
        answer_relevancy,   # Is the answer relevant to the question?
        context_relevancy,  # Is the context relevant to the question?
        context_recall      # Does the context cover the ground truth?
    ]
)

print(result)
```
Hands-On Exercise
Implement Advanced RAG:
- Set up hybrid retrieval (BM25 + Semantic)
- Add reranking with Cohere
- Test with 10 sample queries
- Compare metrics against basic RAG
- Implement citations
```python
# TODO: Your implementation
```
What's Next
In the next lesson, we will look at deploying RAG to production, covering scale and monitoring considerations.
