Theory
35 minutes
Lesson 4/5

Advanced RAG Techniques

Advanced techniques for improving RAG quality

🚀 Advanced RAG Techniques

Basic RAG is often not enough for production. This lesson covers advanced techniques to improve retrieval and generation quality.

RAG Quality Issues

Common Problems
  1. Irrelevant retrieval: the wrong documents are retrieved
  2. Missing context: not enough information is retrieved
  3. Lost in the middle: the LLM ignores information in the middle of the context
  4. Hallucination: the model generates information that is not in the documents

Query Enhancement Techniques

1. HyDE (Hypothetical Document Embeddings)

Generate a hypothetical answer and embed that instead of the raw query:

Diagram
graph LR
    Q[Query] --> L[LLM]
    L --> H[Hypothetical Answer]
    H --> E[Embed]
    E --> S[Search]
    S --> R[Real Documents]
Python
from langchain.chains import HypotheticalDocumentEmbedder
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = ChatOpenAI(model="gpt-4o")
base_embeddings = OpenAIEmbeddings()

hyde_embeddings = HypotheticalDocumentEmbedder.from_llm(
    llm=llm,
    base_embeddings=base_embeddings,
    prompt_key="web_search"  # or a custom prompt
)

# Embed the query via a hypothetical answer
query = "What is the capital of France?"
# LLM generates: "The capital of France is Paris, a major European city..."
# That hypothetical answer is embedded and used for the search
query_vector = hyde_embeddings.embed_query(query)
result = vectorstore.similarity_search_by_vector(query_vector)
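
Alternatively, the vector store can be built with the HyDE embedder directly, so that plain `similarity_search` calls go through the hypothetical-answer step; a minimal sketch, assuming the `documents` list from earlier lessons:

Python
from langchain_community.vectorstores import Chroma

# Documents are indexed with the base embeddings; only queries
# are routed through the hypothetical-answer generation step
hyde_vectorstore = Chroma.from_documents(documents, embedding=hyde_embeddings)
docs = hyde_vectorstore.similarity_search("What is the capital of France?")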

2. Multi-Query Retrieval

Generate multiple variations of the query:

Python
from langchain.retrievers import MultiQueryRetriever
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm
)

# Original: "What is machine learning?"
# Generated:
# - "How does machine learning work?"
# - "What are the basics of ML?"
# - "Define machine learning concept"

docs = retriever.get_relevant_documents("What is machine learning?")

3. Step-Back Prompting

Abstract the question to retrieve broader background context:

Python
from langchain.prompts import ChatPromptTemplate

step_back_prompt = ChatPromptTemplate.from_template("""
You are an expert at understanding questions and abstracting them.
Given the question, generate a more general question that would help
retrieve relevant background information.

Original question: {question}
Step-back question:
""")

# Example:
# Original: "Why did Apple stock drop on Jan 15, 2024?"
# Step-back: "What factors affect Apple's stock price?"
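
A minimal sketch of wiring this into retrieval, assuming the `llm` from earlier and a `retriever` over your vector store; merging the two result lists is one reasonable design, not the only one:

Python
from langchain.schema.output_parser import StrOutputParser

# Generate the step-back question from the original one
step_back_chain = step_back_prompt | llm | StrOutputParser()

question = "Why did Apple stock drop on Jan 15, 2024?"
step_back_question = step_back_chain.invoke({"question": question})

# Retrieve with both the specific and the abstracted question, then merge
docs = retriever.get_relevant_documents(question)
docs += retriever.get_relevant_documents(step_back_question)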

4. Query Decomposition

Split a complex query into simpler sub-queries:

Python
decomposition_prompt = ChatPromptTemplate.from_template("""
Break down this complex question into simpler sub-questions:

Question: {question}

Sub-questions:
""")

# Example:
# Original: "Compare revenue growth of Apple and Microsoft in 2023"
# Sub-questions:
# 1. What was Apple's revenue growth in 2023?
# 2. What was Microsoft's revenue growth in 2023?
# 3. How do they compare?
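
A minimal sketch of the full decompose-answer-synthesize loop, assuming the `llm` and a `retriever`; splitting sub-questions on newlines is an assumption about the model's output format:

Python
from langchain.schema.output_parser import StrOutputParser

decompose_chain = decomposition_prompt | llm | StrOutputParser()

question = "Compare revenue growth of Apple and Microsoft in 2023"
sub_questions = [
    q.strip() for q in decompose_chain.invoke({"question": question}).splitlines()
    if q.strip()
]

# Answer each sub-question against its own retrieved context
sub_answers = []
for sub_q in sub_questions:
    docs = retriever.get_relevant_documents(sub_q)
    context = "\n\n".join(d.page_content for d in docs)
    answer = llm.invoke(f"Context:\n{context}\n\nQuestion: {sub_q}")
    sub_answers.append(f"Q: {sub_q}\nA: {answer.content}")

# Synthesize a final answer from the intermediate ones
final_answer = llm.invoke(
    f"Answer the original question using these intermediate answers.\n\n"
    f"Question: {question}\n\n" + "\n\n".join(sub_answers)
)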

Retrieval Enhancement

1. Hybrid Search

Combine keyword search with semantic search:

Python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Keyword search (BM25)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5

# Semantic search
semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, semantic_retriever],
    weights=[0.3, 0.7]  # 30% keyword, 70% semantic
)

docs = ensemble_retriever.get_relevant_documents(query)

2. Reranking

Rerank results with a cross-encoder:

Python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

# Cohere reranker
compressor = CohereRerank(
    model="rerank-english-v3.0",
    top_n=5
)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)

# Retrieve 20 candidates, rerank, return the top 5
docs = compression_retriever.get_relevant_documents(query)

3. Contextual Compression

Remove irrelevant content from retrieved chunks:

Python
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)

# The LLM extracts only the relevant parts of each chunk
compressed_docs = compression_retriever.get_relevant_documents(query)

4. Parent Document Retriever

Retrieve against small chunks, but return their larger parent documents:

Python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Small chunks for retrieval
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)

# Large chunks for context
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

store = InMemoryStore()

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Add documents
retriever.add_documents(documents)

# Search: match small chunks, return their large parents
docs = retriever.get_relevant_documents(query)

Generation Enhancement

1. Long-Context Reordering

Reorder documents so the most important ones are not lost in the middle of the context:

Python
from langchain.document_transformers import LongContextReorder

reordering = LongContextReorder()

# Most relevant documents go to the first and last positions
reordered_docs = reordering.transform_documents(docs)

2. Self-RAG

The LLM decides for itself whether retrieval is needed:

Python
self_rag_prompt = ChatPromptTemplate.from_template("""
Question: {question}

Decide if you need to retrieve information to answer this question.
If yes, respond with [RETRIEVE]. If you can answer directly, respond with [ANSWER].

Decision:
""")

# Based on the decision:
# - [RETRIEVE]: run retrieval
# - [ANSWER]: answer directly without RAG
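
A minimal sketch of routing on that decision token, assuming the `llm` from earlier and a full RAG chain, here called `rag_chain` (for example the `chain` built in the complete pipeline below); it also assumes the model reliably emits one of the two tokens, which is worth validating in practice:

Python
def self_rag_answer(question: str) -> str:
    # First pass: ask the model whether retrieval is needed
    decision = (self_rag_prompt | llm).invoke({"question": question}).content
    if "[RETRIEVE]" in decision:
        return rag_chain.invoke(question).content  # full RAG path
    return llm.invoke(question).content            # answer directly, no retrieval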

3. Citation & Verification

Add citations and verify answers:

Python
citation_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context. Include [1], [2], etc. citations.

Context:
[1] {doc1}
[2] {doc2}
[3] {doc3}

Question: {question}

Answer with citations:
""")

verification_prompt = ChatPromptTemplate.from_template("""
Verify if this answer is fully supported by the context.
If not, identify what parts are unsupported.

Context: {context}
Answer: {answer}

Verification:
""")

Complete Advanced RAG Pipeline

Python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers.document_compressors import CohereRerank
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough

# Initialize
llm = ChatOpenAI(model="gpt-4o")
embeddings = OpenAIEmbeddings()

# 1. Hybrid retrieval
bm25 = BM25Retriever.from_documents(documents, k=10)
semantic = vectorstore.as_retriever(search_kwargs={"k": 10})
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.3, 0.7])

# 2. Reranking
reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=hybrid
)

# 3. Generation with citations
prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context. Include citations [1], [2], etc.

Context:
{context}

Question: {question}

Answer:
""")

def format_docs_with_citations(docs):
    formatted = []
    for i, doc in enumerate(docs, 1):
        formatted.append(f"[{i}] {doc.page_content}")
    return "\n\n".join(formatted)

# Chain
chain = (
    {"context": retriever | format_docs_with_citations, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# Run
response = chain.invoke("What is the company's revenue growth strategy?")

Evaluation with RAGAS

Python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_relevancy,
    context_recall
)

# Prepare test data
test_data = {
    "question": ["What is X?", "How does Y work?"],
    "answer": ["X is...", "Y works by..."],
    "contexts": [["Context for Q1"], ["Context for Q2"]],
    "ground_truth": ["X is actually...", "Y works by..."]
}
dataset = Dataset.from_dict(test_data)

# Evaluate
result = evaluate(
    dataset,
    metrics=[
        faithfulness,       # Is the answer grounded in the context?
        answer_relevancy,   # Is the answer relevant to the question?
        context_relevancy,  # Is the retrieved context relevant to the question?
        context_recall      # Does the context cover the ground truth?
    ]
)

print(result)

Practice Exercise

Hands-on Exercise

Implement Advanced RAG:

  1. Set up hybrid retrieval (BM25 + semantic)
  2. Add reranking with Cohere
  3. Test with 10 sample queries
  4. Compare metrics against basic RAG
  5. Implement citations
Python
# TODO: Your implementation

Up Next

In the next lesson, we will learn how to deploy RAG to production, covering scaling and monitoring considerations.
