What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is a prompting technique that enhances large language model (LLM) responses by dynamically retrieving relevant information from external knowledge sources before generating a response. Rather than relying solely on the model’s internal knowledge, RAG incorporates up-to-date, specific, and contextually relevant information from external databases, documents, or knowledge bases.

Why Use Retrieval-Augmented Generation?

  • Factual Accuracy: Access to external knowledge reduces hallucinations and factual errors
  • Up-to-Date Information: Retrieves current information beyond the model’s training data
  • Domain Specialization: Can access domain-specific knowledge not well-represented in general LLM training
  • Knowledge Grounding: Provides citations and sources for statements to increase trustworthiness
  • Scalable Knowledge: Can access vast amounts of knowledge without fine-tuning the base model
  • Customizable Responses: Tailor responses based on your specific knowledge repositories

Basic Implementation in Latitude

Here’s a simple RAG implementation using Latitude:

RAG Basic Example
---
provider: OpenAI
model: gpt-4o
temperature: 0.7
tools:
  - search_knowledge_base:
      description: Retrieve information from the knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve relevant information
---

# Knowledge-Enhanced Response

Answer the user's question about {{ topic }} by first retrieving relevant information.

## User Question:
{{ user_question }}

## Information Retrieval:
I'll search for the most relevant information to answer this question.

## Comprehensive Answer:
Based on the retrieved information, I'll now provide a complete and accurate answer to the question.

Advanced Implementation with Multiple Sources

This example shows a more sophisticated RAG implementation that retrieves information from multiple sources and evaluates their relevance:

---
provider: OpenAI
model: gpt-4o
temperature: 0.3
tools:
  - search_knowledge_base:
      description: Retrieve information from the primary knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve relevant information
  - search_recent_documents:
      description: Retrieve information from recent documents for time-sensitive information
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve recent information
---

<step>
# Initial Information Gathering

## User Question:
{{ user_question }}

## Primary Knowledge Search:
Let me search our main knowledge base for relevant information.

## Recent Documents Search:
I'll also check recent documents for any updates or new information.
</step>

<step>
# Information Synthesis

## Retrieved Information:
Let me analyze and synthesize the information from both sources:

1. **Main Knowledge Base Findings:**
   - Key facts and concepts retrieved
   - Relevant background information

2. **Recent Documents Findings:**
   - Updates or new information
   - Changes to previously established information

## Information Evaluation:
- Relevance score of each retrieved piece
- Consistency between sources
- Recency and reliability assessment
</step>

<step>
# Final Response Generation

Based on the synthesized information, I'll now provide a comprehensive answer:

## Answer:
[Comprehensive answer to the user's question]

## Sources:
[List of sources used with citations]

## Confidence Level:
[Assessment of confidence based on quality and consistency of retrieved information]
</step>

Domain-Specific RAG Implementation

This example shows how to implement RAG for a specific domain (medical information):

---
provider: OpenAI
model: gpt-4o
temperature: 0.2
tools:
  - search_medical_database:
      description: Retrieve information from verified medical databases
      parameters:
        type: object
        additionalProperties: false
        required: ['query', 'search_depth']
        properties:
          query:
            type: string
            description: The medical search query
          search_depth:
            type: string
            enum: ['basic', 'detailed', 'comprehensive']
            description: The depth of search to perform
---

# Medical Information Assistant

I'll answer your medical question by retrieving information from verified medical databases.

## Medical Question:
{{ medical_question }}

## Information Retrieval:
Searching medical databases for clinically validated information...

## Medical Answer:
Based on information from verified medical sources:

[Detailed answer with medical references]

## Important Notice:
This information is for educational purposes only and not a substitute for professional medical advice. Always consult a healthcare provider for medical concerns.

Best Practices for RAG

To implement retrieval-augmented generation effectively:

  1. Optimize Search Queries

    • Extract key entities and concepts from user questions
    • Use query expansion to find related information
    • Implement query reformulation techniques
  2. Vector Database Setup

    • Choose appropriate embedding models for your content
    • Implement chunking strategies based on content type
    • Use metadata filtering to improve retrieval precision
  3. Result Processing

    • Rank results by relevance and recency
    • Filter out irrelevant or low-quality retrievals
    • Rerank results based on semantic similarity
  4. Source Integration

    • Include source attribution in responses
    • Assess source credibility and prioritize reliable sources
    • Handle conflicting information from multiple sources
  5. Information Synthesis

    • Combine information from multiple sources coherently
    • Identify and resolve contradictions
    • Maintain the context and maintain factual consistency

Integrating RAG with the Latitude SDK

Here’s how to implement RAG using the Latitude SDK with external knowledge sources:

import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
import { Latitude } from '@latitude-data/sdk'

type Tools = { search_knowledge_base: { query: string } }

const PINECONE_INDEX_NAME = 'your-knowledge-index'

async function run() {
  const sdk = new Latitude(process.env.LATITUDE_API_KEY, {
    projectId: Number(process.env.PROJECT_ID),
    versionUuid: 'live',
  })
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
  const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY })
  const pc = pinecone.Index(PINECONE_INDEX_NAME)

  const userQuestion = "What's the latest research on quantum computing applications?"

  const result = await sdk.prompts.run<Tools>('rag/quantum-research', {
    parameters: { user_question: userQuestion },
    tools: {
      search_knowledge_base: async ({ query }) => {
        // Create embedding from query
        const embedding = await openai.embeddings
          .create({
            input: query,
            model: 'text-embedding-3-small',
          })
          .then((res) => res.data[0].embedding)

        // Search vector database
        const searchResults = await pc.query({
          vector: embedding,
          topK: 5,
          includeMetadata: true,
        })

        // Format and return results
        return searchResults.matches
          .map((match) => match.metadata?.content)
          .join('\n\n')
      },
    },
  })

  console.log('AI Response:', result.response.text)
}

Advanced RAG Techniques

Recursive Retrieval

Implement multi-hop retrieval for complex questions:

Recursive RAG
---
provider: OpenAI
model: gpt-4o
temperature: 0.5
tools:
  - search_knowledge_base:
      description: Retrieve information from the knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query
---

<step>
# Question Decomposition

## Original Question:
{{ complex_question }}

## Sub-questions:
Let me break this down into smaller, answerable sub-questions:
1. [First sub-question]
2. [Second sub-question]
3. [Third sub-question]
</step>

<step>
# Progressive Retrieval

I'll search for information to answer each sub-question:

## First Sub-question Search:
[Search and retrieve information]

## Second Sub-question Search:
[Search and retrieve information]

## Third Sub-question Search:
[Search and retrieve information]
</step>

<step>
# Integrated Response

Based on all the information retrieved for each sub-question:

## Comprehensive Answer:
[Integrated answer that addresses the original complex question]
</step>

Hybrid Retrieval

Combine different retrieval methods for better results:

---
provider: OpenAI
model: gpt-4o
temperature: 0.3
tools:
  - keyword_search:
      description: Traditional keyword-based search
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
  - semantic_search:
      description: Vector-based semantic search
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
---

# Hybrid Retrieval System

## User Query:
{{ user_query }}

## Keyword Search:
I'll perform a traditional keyword search for exact matches.

## Semantic Search:
I'll also conduct a semantic search to find conceptually related information.

## Combined Results:
Analyzing and combining results from both search methods to provide the most comprehensive answer:

[Comprehensive answer combining both search approaches]

Retrieval-Augmented Generation works well when combined with other prompting techniques:

  1. Chain-of-Thought with RAG: Combine retrieved information with step-by-step reasoning for complex problem-solving.

  2. Self-Consistency and RAG: Generate multiple RAG-enhanced responses and select the most consistent one.

  3. Few-Shot Learning with RAG: Augment few-shot examples with retrieved information to improve performance on specialized tasks.

  4. Constitutional AI with RAG: Use retrieved guidelines or policies to ensure AI responses comply with specific rules.

  5. Template-Based Prompting with RAG: Use retrieved information to fill in template slots for more accurate and contextual responses.

Real-World Applications

RAG is particularly valuable in these domains:

  • Enterprise Knowledge Management: Access to internal documents, policies, and knowledge bases
  • Legal Research: Retrieving relevant case law, statutes, and legal opinions
  • Medical Information Systems: Accessing up-to-date medical research and clinical guidelines
  • Customer Support: Retrieving product information and troubleshooting guides
  • Educational Platforms: Providing accurate and source-backed answers to student questions
  • Financial Analysis: Accessing market data and financial reports for informed analysis