# RAG System
Distributed Knowledge incorporates a Retrieval Augmented Generation (RAG) system that enhances LLM responses with relevant information from a knowledge base. This document explains how the RAG system works in the Distributed Knowledge architecture.
## What is RAG?
Retrieval Augmented Generation (RAG) is a technique that combines:
- Information Retrieval: Finding relevant documents or information from a knowledge base
- Content Generation: Using an LLM to generate responses based on the retrieved information
This approach addresses limitations of traditional LLMs by providing:
- Access to knowledge beyond the model's training data
- More up-to-date information
- Source-based responses that can be verified
- Domain-specific expertise
## RAG Implementation in Distributed Knowledge
The Distributed Knowledge RAG system is implemented in `dk/core/rag.go` and consists of the following components:
### 1. Vector Database
The system uses a vector database to store and retrieve embeddings:
- Document Embeddings: Text is converted into numerical vectors
- Semantic Search: Finds documents based on meaning rather than keywords
- Efficient Retrieval: Quickly identifies the most relevant information
- Persistent Storage: Maintains the knowledge base between sessions
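To make the retrieval step concrete, here is a minimal Go sketch of cosine-similarity search over stored chunks. The `Chunk` type and the in-memory slice are simplified stand-ins for the actual vector database; this is illustrative, not the implementation in `dk/core/rag.go`.

```go
package rag

import (
    "math"
    "sort"
)

// Chunk is a simplified stand-in for a stored document chunk.
type Chunk struct {
    Text      string
    File      string
    Embedding []float64
}

// cosineSimilarity measures how closely two embedding vectors point in the
// same direction, which serves as the notion of semantic closeness.
func cosineSimilarity(a, b []float64) float64 {
    var dot, normA, normB float64
    for i := range a {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if normA == 0 || normB == 0 {
        return 0
    }
    return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// topK returns the k chunks most similar to the query embedding.
func topK(query []float64, chunks []Chunk, k int) []Chunk {
    scored := make([]Chunk, len(chunks))
    copy(scored, chunks)
    sort.Slice(scored, func(i, j int) bool {
        return cosineSimilarity(query, scored[i].Embedding) >
            cosineSimilarity(query, scored[j].Embedding)
    })
    if k > len(scored) {
        k = len(scored)
    }
    return scored[:k]
}
```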
### 2. Document Ingestion
The system processes documents from various sources:
- Source Format: Documents are stored in JSONL format (see Configuring RAG Sources below for an example)
- Chunking: Long documents are divided into manageable segments
- Metadata Preservation: Each chunk maintains its source information
- Embedding Generation: Text chunks are converted to vector embeddings
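A common way to implement the chunking step is with overlapping windows, as in the Go sketch below; the window sizes are parameters, and nothing here claims to match DK's exact chunking logic.

```go
package rag

// chunkText splits a document into overlapping segments so that wording
// that spans a chunk boundary still appears intact in at least one chunk.
func chunkText(text string, chunkSize, overlap int) []string {
    runes := []rune(text)
    step := chunkSize - overlap
    if step <= 0 {
        step = chunkSize // guard against a non-positive stride
    }
    var chunks []string
    for start := 0; start < len(runes); start += step {
        end := start + chunkSize
        if end > len(runes) {
            end = len(runes)
        }
        chunks = append(chunks, string(runes[start:end]))
        if end == len(runes) {
            break
        }
    }
    return chunks
}
```

For example, `chunkText(doc, 1000, 200)` yields roughly 1000-character chunks that each share 200 characters with their neighbor.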
### 3. Query Processing
When a query is received:
- Query Embedding: The question is converted to a vector embedding
- Similarity Search: The system finds chunks most similar to the query
- Context Assembly: Relevant chunks are compiled into a context package
- Prompt Construction: The query and context are formatted for the LLM
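A simplified sketch of the context-assembly and prompt-construction steps, reusing the `Chunk` type from the retrieval sketch above; the prompt wording is an illustrative assumption, not DK's actual template:

```go
package rag

import (
    "fmt"
    "strings"
)

// buildPrompt compiles retrieved chunks into a context block and formats
// it together with the user's question for the LLM.
func buildPrompt(question string, chunks []Chunk) string {
    var ctx strings.Builder
    for _, c := range chunks {
        fmt.Fprintf(&ctx, "[source: %s]\n%s\n\n", c.File, c.Text)
    }
    return fmt.Sprintf(
        "Answer the question using only the context below, and cite your sources.\n\nContext:\n%s\nQuestion: %s",
        ctx.String(), question)
}
```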
### 4. Response Generation
The system generates responses based on retrieved information:
- Context Integration: The LLM receives both query and retrieved context
- Citation Generation: Responses reference source documents
- Confidence Assessment: The system evaluates response reliability
- Format Standardization: Responses follow a consistent structure
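As a rough illustration of citation generation and format standardization, this sketch (again reusing `Chunk`) appends a deduplicated source list to the model's answer; the real response format may differ:

```go
package rag

import (
    "fmt"
    "strings"
)

// withCitations standardizes a response by appending the source files of
// the chunks that informed it, so the answer can be verified.
func withCitations(answer string, chunks []Chunk) string {
    seen := make(map[string]bool)
    var sources []string
    for _, c := range chunks {
        if !seen[c.File] {
            seen[c.File] = true
            sources = append(sources, c.File)
        }
    }
    return fmt.Sprintf("%s\n\nSources: %s", answer, strings.Join(sources, ", "))
}
```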
## Using the RAG System
### Configuring RAG Sources
RAG sources are defined in a JSONL file with the following format:
{"text": "Einstein published the theory of relativity in 1905.", "file": "physics.txt"}
{"text": "Machine learning models use mathematical algorithms to improve through experience.", "file": "ai.txt"}
This file is specified using the `-rag_sources` parameter:
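For example, assuming the compiled binary is invoked as `dk` (the binary name and paths here are placeholders for your setup):

```bash
./dk -rag_sources ./sources.jsonl
```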
### Updating the Knowledge Base
New documents can be added to the RAG system using the `updateKnowledgeSources` MCP tool:
```json
{
  "name": "updateKnowledgeSources",
  "parameters": {
    "file_name": "new_research.txt",
    "file_content": "The latest research shows significant progress in quantum computing..."
  }
}
```
Alternatively, you can specify a file path:
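A call of that form might look like the following; the `file_path` parameter name is a hypothetical placeholder, so check the tool's actual schema:

```json
{
  "name": "updateKnowledgeSources",
  "parameters": {
    "file_path": "./documents/new_research.txt"
  }
}
```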
### Vector Database Configuration
The vector database location is configured with the `-vector_db` parameter:
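For example (again assuming a `dk` binary, with placeholder paths):

```bash
./dk -vector_db ./data/vector.db -rag_sources ./sources.jsonl
```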
## Federated RAG Capabilities
Distributed Knowledge extends traditional RAG with network awareness:
- Distributed Knowledge Sources: Access information across multiple nodes
- Expertise Routing: Direct queries to nodes with relevant knowledge
- Collaborative Response Generation: Combine insights from multiple sources
- Dynamic Knowledge Updates: Incorporate new information in real-time
## Performance Considerations
Several factors affect RAG performance:
- Vector Dimensions: Higher dimensions provide more precise matching but use more memory
- Database Size: Larger knowledge bases require more resources but offer broader coverage
- Chunk Size: Smaller chunks enable more precise retrieval but increase database size
- Similarity Threshold: Higher thresholds improve relevance but may miss useful information
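One way to see these trade-offs together is as a set of tuning knobs. The struct below is a hypothetical grouping for illustration only; DK's real configuration may expose different names and defaults:

```go
package rag

// RetrievalConfig restates the trade-offs above as illustrative knobs.
type RetrievalConfig struct {
    VectorDimensions    int     // higher = more precise matching, more memory
    ChunkSize           int     // smaller = more precise retrieval, larger database
    ChunkOverlap        int     // overlap preserves context across boundaries
    TopK                int     // how many chunks are assembled into the context
    SimilarityThreshold float64 // higher = stricter relevance, may drop useful chunks
}
```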
## Best Practices
To get the most from the RAG system:
- Organize Knowledge: Group related information in well-structured documents
- Provide Context: Include sufficient background in each document
- Update Regularly: Keep the knowledge base current with new information
- Balance Coverage: Include both broad and specialized knowledge
- Monitor Performance: Track which sources are frequently retrieved
## Technical Implementation
The RAG system uses:
- Vector Embeddings: Generated using the Ollama `nomic-embed-text` model (see the embedding sketch after this list)
- Vector Search: Employs cosine similarity for matching
- Document Chunking: Uses overlapping segments to preserve context
- Context Selection: Chooses top-k most relevant chunks
- Memory Management: Efficiently handles large knowledge bases
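Putting the pieces together, the sketch below requests an embedding from a local Ollama server with the `nomic-embed-text` model via Ollama's `/api/embeddings` endpoint. The default address `localhost:11434` is an assumption about your Ollama setup, and error handling is kept minimal:

```go
package rag

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// embed asks a local Ollama server to embed the given text with
// nomic-embed-text and returns the resulting vector.
func embed(text string) ([]float64, error) {
    body, err := json.Marshal(map[string]string{
        "model":  "nomic-embed-text",
        "prompt": text,
    })
    if err != nil {
        return nil, err
    }
    resp, err := http.Post("http://localhost:11434/api/embeddings",
        "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("ollama returned %s", resp.Status)
    }
    var out struct {
        Embedding []float64 `json:"embedding"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return nil, err
    }
    return out.Embedding, nil
}
```

The vector returned here is what gets stored for each chunk at ingestion time and compared via cosine similarity at query time.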