Multi-Source Retrieval Augmented Generation
Knowledge base
Retrieval-Augmented Generation (RAG) is a powerful technique that enhances Large Language Models (LLMs) by grounding their responses in factual, relevant information. Instead of relying solely on the model’s pre-trained knowledge, RAG retrieves specific information from external sources and uses it as context for generating responses.
Deployment models
Extravaganza knowledge base in Cloud
To get started with the provided solution, register a private or company account for cloud services and accept the terms and conditions of service usage. You can then freely use the provided solutions compatible with the Extravaganza knowledge base, e.g. the chatbot or recorder.
To go one step further and meet the need for flexibility in the delivered solution, we provide frameworks for LLM and data processing that are compatible with the Extravaganza knowledge base service, the OpenAI API, and LlamaIndex.
Extravaganza knowledge base Cloud On-Prem
Cloud On-Prem brings powerful knowledge base features directly to your infrastructure. Extravaganza Cloud On-Prem enables you to deploy the complete solution in your own environment to meet your company's internal needs.
This solution is built for organizations that require enhanced data privacy, strict compliance adherence, and complete infrastructure control. Whether you’re managing critical infrastructure, operating under stringent regulations like GDPR or HIPAA, or simply need the flexibility to customize your environment, Extravaganza RAG Cloud On-Prem delivers an enterprise-grade solution without compromising on security or control.
Before You Begin
Ensure you have the following ready before starting the installation:
Required:
- Kubernetes cluster configured
- Helm (version 3.12 with OCI Configuration)
- Kubectl with access to Kubernetes cluster
Optional:
- amd64 CPU architecture with AVX2/AVX-512 instruction support is preferred
Minimal System Requirements:
| Component | Details |
|---|---|
| Kubernetes | Version 1.33 or newer |
| TLS certificate | Single certificate for all endpoints, or separate certificates for the frontend and the API. Must be trusted by all connecting entities. |
| Knowledge base Cloud services | 4 machines with 4-8 CPU cores and 8-16 GiB memory each. GPU acceleration is optional but beneficial (NVIDIA CUDA 13.2 compatible hardware or Apple Metal). |
| NVMe/SSD storage | Storage for uploaded documents, 256 GiB for PVCs (cloud services are not ephemeral) |
| Third-party Services | 5 machines with 4 CPU cores each, 160 GiB NVMe/SSD storage each for PVCs |
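
The version requirements above can be checked with a small script before installation. The following is a minimal sketch, not part of the product: the helper names `parse_version` and `meets_minimum` are hypothetical, the version strings are illustrative samples rather than real command output, and treating Helm 3.12 as a minimum (rather than an exact version) is an assumption.

```python
import re

# Minimums taken from the requirements above.
# Assumption: Helm 3.12 is treated as a minimum, not an exact version.
MINIMUMS = {"kubernetes": (1, 33), "helm": (3, 12)}


def parse_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.33.2' or '3.12.0'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(component: str, reported: str) -> bool:
    """Return True when the reported version is at or above the minimum."""
    return parse_version(reported) >= MINIMUMS[component]


if __name__ == "__main__":
    # Sample strings; in practice these would come from your own tooling.
    for component, reported in [("kubernetes", "v1.33.1"), ("helm", "v3.12.3")]:
        status = "ok" if meets_minimum(component, reported) else "too old"
        print(f"{component}: {reported} -> {status}")
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings lexicographically, where "1.9" would sort after "1.33".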
Required Components
Third-Party Services
All these components must be installed prior to the knowledge base cloud services:
| Component | Version | Purpose | Notes |
|---|---|---|---|
| MariaDB | 12.x | Main metadata database | |
| Kafka broker | 4.x | Broker for internal inter-service communication | |
| Redis | 8.x | Main document storage and caching layer | |
| Milvus | 2.6.x | Main embedding vector store | |
| Neo4j | 5.26.x | Main nodes and edges graph store | |
Step by Step Guide to Install
Extravaganza knowledge base Hybrid solution
To meet data storage security requirements, Extravaganza Business Services has developed a hybrid knowledge base implementation model. The data itself is processed in the cloud but stored within the customer’s environment.
This solution is built for organizations that require enhanced data privacy and strict compliance adherence, but not complete infrastructure control. Whether you’re managing critical infrastructure, operating under stringent regulations like GDPR or HIPAA, or simply need the flexibility to customize your environment, Extravaganza RAG Cloud Hybrid delivers an enterprise-grade solution that does not compromise on security while relaxing the level of infrastructure control.
Before You Begin
Ensure you have the following ready before starting the installation:
Minimal System Requirements:
| Component | Details |
|---|---|
| Third-party services | 3 machines with 4 CPU cores each, 160 GiB NVMe/SSD storage each. |
| Secure VPN connection | To secure communication between Customer infrastructure and Extravaganza Business Services Cloud Services. |
Required Components
Third-Party Services
All these components must be installed in the customer environment:
| Component | Version | Purpose | Notes |
|---|---|---|---|
| Redis | 8.x | Main document storage | |
| Milvus | 2.6.x | Main embedding vector store | |
| Neo4j | 5.26.x | Main nodes and edges graph store | |
Extravaganza Business Services knowledge base solution
Retrieval-Augmented Generation (RAG) integrates user queries with a collection of pertinent documents sourced from an external knowledge database, incorporating two essential elements: the Retrieval Component and the Generation Component.
The Retrieval Component is responsible for fetching relevant documents or information from the external knowledge database. It identifies and retrieves the most pertinent data based on the input query. After the retrieval process, the generation component takes the retrieved information and generates coherent, contextually relevant responses. It leverages the capabilities of the language model to produce meaningful outputs.
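
The interplay of the two components can be illustrated with a minimal sketch. It is not the product's implementation: the `Document` class, the `retrieve` and `build_prompt` functions, the sample documents, and the word-overlap scoring are all hypothetical stand-ins, and the language model call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, documents: list[Document], top_k: int = 2) -> list[Document]:
    """Retrieval Component stand-in: rank documents by words shared with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc.text)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:top_k] if score > 0]


def build_prompt(query: str, context: list[Document]) -> str:
    """Input for the Generation Component: retrieved passages plus the question."""
    passages = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{passages}\n\nQuestion: {query}"
    )


documents = [
    Document("policy", "Refunds are issued within 14 days of purchase."),
    Document("hours", "Support is available on weekdays from 9 to 17."),
    Document("shipping", "Orders ship within two business days."),
]
query = "When are refunds issued?"
context = retrieve(query, documents)
prompt = build_prompt(query, context)
print(prompt)
```

Only the `policy` document shares words with the query, so it is the only passage placed in the prompt. Tagging each passage with its identifier is what later allows an answer to cite its source.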
It is important to understand how multi-source RAG with hybrid search and a re-ranking algorithm works in order to see where its accuracy comes from. The provided solution represents a significant advancement over basic RAG systems.
1. Multi-source retrieval pulls information from the local static knowledge base (documents, media sources, or other imported information) and from the web via a Search API, giving access to both privately stored documents and the latest information published online.
2. Hybrid search combines three complementary search methods:
   - Graph search traverses a graph dataset as efficiently as possible. The graph algorithm must be matched to the type of results you are looking for. It shines where the amount of data is huge.
   - Dense semantic search captures conceptual similarity using vector embeddings, finding relevant content even when exact keywords don’t match. It works well where the amount of data is not too large.
   - Sparse keyword search ensures important keywords are matched, similar to traditional search engines.
3. Re-ranking applies a specialized model to evaluate and sort the retrieved information based on its relevance to the specific question, ensuring the most useful context is prioritized.
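
The hybrid search and re-ranking steps above can be sketched as follows. The source does not state how the three result lists are merged, so reciprocal rank fusion (RRF) is used here as an assumed, commonly used technique. The function names, document ids, and texts are hypothetical, and the word-overlap `rerank` function is only a placeholder for the specialized re-ranking model.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1, so agreement between methods raises its position.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def rerank(query: str, candidates: list[str], texts: dict[str, str], top_n: int = 2) -> list[str]:
    """Placeholder re-ranker: counts words shared with the query.

    A real deployment would call a specialized relevance model here.
    """
    query_terms = set(query.lower().split())

    def relevance(doc_id: str) -> int:
        return len(query_terms & set(texts[doc_id].lower().split()))

    return sorted(candidates, key=relevance, reverse=True)[:top_n]


texts = {
    "a": "reset your password from the account settings page",
    "b": "password rules require twelve characters",
    "c": "billing settings and invoices",
    "d": "how to reset a forgotten password by email",
}
# Hypothetical results from the three search methods, best match first.
graph_hits = ["a", "c"]
dense_hits = ["d", "a", "b"]
sparse_hits = ["b", "d", "a"]

fused = reciprocal_rank_fusion([graph_hits, dense_hits, sparse_hits])
top = rerank("how to reset password", fused, texts)
print(fused)
print(top)
```

Document `a` leads the fused list because all three methods returned it, while the re-ranking step promotes `d` because it matches the question most closely. RRF works on ranks only, which sidesteps the problem that graph, dense, and sparse scores are not on a comparable scale.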
This approach delivers more accurate, trustworthy responses (reducing LLM hallucinations) with several benefits:
- Better factual grounding through diverse information sources.
- Improved retrieval accuracy through complementary search methods.
- Enhanced relevance through intelligent re-ranking.
- Reduced hallucinations by providing high-quality context.
- Auditability through source citations.
Main functionalities
- Multiple modes support depending on customer needs for response speed and accuracy (local, global, hybrid, naive, mix, bypass).
- Multiple embedding dimension support.
- Configurable per agent.
- Powered by multiple data sources (web, documents in a huge variety of supported formats, multimedia sources).
- Compatible with Large Language Models and Machine Learning Models.
- Manual or automatic natural language detection.
- Information protection against loss or theft.
- Provides a high level of data separation between agents (multitenancy, each tenant data and access are logically isolated from others, even though they share the same physical resources and underlying software instance).
- Ensures confidentiality of data transported between the client environment and the knowledge base.
- Offers various deployment models, allowing you to choose the level of separation of stored data depending on business-specific needs and requirements.
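
The logical data separation between agents mentioned in the list above can be illustrated with a minimal sketch. It assumes a single shared store in which every operation is keyed by a tenant identifier. The class `TenantScopedStore`, its methods, and the sample data are hypothetical and do not describe the product's actual storage layer.

```python
class TenantScopedStore:
    """One shared store instance; every read and write is keyed by tenant id."""

    def __init__(self) -> None:
        self._documents: dict[str, dict[str, str]] = {}

    def add(self, tenant_id: str, doc_id: str, text: str) -> None:
        """Store a document in the partition that belongs to the tenant."""
        self._documents.setdefault(tenant_id, {})[doc_id] = text

    def search(self, tenant_id: str, term: str) -> list[str]:
        """Return ids of the caller's own documents that contain the term."""
        own_documents = self._documents.get(tenant_id, {})
        return sorted(
            doc_id
            for doc_id, text in own_documents.items()
            if term.lower() in text.lower()
        )


store = TenantScopedStore()
store.add("agent-a", "contract-1", "Renewal terms for the 2025 contract")
store.add("agent-b", "memo-7", "Contract negotiation notes")

print(store.search("agent-a", "contract"))  # searches agent-a's partition only
print(store.search("agent-b", "contract"))  # searches agent-b's partition only
```

Both tenants share the same store object, yet a search never reaches another tenant's partition because the tenant identifier is a required argument of every operation rather than an optional filter.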
Benefits
- Improved AI accuracy and reliability (ensures factual responses by accessing a company’s specific documents, provides precise and verifiable answers based on actual business information).
- Enhanced user trust (the ability to trace answers back to their source documents increases confidence in the AI system’s output).
- Lower costs by avoiding model retraining.
- Supports faster time-to-market (companies can deploy and scale AI systems more quickly without the long development cycles associated with model retraining).
- Improves operational efficiency (automates tasks like information retrieval, freeing up employees and enabling them to handle higher workloads).
- Enhanced scalability (can efficiently handle and maintain performance as data volumes increase, which is critical for businesses with large and growing datasets).
- Enhanced customer support (provide instant, personalized, and accurate answers, improving customer satisfaction and reducing the workload on support teams).
- Real-time knowledge updates (pull in the latest information instantly, keeping applications and internal tools up to date without constant model updates).
- Better compliance and auditability in relation to Large Language Models (make it easier to trace AI-generated answers back to their source documents, which is crucial for meeting audit requirements in regulated industries).
- More effective operations through faster customer support and improved decision-making (provide relevant, contextually accurate, and actionable insights from large datasets, helping to eliminate guesswork and reduce the risk of poor decisions).
- Drives innovation (by analyzing customer feedback and market trends, RAG can inform product strategy and help businesses develop products that align with market needs).
- Multitenancy, a key concept of cloud computing and particularly of SaaS, offers benefits like lower costs, scalability, and efficient resource management.
