As more enterprises look to deploy scalable RAG systems using their own private information, there is a growing recognition of several needs:
A unified framework that integrates LLMs with the surrounding workflow capabilities (e.g., document parsing, embedding, prompt management, source verification, audit tracking);
High-quality, smaller, specialized LLMs that have been optimized for fact-based question answering and enterprise workflows; and
Open-source, cost-effective, private deployment, with flexibility and options for customization.
The DRAGON model family joins two other LLMWare RAG model collections: BLING and Industry-BERT. The BLING models are smaller, RAG-specialized LLMs (1B-3B parameters) that require no GPU and can run on a developer's laptop. Since the training methodology is very similar, the intent is that a developer can start with a local BLING model running on their laptop and then seamlessly drop in a DRAGON model for higher performance in production. DRAGON models have all been designed for private deployment on a single enterprise-grade GPU server, so that enterprises can deploy an end-to-end RAG system securely and privately within their own security zone.
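To make the "drop-in" idea concrete, here is a minimal sketch of a laptop-to-production swap using Hugging Face transformers. The specific checkpoint names and the `<human>`/`<bot>` prompt wrapper follow the published model cards, but treat them as assumptions to verify against the model card for the exact model you choose.

```python
# Minimal sketch: develop with a BLING model locally, then swap in a DRAGON
# model for production by changing only the model name. Checkpoint names and
# the <human>/<bot> prompt format are assumptions drawn from the model cards.

from transformers import AutoModelForCausalLM, AutoTokenizer


def load_rag_model(model_name: str):
    """Load a BLING or DRAGON checkpoint from the Hugging Face Hub."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return tokenizer, model


def ask(tokenizer, model, context: str, question: str) -> str:
    """Fact-based Q&A over a retrieved passage, using the models' prompt wrapper."""
    prompt = f"<human>: {context}\n{question}\n<bot>:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    # Strip the prompt tokens and return only the generated answer
    answer_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True).strip()


# Develop locally on a laptop with a 1B BLING model (no GPU required) ...
tokenizer, model = load_rag_model("llmware/bling-1b-0.1")

# ... then drop in a DRAGON model on a GPU server for production; the
# surrounding RAG code stays the same:
# tokenizer, model = load_rag_model("llmware/dragon-yi-6b-v0")

print(ask(tokenizer, model,
          "The invoice total is $22,500, due on March 1.",
          "What is the invoice total?"))
```

Because BLING and DRAGON share a training methodology and prompt format, only the model name changes between development and production; the retrieval, prompt-assembly, and post-processing code around the model is untouched.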