Custom LLMs parade // fine-tuned, RAG-specialized, hacked

Nov 17, 2023

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

https://www.marktechpost.com/2023/11/17/llmware-launches-rag-specialized-7b-parameter-llms-production-grade-fine-tuned-models-for-enterprise-workflows-involving-complex-business-documents/

As more enterprises look to deploy scalable RAG systems on their own private information, three needs are increasingly recognized:
A unified framework that integrates LLMs with the surrounding workflow capabilities (e.g., document parsing, embedding, prompt management, source verification, audit tracking);
High-quality, smaller, specialized LLMs that have been optimized for fact-based question answering and enterprise workflows; and
Open-source, cost-effective, private deployment with flexibility and options for customization.

The DRAGON model family joins two other LLMWare RAG model collections: BLING and Industry-BERT. The BLING models are smaller RAG-specialized LLMs (1B to 3B parameters) that require no GPU and can run on a developer’s laptop. Because the training methodology is very similar, the intent is that a developer can start with a local BLING model running on a laptop and then drop in a DRAGON model for higher performance in production. The DRAGON models are all designed for private deployment on a single enterprise-grade GPU server, so that enterprises can run an end-to-end RAG system securely and privately within their own security zone.
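The "start local, swap later" workflow described above can be sketched roughly as follows. This is a toy illustration, not llmware's API: the retriever is a plain bag-of-words cosine match, and `local_model` and `production_model` are hypothetical stand-ins for a small laptop model and a larger server model. The point is only that the retrieval, prompt-building, and source-tracking code stays the same when the model is replaced.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: takes a prompt, returns an answer.
Model = Callable[[str], str]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: List[str], k: int = 1) -> List[Tuple[int, str]]:
    """Return the k passages most similar to the query, with their indices."""
    q = Counter(tokenize(query))
    ranked = sorted(
        enumerate(passages),
        key=lambda item: cosine(q, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, sources: List[Tuple[int, str]]) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    context = "\n".join(f"[{i}] {text}" for i, text in sources)
    return f"Answer using only the sources below.\n{context}\nQuestion: {query}"


def answer(query: str, passages: List[str], model: Model, k: int = 1) -> Dict[str, object]:
    """Run retrieval, call the model, and keep source indices for verification."""
    sources = retrieve(query, passages, k)
    return {
        "answer": model(build_prompt(query, sources)),
        "sources": [i for i, _ in sources],
    }


# Hypothetical models: each just echoes the question line with a tag.
def local_model(prompt: str) -> str:
    return "local: " + prompt.splitlines()[-1]


def production_model(prompt: str) -> str:
    return "production: " + prompt.splitlines()[-1]
```

Calling `answer(query, passages, local_model)` during development and `answer(query, passages, production_model)` in deployment changes only the final argument; the returned `sources` list is what a source-verification or audit step would consume.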

https://blog.perplexity.ai/blog/turbocharging-llama-2-70b-with-nvidia-h100

https://www.linkedin.com/posts/aravind-srinivas-16051987_turbocharging-llama-2-70b-with-nvidia-h100-activity-7131343111749828608-xKDH

Llama on a micro-controller // hack

Written by sbagency
