Just simple AI-business // LLM inference/FT

Wrap open-weight LLMs in an API and sell access, then put some effort into fixing the LLMs' problems

sbagency
5 min read · 1 day ago

Open-weight LLMs make this business possible: any startup can build alternatives to closed corporate services. But such startups run into the same common problems, such as hallucinations and inaccuracy, and they have to invent solutions to fix them.

https://www.lamini.ai/blog/lamini-memory-tuning

Lamini Memory Tuning is a new method for fine-tuning large language models (LLMs) that achieves high factual accuracy while minimizing the occurrence of hallucinations (the generation of incorrect information). This technique overcomes a long-standing paradox in AI research, where models typically trade off between precise factual accuracy and generalization capabilities.

Lamini Memory Tuning achieves this by:

1. Fine-tuning millions of expert adapters (LoRAs) to remember specific facts, with the goal of achieving zero error on these facts.
2. Preserving the model’s generalization capabilities using a Mixture of Experts (MoE) architecture.
3. Selecting relevant experts from an index at inference time, reducing latency and computational costs.
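
The third step, selecting experts from an index, can be pictured as a nearest-neighbour lookup over adapter keys. The sketch below is a toy illustration only, not Lamini's implementation: the `ExpertIndex` class, the bag-of-words `embed` function and the adapter names are all hypothetical, and a real system would use learned embeddings and actual LoRA weights.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExpertIndex:
    """Hypothetical index mapping a key text to the name of an adapter."""

    def __init__(self):
        self._entries = []  # list of (key embedding, adapter name)

    def add(self, key_text: str, adapter_name: str) -> None:
        self._entries.append((embed(key_text), adapter_name))

    def select(self, query: str, k: int = 2) -> list:
        """Return the names of the k adapters whose keys best match the query."""
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [name for _, name in ranked[:k]]


index = ExpertIndex()
index.add("quarterly revenue by region", "adapter_revenue")
index.add("customer refund policy", "adapter_refunds")
index.add("employee headcount by team", "adapter_headcount")

# Only the best-matching adapter would be loaded for this query.
selected = index.select("what is the refund policy for a customer", k=1)
print(selected)
```

In this picture only the selected adapters are applied for a given query, so the work done at inference time depends on `k` rather than on the total number of experts in the index.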

The results are impressive:

* 95% factual accuracy on critical use cases, where previous approaches struggled to achieve 50% accuracy.
* Dramatic reductions in hallucinations (from 50% to 5% in one case study).

Case studies have demonstrated the effectiveness of Lamini Memory Tuning in various applications, including text-to-SQL, classification, and recommendations.

Lamini Memory Tuning has significant implications for the AI community, enabling full automation, reducing costs, and improving user experiences. However, this technique is still relatively new, and the authors are working with select partners to explore its full potential.

Key benefits of Lamini Memory Tuning include:

* High factual accuracy
* Low latency
* Low costs
* Faster development cycles
* Seamless user experiences

To try Lamini Memory Tuning, interested parties can contact the authors or read the research paper for more details.

From the abstract of the research paper:

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold, as it usually is in practice when training on internet-scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations, Lamini-1, that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.
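
One way to build intuition for the loss-threshold claim (a back-of-the-envelope illustration, not the paper's theoretical construction): if the cross-entropy loss on a token is L nats, the model assigns the correct token probability exp(-L). Under greedy decoding the correct token is guaranteed to be chosen once that probability exceeds 0.5, that is, when L is below ln 2 (about 0.693). Under sampling, a fact spanning n tokens is reproduced exactly with probability exp(-n·L) if every token has loss L. The function names below are hypothetical.

```python
import math


def correct_token_prob(loss_nats: float) -> float:
    """Probability assigned to the correct token given its cross-entropy loss."""
    return math.exp(-loss_nats)


def exact_recall_prob(loss_nats: float, n_tokens: int) -> float:
    """Chance of sampling an n-token fact exactly, assuming the same loss per token."""
    return math.exp(-loss_nats * n_tokens)


def greedy_guaranteed(loss_nats: float) -> bool:
    """True when the correct token must be the argmax (its probability exceeds 0.5)."""
    return correct_token_prob(loss_nats) > 0.5


# Compare a high, a moderate and a near-zero per-token loss on a 10-token fact.
for loss in (1.5, 0.5, 0.01):
    print(loss, round(exact_recall_prob(loss, 10), 4), greedy_guaranteed(loss))
```

The point of the toy calculation is that exact recall of a multi-token fact degrades quickly unless the loss on those specific tokens is driven close to zero, which is the regime memory tuning aims for.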

https://www.lamini.ai/

Sharon, the co-founder and CEO of Lamini, discusses the current state of generative AI and how her company is working to improve the technology. She points out that current language models like Llama 3.2 are magical but have limitations, such as hallucinating facts and lacking expertise. To address this, Lamini has developed a technique called memory tuning, which combines different technologies to allow the model to learn from specific data and improve its accuracy.

Sharon shares a few use cases where memory tuning has led to significant improvements in accuracy, including a business intelligence agent that achieved 95% accuracy and a customer service agent that achieved 99.9% accuracy. She also demos how Lamini’s platform can be used to fine-tune language models and generate more accurate outputs, such as SQL queries.
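
Accuracy figures like these depend on how correctness is measured. For generated SQL, one common choice is execution accuracy: run the generated query and a reference query against the same database and compare the returned rows. The sketch below assumes that metric; the talk summary does not say how Lamini's numbers were computed, and the table, queries and function names are made up for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Execute a query and return its rows in sorted order, or None if it fails."""
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (generated, reference) pairs that return identical rows."""
    hits = 0
    for generated, reference in pairs:
        expected = run_query(conn, reference)
        if expected is not None and run_query(conn, generated) == expected:
            hits += 1
    return hits / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10), ("south", 20), ("north", 5)])

pairs = [
    # Different text, same rows: counted as correct.
    ("SELECT region, SUM(amount) FROM sales GROUP BY region",
     "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"),
    # Valid SQL, wrong answer.
    ("SELECT COUNT(*) FROM sales", "SELECT SUM(amount) FROM sales"),
    # Refers to a table that does not exist (a typical hallucination).
    ("SELECT amount FROM orders", "SELECT amount FROM sales"),
    ("SELECT amount FROM sales WHERE region = 'south'",
     "SELECT amount FROM sales WHERE region='south'"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2f}")
```

Rows are sorted before comparison, so this version ignores result ordering; a stricter check would compare the rows as returned when the question asks for a specific order.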

Throughout the talk, Sharon emphasizes the importance of open-source models and the need for further innovation in the field. She highlights the potential of memory tuning to enable more accurate and reliable language models, and how this can lead to deeper industry-specific use cases.

Some key points from the talk include:

1. Current language models are limited by their tendency to hallucinate facts and by their lack of domain expertise.
2. Memory tuning is a technique that combines different technologies to allow language models to learn from specific data and improve their accuracy.
3. Lamini’s platform uses memory tuning to fine-tune language models and generate more accurate outputs, such as SQL queries.
4. The company has achieved significant improvements in accuracy in various use cases, including business intelligence and customer service.
5. Open-source models are essential for further innovation in the field.

Overall, Sharon’s talk highlights the potential of memory tuning to improve the accuracy and reliability of language models, and how this can lead to more practical applications in various industries.

Getting started with the Lamini API

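# Colab notebook cell: install the Lamini client
!pip install lamini

# Read the API key from Colab's secret storage
from google.colab import userdata
LAMINI_API_KEY = userdata.get('LAMINI_API_KEY')

import lamini
lamini.api_key = LAMINI_API_KEY

# Pick one of the base models enabled for the account
#llm = lamini.Lamini("meta-llama/Meta-Llama-3.1-8B-Instruct")
llm = lamini.Lamini("meta-llama/Llama-3.2-3B-Instruct")

# Error seen when requesting a model outside the supported list:
"""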
APIError: API error {'detail': "Currently this user has support for base models:
['EleutherAI/pythia-410m', 'EleutherAI/pythia-70m', 'hf-internal-testing/tiny-random-gpt2',
'meta-llama/Llama-2-13b-chat-hf', 'meta-llama/Llama-2-7b-chat-hf', 'meta-llama/Llama-2-7b-hf',
'meta-llama/Meta-Llama-3-8B-Instruct', 'meta-llama/Meta-Llama-3.1-8B-Instruct', 'microsoft/phi-2',
'microsoft/Phi-3-mini-4k-instruct', 'mistralai/Mistral-7B-Instruct-v0.1',
'mistralai/Mistral-7B-Instruct-v0.2', 'mistralai/Mistral-7B-Instruct-v0.3',
'Qwen/Qwen2-7B-Instruct', 'meta-llama/Llama-3.2-3B-Instruct',
'meta-llama/Llama-3.2-1B-Instruct']. Need help? Email us at info@lamini.ai"}
"""
# Llama 3 chat template: one user turn, then an open assistant turn
PROMPT = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

print(llm.generate(PROMPT.format(message="hey, tell me a joke about horse")))
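
The prompt string above is built from the Llama 3 chat special tokens. A small helper can produce the same layout for conversations with several turns, including an optional system message. This is only a sketch: `build_llama3_prompt` is a hypothetical name and is not part of the `lamini` package; its output would be passed to `llm.generate` in the same way as the formatted `PROMPT`.

```python
def build_llama3_prompt(messages):
    """Render (role, content) pairs into the Llama 3 instruct chat format.

    Hypothetical helper for illustration, not part of the lamini package.
    """
    parts = ["<|begin_of_text|>"]
    for role, content in messages:
        parts.append(f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>")
    # Open an assistant turn so the model continues with its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt([
    ("system", "You are a concise assistant."),
    ("user", "hey, tell me a joke about horse"),
])
print(prompt)
```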
