What is the problem? // to be more competitive, productive, creative, etc.
SigmaboostGPT // private GPT/ChatGPT-like models/services
Open-source SLM/LLM/MLM training, fine-tuning, RAG, and prompt engineering for specific domains, on private data, in the customer's infrastructure
I want my own private ChatGPT-like service on my own data in my own infrastructure // customer request
Challenges (scientific & engineering):
- Make models smaller (SLM), faster (inference) and smarter (better quality)
- Increase context window // 200K for proprietary Claude 2.1 and 128K for GPT-4 Turbo, but only 2–4K for most open models
- Hallucinations // training data quality, prompt engineering, evaluation, advanced reasoning (Chain-of-Thought, etc.; see the prompt sketch below)
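A minimal sketch of a chain-of-thought prompt (illustrative only; the question and wording are made up):

```python
# Minimal chain-of-thought prompt sketch (illustrative; question is made up).
# The explicit "think step by step" instruction nudges the model to show its
# reasoning, which helps against arithmetic slips and hallucinated answers.
question = "A client buys 3 licenses at $40 each with a 15% discount. Total?"
prompt = (
    "Answer the question below. Think step by step, "
    "then give the final answer on its own line.\n\n"
    f"Q: {question}\nA: Let's think step by step."
)
# send `prompt` to any chat/completions endpoint of your choice
```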
How to train your ChatGPT: 10TB of text, 6K GPUs, 12 days to obtain a base model, then fine-tune, etc. (rough compute check below)
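A rough sanity check of those numbers (all constants are assumptions, not figures from the recipe: a 70B base model, ~2T tokens from 10TB of raw text, the 6·N·D FLOPs rule, A100-class GPUs at ~45% utilization):

```python
# Back-of-envelope: do 6K GPUs x 12 days roughly cover a 70B / 2T-token run?
# Every constant below is an assumption for illustration.
params = 70e9            # assumed base-model size
tokens = 2e12            # ~10 TB raw text at ~5 bytes/token
needed = 6 * params * tokens                # ~8.4e23 FLOPs (6*N*D rule)
peak = 312e12            # A100 bf16 peak FLOP/s
mfu = 0.45               # assumed utilization
have = 6_000 * 12 * 86_400 * peak * mfu     # ~8.7e23 FLOPs
print(f"needed {needed:.1e} FLOPs, available {have:.1e} FLOPs")
```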
Frustrated that your AI app breaks due to #OpenAI API downtime? Here are 12 alternatives (LLMs and embedding models):
👉 Llama-2 by Meta: self-host or API through Replicate (https://lnkd.in/gNpJpmSX)
Context length = 4k, 32k tokens
Model parameter size = 7B, 13B, 70B
👉 Mistral-7B by Mistral AI: https://lnkd.in/gkM5cabd
Context length = 8k tokens
Model parameter size = 7B
👉 Jina Embedding v2 by Jina AI: https://lnkd.in/g4mZ6URa
Context length = 8192 tokens
Model parameter size (v2) = 33M, 137M
👉 Embed v3 by Cohere: https://lnkd.in/gsDzNsH6
Recommended context length = 512, 1024 tokens
Model parameter size = 6B, 12B, 52B
👉 Voyage AI: https://lnkd.in/gnvpqGep
Context length = 4096 tokens
👉 bge-large-en-v1.5 by BAAI: https://lnkd.in/gZR46Ea3
Context length = 512 tokens
👉 Falcon by TII: https://lnkd.in/g9H_7T8r
Context length = 2k tokens
Model parameter size = 7B, 40B, 180B
👉 MPT by MosaicML: https://lnkd.in/giVk9ms3
Context length = 4096, 8k, 65k tokens
Model parameter size = 7B, 30B
👉 BLOOM by BigScience: https://lnkd.in/giqarYTx
Context length = 2k tokens
Model parameter size = 560M, 1B, 3B, 7B, 176B
👉 gte-large by Alibaba: https://lnkd.in/g92NYhFK
Context length = 512 tokens
👉 Yi-34B by 01.AI: https://lnkd.in/gGTfTwEJ
Context length = 200,000 tokens
Model parameter size = 6B, 34B
👉 Claude2 by Anthropic: https://lnkd.in/g3ykVccx
Context length = 100,000 tokens
Model parameter size = 137B
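As a concrete starting point, a minimal sketch of swapping an OpenAI embeddings call for a self-hosted model from the list (bge-large-en-v1.5 via the `sentence-transformers` package; the documents are placeholders):

```python
# Minimal self-hosted embedding sketch using sentence-transformers.
# Replaces a hosted embeddings API with a local bge-large-en-v1.5 model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # 512-token context
docs = ["private contract clause ...", "internal policy paragraph ..."]
embeddings = model.encode(docs, normalize_embeddings=True)  # cosine-ready
print(embeddings.shape)  # (2, 1024)
```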
What do you need to fine-tune an open-source LLM to create your own financial advisor?
This is the LLM fine-tuning kit you must know ↓
Dataset
The key component of any successful ML project is the data.
You need a 100–1000 sample Q&A (questions & answers) dataset with financial scenarios.
The best approach is to hire domain experts to create it manually.
But for a PoC, that can get expensive & slow.
The good news is that a method called "fine-tuning with distillation" exists.
In a nutshell, this is how it works: use a big & powerful LLM (e.g., GPT-4) to generate your fine-tuning data, then use that data to fine-tune a smaller model (e.g., Falcon 7B).
For specializing smaller LLMs on specific use cases (e.g., financial advisors), this is an excellent method to kick off your project.
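A minimal sketch of that distillation step with the OpenAI Python client (the system prompt, file name, and sample count are illustrative assumptions):

```python
# Sketch: generate a small Q&A fine-tuning set with a strong "teacher" model.
# Prompt wording, file name, and the sample count are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
SYSTEM = ("You are a financial advisor. Produce one realistic client question "
          "and a careful answer as JSON: {\"question\": ..., \"answer\": ...}")

with open("finance_qa.jsonl", "w") as f:
    for _ in range(100):  # 100-1000 samples, per the post
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": "Generate one Q&A pair."}],
        )
        f.write(resp.choices[0].message.content.strip() + "\n")
```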
Pre-trained open-source LLM
You rarely (if ever) want to train your LLM from scratch.
Why? Because you need trillions of tokens & millions of $$$ in compute power.
You want to fine-tune your LLM on your specific task.
The good news is that you can find a plethora of open-source LLMs on Hugging Face (e.g., Falcon, Llama, etc.).
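Loading such a checkpoint is a couple of lines with `transformers` (Falcon-7B picked as the example; any Hub model ID works):

```python
# Sketch: pull an open-source base model from the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
```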
Parameter-efficient fine-tuning
As LLMs are big… duh…
… they don’t fit on a single GPU.
Since you only want to fine-tune the LLM, the community invented clever techniques that quantize the base model (so it fits on a single GPU) and train only a small set of adapter weights on top.
One popular approach is QLoRA, which can be implemented using HF's `peft` Python package.
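A minimal QLoRA sketch with `peft` + `bitsandbytes` (the hyperparameters and Falcon target modules are assumptions, not tuned values):

```python
# Sketch: 4-bit quantized base model + small LoRA adapters (QLoRA).
# r / alpha / dropout values are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize to fit one GPU
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["query_key_value"],     # Falcon's fused attention proj
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # only a tiny fraction trains
```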
MLOps
As you want your project to get to production, you have to integrate the following MLOps components:
- experiment tracker to monitor & compare your experiments
- model registry to version & share your models between the FTI pipelines
- prompt monitoring to debug & track complex chains
↳ All of these are available on ML platforms such as Comet ML 🔗 https://lnkd.in/d7jNQz7m
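For instance, the experiment-tracker piece with `comet_ml` looks roughly like this (project name, parameters, and metrics are placeholders):

```python
# Sketch: experiment tracking with Comet ML (names are placeholders).
from comet_ml import Experiment

exp = Experiment(project_name="financial-advisor-llm")  # API key from env
exp.log_parameters({"base_model": "tiiuae/falcon-7b", "lora_r": 16})
for step, loss in enumerate([2.1, 1.7, 1.4]):           # dummy training loop
    exp.log_metric("train_loss", loss, step=step)
exp.end()
```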
Compute platform
The most common approach is to train your LLM on your on-prem NVIDIA GPU cluster or to rent GPUs from cloud providers such as AWS, Paperspace, etc.
But what if I told you that there is an easier way?
There is! It is called serverless.
For example, Beam is a GPU serverless provider that makes deploying your training pipeline as easy as decorating your Python function with `@app.run()`.
Along with ease of deployment, you can add your training code to your CI/CD pipeline to complete the final piece of the MLOps puzzle: CT (continuous training).
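A sketch of that pattern (only the `@app.run()` decorator comes from the text above; the `App`/`Runtime` names and arguments are assumptions about Beam's SDK):

```python
# Sketch of a serverless training entry point in the style of Beam's SDK.
# Exact class and argument names are assumptions; only @app.run() is from
# the description above.
from beam import App, Runtime

app = App(name="llm-finetune", runtime=Runtime(gpu="A10G", memory="32Gi"))

@app.run()
def train():
    # load dataset, build the QLoRA model, run the fine-tuning loop here
    ...
```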
SigmaBoostAgents // pre-AGI for augmented thinking
Systems able to generate their own plans and achieve new goals for complex, domain-specific tasks // Agents or Copilots
I want an agent/copilot/assistant that can take my high-level task or idea, proactively define sub-goals, generate plans, and take short- and long-term actions across a variety of contexts: learning, research, analysis, brainstorming, generating content/thoughts/ideas, etc. (see the agent-loop sketch below)
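A toy sketch of that plan-then-act loop (the `llm` callable is a hypothetical "prompt in, text out" placeholder, not a real API):

```python
# Toy agent loop: decompose a high-level goal into sub-goals, then act on each.
# `llm` is a hypothetical text-completion callable, not a specific SDK.
from typing import Callable, List

def run_agent(goal: str, llm: Callable[[str], str]) -> List[str]:
    plan = llm(f"Break this goal into 3-5 concrete sub-goals, one per line:\n{goal}")
    subgoals = [s for s in plan.splitlines() if s.strip()]
    results = []
    for sub in subgoals:
        # a real copilot would call tools, search, or code execution here
        results.append(llm(f"Goal: {goal}\nSub-goal: {sub}\nDo it and report."))
    return results
```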
SigmaBoostContracts // LLM in finance
Digital contracts written in natural language: the next generation of smart contracts for people, businesses, and AI agents
The whole contract can be written in natural language; some structure is needed to support contract execution and financial integration with digital currencies (CBDC).
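One way to picture the "natural language + structure" split (all field names are hypothetical, for illustration only):

```python
# Hypothetical schema: free-form clauses plus the minimal structure an
# execution/payment layer (e.g., a CBDC rail) would need. Illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Payment:
    payer: str
    payee: str
    amount: float
    currency: str          # e.g., a CBDC identifier
    due_on: str            # ISO date or a named trigger event

@dataclass
class Contract:
    parties: List[str]
    clauses: List[str]     # the contract itself, in plain natural language
    payments: List[Payment] = field(default_factory=list)
```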
SigmaBoostLearning // Learning assistant (AI)
AI assistants for quick, efficient, and personalized learning from continuously updated expert knowledge.
Interactive learning, Q&A sessions, quick intros, tests, theory & practice, simulations, games, projects, teamwork, and much more…
// in progress…