Large language models (LLMs) enable system builders today to create competent NLP systems through prompting, where they only need to describe the task in natural language and provide a few examples. However, in other ways, LLMs are a step backward from traditional special-purpose NLP models; they require extensive computational resources for deployment and can be gated behind APIs. In this paper, we propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment. This is done through a multi-step process of retrieval of existing datasets and pretrained models, dataset generation using LLMs, and supervised fine-tuning on these retrieved and generated datasets. Over three tasks, we demonstrate that given the same few-shot prompt as input, Prompt2Model trains models that outperform the results of a strong LLM, gpt-3.5-turbo, by an average of 20% while being up to 700 times smaller. We also show that this data can be used to obtain reliable estimates of model performance, enabling model developers to assess model reliability before deployment. Prompt2Model is available open-source at https://github.com/neulab/prompt2model.
Here is a summary of the key points from the document:
- Prompt2Model is a system that takes a natural language prompt describing a task and automatically generates a specialized model to perform that task. It does this through dataset retrieval, dataset generation using a large language model (LLM) teacher, and model retrieval/fine-tuning.
- Prompt2Model aims to retain the ease of use of LLMs for prototyping while producing small, deployable models with concrete benefits over LLMs, such as lower computational cost.
- The system comprises modules for prompt parsing, dataset retrieval, dataset generation, model retrieval, training, evaluation, and demo creation. The authors provide a reference implementation using tools such as gpt-3.5-turbo, DataFinder, and BM25 (minimal sketches of the dataset-generation and fine-tuning steps appear after this list).
- Experiments on question answering, code generation, and temporal normalization show Prompt2Model can produce models that outperform gpt-3.5-turbo while being much smaller, demonstrating the value of model distillation.
- The generated datasets can be used to obtain reliable estimates of downstream model performance on real benchmarks, and combining retrieved and generated data yields accurate models at low annotation cost.
- Prompt2Model makes building custom NLP systems more accessible. It could enable research on prompt-based training and serve as an extensible platform for improving system components like dataset generation.
- Limitations include a current reliance on proprietary LLMs and limited support for non-English tasks; the system has only been tested on a handful of tasks so far.
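
For concreteness, here is a minimal sketch of the dataset-generation step described above: a teacher LLM (gpt-3.5-turbo) is prompted with the task description and asked to emit synthetic (input, output) pairs. This is an illustration written directly against the OpenAI chat API, not the actual prompt2model code; the task prompt, output format, and file name are assumptions.

```python
# Illustrative sketch only -- not the prompt2model API. Assumes an OpenAI API key
# is set in the environment and that the teacher model returns one JSON object per line.
import json
from openai import OpenAI

client = OpenAI()

TASK_PROMPT = "Answer the question given the context."  # hypothetical task description

def generate_examples(n: int = 50) -> list[dict]:
    """Ask the teacher LLM (gpt-3.5-turbo) for synthetic (input, output) pairs."""
    message = (
        f"Task: {TASK_PROMPT}\n"
        f"Generate {n} diverse training examples as JSON lines with keys "
        '"input" and "output". Output only the JSON lines.'
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": message}],
        temperature=1.0,  # higher temperature encourages diverse examples
    )
    examples = []
    for line in resp.choices[0].message.content.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            examples.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the whole batch
    return examples

if __name__ == "__main__":
    data = generate_examples(20)
    with open("generated_train.jsonl", "w") as f:
        for ex in data:
            f.write(json.dumps(ex) + "\n")
```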
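
A similarly minimal sketch of the fine-tuning step: a small seq2seq model (t5-small here stands in for whatever the model-retrieval step would select) is trained on the generated data with Hugging Face transformers. The model name, hyperparameters, and file paths are placeholders, not the settings used in the paper.

```python
# Illustrative sketch of the fine-tuning step -- not the prompt2model trainer.
# Assumes generated_train.jsonl from the previous sketch exists.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-small"  # placeholder for the retrieved pretrained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dataset = load_dataset("json", data_files="generated_train.jsonl", split="train")

def tokenize(batch):
    # Encode inputs and targets; labels are the tokenized outputs.
    enc = tokenizer(batch["input"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["output"], truncation=True, max_length=128)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="distilled-model",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=3e-4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
trainer.save_model("distilled-model")
```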
AI Agent Self-Improvement + Self-Fine-Tune
Here is a summary of the key points from the video:
- The video discusses two approaches for enabling AI systems to update themselves with new knowledge: Google’s ReAct-style agent with reinforced self-training, and Microsoft’s metprompt vector-store method.
- With Google’s approach, a ReAct-style agent searches for new external data, then designs prompts to generate synthetic training data that is used to iteratively fine-tune the LLM. The LLM’s outputs are evaluated and used as reward signals for reinforcement learning (a simplified sketch of this loop follows the list).
- Microsoft’s metprompt approach stores new semantic correlations in a vector database, which the AI system queries when it needs updated knowledge (a retrieval sketch also follows the list).
- Google’s method allows an LLM to update itself overnight on a specific topic by searching the web and academic sources, and can produce a small, specialized LLM that can be downloaded to a phone.
- Evaluation shifts from outcome-based supervision to process supervision, with AI systems critiquing themselves through direct AI feedback loops.
- Prompts and reward signals replace code in Google’s approach; the LLM ranks its own outputs to determine the best-quality response.
- Attention-based LLMs currently outperform state-space models such as S4 on the semantic understanding needed for effective in-context learning, but research to improve SSMs is ongoing.
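
To make the Google-style loop concrete, here is a simplified, self-contained sketch: the agent retrieves context, the model drafts several candidate answers, ranks its own outputs, and the preferred answers are collected as new fine-tuning data. The helper names, prompts, and the use of gpt-3.5-turbo as a stand-in are assumptions for illustration, not the pipeline from the video.

```python
# Conceptual sketch of a self-improvement loop with LLM self-ranking as the reward signal.
# `search_web`, the prompts, and the model choice are placeholders, not Google's pipeline.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # stands in for the model being improved

def search_web(topic: str) -> list[str]:
    """Placeholder for the agent's retrieval step (web/academic search)."""
    return [f"Background passage about {topic}."]

def generate_candidates(question: str, context: str, k: int = 4) -> list[str]:
    """The model drafts several candidate answers grounded in retrieved context."""
    resp = client.chat.completions.create(
        model=MODEL,
        n=k,
        temperature=1.0,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"}],
    )
    return [c.message.content for c in resp.choices]

def self_rank(question: str, candidates: list[str]) -> int:
    """The model ranks its own outputs; the index of the best one acts as the reward signal."""
    listing = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[{"role": "user",
                   "content": f"Question: {question}\nCandidates:\n{listing}\n"
                              "Reply with only the number of the best candidate."}],
    )
    try:
        return int(resp.choices[0].message.content.strip().split()[0])
    except (ValueError, IndexError):
        return 0

def self_improvement_round(topic: str, questions: list[str]) -> list[dict]:
    """One round: collect the self-preferred answers as new synthetic training data."""
    context = " ".join(search_web(topic))
    new_data = []
    for q in questions:
        candidates = generate_candidates(q, context)
        best = candidates[self_rank(q, candidates)]
        new_data.append({"input": q, "output": best})
    return new_data  # in the full loop, this batch would be used to fine-tune the model
```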
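
And a minimal sketch of the vector-store idea: new facts are embedded and later retrieved by similarity when updated knowledge is needed. The embedding model name and the tiny in-memory store are illustrative choices, not Microsoft's implementation.

```python
# Minimal illustration of storing and querying new knowledge by embedding similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"  # illustrative embedding model choice

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

class KnowledgeStore:
    """Tiny in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors = np.empty((0, 0))

    def add(self, facts: list[str]) -> None:
        vecs = embed(facts)
        self.texts.extend(facts)
        self.vectors = vecs if self.vectors.size == 0 else np.vstack([self.vectors, vecs])

    def query(self, question: str, k: int = 3) -> list[str]:
        # Cosine similarity between the question and every stored fact.
        q = embed([question])[0]
        sims = self.vectors @ q / (
            np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q) + 1e-9
        )
        return [self.texts[i] for i in np.argsort(-sims)[:k]]

store = KnowledgeStore()
store.add(["New fact learned today about state-space models."])
print(store.query("What did we learn about state-space models?"))
```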