Universal classifiers // LLMs / SLMs

sbagency
3 min read · Jan 8, 2024


https://arxiv.org/pdf/2312.17543.pdf

Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allows them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.
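To make the zeroshot idea concrete, here is a minimal usage sketch via the Hugging Face transformers zero-shot-classification pipeline. The model identifier is an assumption based on the naming in the paper (the Hub hosts several sizes), so check the repository for the exact checkpoint.

```python
# Minimal sketch: zero-shot classification with a universal NLI-based classifier.
# The model id below is an assumption; several sizes exist on the Hugging Face Hub.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33",  # assumed model id
)

text = "The new graphics card delivers excellent performance for gaming."
candidate_labels = ["technology", "sports", "politics", "finance"]

result = classifier(
    text,
    candidate_labels,
    hypothesis_template="This text is about {}.",  # each label is inserted into the template
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```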

Here are the key points from the paper:

- The paper introduces a step-by-step guide and code for training universal text classifiers using natural language inference (NLI).

- NLI involves determining whether the meaning of one text (hypothesis) is entailed by another text (premise). This allows reformulating any text classification task into an entailment vs. non-entailment decision (a sketch of this scoring loop follows the list).

- The approach trains BERT-style models on a mix of NLI datasets (885,000 examples) and 28 classification datasets (51,731 examples) with 389 diverse classes.

- The resulting model deberta-v3-zeroshot-v1.1-all-33 outperforms NLI-only models by 9.4% on average on held-out zeroshot tasks. It is available in different sizes.

- The paper provides notebooks to preprocess data, formulate hypotheses, train models, and visualize results. The model and code enable zeroshot classification and few-shot fine-tuning (a data-formatting sketch for the few-shot case appears after the summary below).

- Noted limitations include the fixed model architecture and the limited diversity of the training data. The paper calls for a new self-supervised foundation model tailored to efficient universal classification.
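To make the entailment reformulation concrete, here is a minimal sketch of the scoring loop, assuming an NLI-style checkpoint loaded through transformers. The model identifier and the index of the entailment class are assumptions and should be verified against `model.config.id2label`.

```python
# Sketch of the NLI reformulation: pair the input (premise) with one templated
# hypothesis per candidate label and keep the label whose hypothesis is most entailed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The new graphics card delivers excellent performance for gaming."
labels = ["technology", "sports", "politics"]

scores = {}
for label in labels:
    hypothesis = f"This text is about {label}."
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Assumption: index 0 is the entailment class; check model.config.id2label.
    scores[label] = probs[0].item()

print(max(scores, key=scores.get))  # label whose hypothesis is most entailed
```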

In summary, the paper offers practical guidance and strong baseline models for low-resource text classification by leveraging NLI and transfer learning. The code and models lower barriers for applying state-of-the-art NLP.
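For the few-shot case, the sketch below shows how a handful of labeled examples might be reformatted into entailment vs. not-entailment pairs before fine-tuning. The hypothesis template and field names are illustrative assumptions, not the repository's exact schema.

```python
# Sketch: turn a few (text, label) examples into NLI-style training pairs.
# The true label yields an "entailment" pair; all other labels yield "not_entailment".
def to_nli_pairs(examples, all_labels, template="This text is about {}."):
    """Turn (text, label) pairs into premise/hypothesis/label triples."""
    pairs = []
    for text, gold in examples:
        for label in all_labels:
            pairs.append({
                "premise": text,
                "hypothesis": template.format(label),
                "label": "entailment" if label == gold else "not_entailment",
            })
    return pairs

few_shot = [
    ("The stock market rallied after the rate decision.", "finance"),
    ("The striker scored twice in the final.", "sports"),
]
train_data = to_nli_pairs(few_shot, ["finance", "sports", "technology"])
# train_data can then be tokenized as premise/hypothesis pairs and used to
# fine-tune the classifier with the notebooks the repository provides.
```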

https://github.com/MoritzLaurer/zeroshot-classifier


Written by sbagency

Tech/biz consulting, analytics, research for founders, startups, corps and govs.
