Prompt Engineering — six strategies for getting better results // notes from OpenAI's guide and the community
Simple prompt engineering on its own often isn't enough: combine it with retrieval-augmented generation (RAG) backed by semantic search, external functions the model can call, and code generation and execution in your pipeline.
Write clear instructions (a combined example sketch follows this list)
- Include details in your query to get more relevant answers
- Ask the model to adopt a persona
- Use delimiters to clearly indicate distinct parts of the input
- Specify the steps required to complete a task
- Provide examples
- Specify the desired length of the output
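A minimal sketch combining several of these tactics (persona, delimiters, explicit steps, a worked example, and a length target), assuming the openai>=1.x Python SDK; the model name and article text are placeholders.

```python
# Minimal sketch of the "write clear instructions" tactics using the
# openai>=1.x Python SDK (model name and article text are placeholders).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

article = "..."  # the text you want summarized

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any chat-capable model works here
    messages=[
        {
            "role": "system",
            # Persona + explicit steps + desired output length.
            "content": (
                "You are a technical editor. Follow these steps:\n"
                "Step 1: Summarize the article delimited by triple quotes "
                "in one paragraph of about 50 words.\n"
                "Step 2: List the three most important terms it defines."
            ),
        },
        # A worked example ("few-shot") showing the expected format.
        {"role": "user", "content": '"""Example article about caching..."""'},
        {"role": "assistant", "content": "Summary: ...\nTerms: cache, eviction, TTL"},
        # The real query, with delimiters separating instructions from data.
        {"role": "user", "content": f'"""{article}"""'},
    ],
)
print(response.choices[0].message.content)
```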
Provide reference text (sketch after the list)
- Instruct the model to answer using a reference text
- Instruct the model to answer with citations from a reference text
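A sketch of answering strictly from supplied reference text, with citations, again assuming the openai Python SDK; the document string, tag names, and citation format are illustrative choices, not a fixed API.

```python
# Sketch: force answers to come from a supplied reference, with citations.
from openai import OpenAI

client = OpenAI()

document = "..."   # reference text retrieved elsewhere (e.g. via RAG)
question = "What does the warranty cover?"

system = (
    "Answer using only the provided document, delimited by <doc></doc>. "
    "Quote the passages that support your answer in the form "
    '{"citation": "..."} and reply "Insufficient information" if the '
    "document does not contain the answer."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": f"<doc>{document}</doc>\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```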
Split complex tasks into simpler subtasks (summarization sketch after the list)
- Use intent classification to identify the most relevant instructions for a user query
- For dialogue applications that require very long conversations, summarize or filter previous dialogue
- Summarize long documents piecewise and construct a full summary recursively
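A sketch of piecewise, recursive summarization for documents too long to fit in the context window; the character-based chunk size is a rough stand-in for token counting and should be replaced with a real tokenizer in practice.

```python
# Sketch of piecewise, recursive summarization for long documents.
from openai import OpenAI

client = OpenAI()
CHUNK_CHARS = 8000  # rough proxy for tokens; use a tokenizer in production

def summarize(text: str) -> str:
    """Summarize one piece of text that already fits in the context window."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize the user's text in at most 150 words."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

def summarize_recursively(text: str) -> str:
    """Split, summarize each chunk, then summarize the concatenated summaries."""
    if len(text) <= CHUNK_CHARS:
        return summarize(text)
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    partial = "\n\n".join(summarize(chunk) for chunk in chunks)
    return summarize_recursively(partial)
```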
Give the model time to “think” (worked sketch after the list)
- Instruct the model to work out its own solution before rushing to a conclusion
- Use inner monologue or a sequence of queries to hide the model’s reasoning process
- Ask the model if it missed anything on previous passes
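A sketch of the "work it out first" tactic combined with a hidden inner monologue: the model solves the problem itself before judging the student's answer and keeps its reasoning inside tags that are stripped before display. The tag names and the grading task are assumptions for illustration.

```python
# Sketch: have the model solve the problem first, then grade the student's
# answer; its scratchpad reasoning is hidden from the end user.
import re
from openai import OpenAI

client = OpenAI()

problem = "A train travels 120 km in 1.5 hours. What is its average speed?"
student_answer = "90 km/h"

system = (
    "First work out your own solution to the problem inside "
    "<scratchpad></scratchpad> tags. Then compare it with the student's "
    "answer and, outside the tags, reply only 'correct' or 'incorrect' "
    "with a one-sentence explanation."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": f"Problem: {problem}\nStudent's answer: {student_answer}"},
    ],
)
full = response.choices[0].message.content
# Inner monologue: strip the scratchpad before showing the result.
visible = re.sub(r"<scratchpad>.*?</scratchpad>", "", full, flags=re.DOTALL).strip()
print(visible)
```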
Use external tools (retrieval sketch after the list)
- Use embeddings-based search to implement efficient knowledge retrieval
- Use code execution to perform more accurate calculations or call external APIs
- Give the model access to specific functions
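A sketch of embeddings-based retrieval: embed a small corpus once, embed the query, and pick the most similar passage to pass to the model as reference text. Assumes the openai SDK and numpy; the corpus, model names, and top-1 selection are illustrative.

```python
# Sketch of embeddings-based knowledge retrieval with cosine similarity.
import numpy as np
from openai import OpenAI

client = OpenAI()

corpus = [
    "Refunds are issued within 14 days of purchase.",
    "The warranty covers manufacturing defects for two years.",
    "Shipping is free for orders over 50 EUR.",
]

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(corpus)                      # do this once and cache it
query_vector = embed(["What does the warranty cover?"])[0]

# Cosine similarity, then take the best match to use as reference text.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best = corpus[int(np.argmax(scores))]
print(best)
```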
Test changes systematically (a minimal eval sketch follows the references below)
- PromptBench (https://github.com/microsoft/promptbench): a unified library from Microsoft for evaluating LLM performance and potential security risks. Its components, built to be easy to use and extend, cover prompt construction, prompt engineering, dataset and model loading, adversarial prompt attacks, dynamic evaluation protocols, and analysis tools; it is intended as an open, general, and flexible codebase for creating new benchmarks, deploying downstream applications, and designing new evaluation protocols.
- ATLAS (https://github.com/VILA-Lab/ATLAS): 26 guiding principles for formulating instructions and prompts, aimed at streamlining how users query LLMs and clarifying how models of different scales behave under different prompts. The principles were validated with extensive experiments on LLaMA-1/2 (7B, 13B, 70B) and GPT-3.5/4.
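A minimal sketch of systematic testing: score two prompt variants against a small gold set before adopting a change. PromptBench (linked above) provides a much fuller framework; the questions, exact-match scoring, and model name here are purely illustrative.

```python
# Minimal eval sketch: compare prompt variants on a small gold set.
from openai import OpenAI

client = OpenAI()

gold_set = [
    {"question": "What is 17 * 3?", "answer": "51"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]

prompts = {
    "baseline": "Answer the question.",
    "candidate": "Answer the question concisely. Reply with the answer only.",
}

def accuracy(system_prompt: str) -> float:
    """Fraction of gold answers that appear in the model's reply."""
    hits = 0
    for item in gold_set:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": item["question"]},
            ],
        )
        reply = response.choices[0].message.content
        hits += item["answer"].lower() in reply.lower()
    return hits / len(gold_set)

for name, prompt in prompts.items():
    print(name, accuracy(prompt))
```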