Prompt Engineering

Prompt engineering is the iterative, experimental process of steering a model toward desired outputs. It is a crucial and relatively inexpensive method of optimizing LLM performance.

A well-crafted prompt can significantly improve the model's performance by clearly specifying what the model should generate and how it should generate it.

When having a conversation with a friend or colleague, rich context is often inferred through the setting, previous conversations, or the dynamic between yourself and the other person. In the case of LLMs, this doesn't exist, and you need to provide context and expectation framing explicitly through prompts.


Prompt Engineering Tips

Experiment — It is unlikely that you will achieve desired prompt results on the first attempt. Prepare to iterate and experiment with different prompts to find the optimal prompt for your use case.

Specificity — Make your prompts as specific as possible. The more specific the prompt, the more likely the model is to generate the desired output.

Contextual Information — Providing contextual information in your prompts can help guide the model to generate more accurate and relevant responses. This can include examples of past generations, as you will see in the Single/Few-Shot Learning section.

Prompt Length — The length of the prompt can also impact the model's performance. Experiment with different prompt lengths to find the optimal length for your specific model.

Here are streamlined best practices:

  • Know the Model: Understand the model's strengths and limitations to tailor effective prompts.

  • Be Specific: Clear prompts yield precise results. Ambiguity leads to uncertainty in outputs.

  • Provide Context: Relevant background information can enhance output quality.

  • Iterate: Refine prompts based on model feedback for better results.

  • Neutral Prompts: Keep wording neutral to avoid steering the model toward biased answers.

  • Role-Play: Assign roles to the model to direct responses.

  • Cognitive Checks: Use verification steps to improve output accuracy.

  • Conciseness: Lengthy prompts don't guarantee better responses. Be brief yet informative.

  • Word Choice: The prompt's language sets the tone and style of the response.

  • Experiment: Test various prompts and revise for optimal outcomes.

  • Training Data Insight: Craft prompts that resonate with the model's training.

  • Use Markers: Clearly delineate context and data boundaries.

  • Template Utilization: Adapt proven templates to fit your needs.

  • Stay Informed: Evolve your strategies with the model's advancements.


Separating Instructions

When crafting prompts for LLMs, it's best to separate instructions into distinct sections. This can include a section for the type of system behavior, the specific instruction or task, and an example of the desired output. This structure provides clear guidance to the model and can improve the quality of the generated responses.

Creating Prompt Separation

Prompt Template

======
system:
this is a system direction
======
instruction:
this is what you should do
======
example:
this is how the output should look
======

We find that this format works well even for models that do not support explicit system messages or instructions. We use it for several reasons:

======: Creates strong visual separation, does not naturally occur in most prompts or data extracts, and is only 1 token.

section: Lowercase section headers use fewer tokens than their capitalized or uppercase counterparts.

Note: Some prompts do not need all three sections or even section titles. Experiment with different combinations to find the optimal prompt for your use case and model config.
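The section structure above can be assembled programmatically. The sketch below is a minimal illustration, assuming a hypothetical `build_prompt` helper (not part of any library); the section contents are the placeholder strings from the template.

```python
from typing import Optional

SEPARATOR = "======"  # strong visual separation, uncommon in data, 1 token

def build_prompt(system: str, instruction: str, example: Optional[str] = None) -> str:
    """Join lowercase-titled sections with separator lines."""
    sections = [("system", system), ("instruction", instruction)]
    if example is not None:
        sections.append(("example", example))
    parts = [SEPARATOR]
    for title, body in sections:
        parts.append(f"{title}:\n{body}")
        parts.append(SEPARATOR)
    return "\n".join(parts)

prompt = build_prompt(
    system="this is a system direction",
    instruction="this is what you should do",
    example="this is how the output should look",
)
print(prompt)
```

Because the example section is optional, the same helper supports the lighter two-section variants mentioned in the note above.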

Assistant Messages

OpenAI and Anthropic support an assistant dialogue message structure, making it possible to use different message types for instructions and examples. With OpenAI GPT-3.5 and GPT-4, we recommend including the instructive prompt in the System message and any relevant or dynamic context for the instruction in the User message. Klu automatically appends retrieved Context as an additional User message.

Using the example from above, here is a recommended prompt template structure:

System Message

this is a system direction. this is what you should do based on the context provided.
======
example:
this is how the output should look
======

User Message

scenario:
{{scenario}}
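The System/User split above maps directly onto the chat-completion message format. This is a minimal sketch, assuming a hypothetical `build_messages` helper; the template strings mirror the example above, and `{{scenario}}` is filled in at runtime.

```python
SYSTEM_TEMPLATE = (
    "this is a system direction. this is what you should do based on the context provided.\n"
    "======\n"
    "example:\n"
    "this is how the output should look\n"
    "======"
)

def build_messages(scenario: str) -> list:
    """Instructions go in the System message; dynamic context goes in the User message."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": f"scenario:\n{scenario}"},
    ]

messages = build_messages("a customer asks for a refund after 30 days")
# With the OpenAI Python client, this list would be passed as the
# `messages` argument to client.chat.completions.create(...).
```

Keeping the dynamic context in the User message means the System message stays cacheable and constant across requests.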

Lowering Response Verbosity

LLMs are generalist models, and their default responses bias toward lengthy, complete explanations. While great for personal assistant or education use cases, this is not always ideal. Additionally, you pay for every token the model generates. Try placing the following phrases at the end of your prompt to minimize verbose responses:

  • be concise
  • minimize verbosity
  • output only the answer and nothing else

Concise Prompt Template

====== 
Summarize the article 
Be concise 
======
{{article}}
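Appending a brevity directive can be done mechanically for any base instruction. A minimal sketch, assuming a hypothetical `concise_prompt` helper; the directive wording comes from the list above.

```python
# Directive variants from the list above; any of them can be appended.
CONCISE_SUFFIXES = [
    "be concise",
    "minimize verbosity",
    "output only the answer and nothing else",
]

def concise_prompt(instruction: str, payload: str, suffix: str = CONCISE_SUFFIXES[0]) -> str:
    """Place the brevity directive after the instruction, before the data payload."""
    return f"======\n{instruction}\n{suffix.capitalize()}\n======\n{payload}"

print(concise_prompt("Summarize the article", "{{article}}"))
```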

Improving Reasoning

An emergent capability of LLMs is their apparent ability to reason. Given enough tokens and space to generate, you can improve the reasoning in responses. This is due to the probability of the model generating a response that is consistent with the prompt and its previous outputs. Researchers refer to this as Chain-of-Thought or CoT prompting. Try placing the following phrases at the end of your prompt to improve the reasoning capabilities of the model:

  • think aloud
  • think step by step
  • create a step-by-step plan before you continue

Reasoning Prompt Template

====== 
Write a Python script for the task 
Formulate a step-by-step plan prior to proceeding
Generate a single Markdown codeblock after planning
======
{{task}}

In many cases, you will want to filter out the reasoning steps, as they are likely not necessary for your next steps.
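For the template above, which asks for a single Markdown codeblock after planning, filtering can be as simple as extracting the first fenced block. A minimal sketch, assuming a hypothetical `extract_code` helper; the fence string is built indirectly only to keep this sample readable.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, i.e. three backticks

# Match an optional language tag, then capture everything up to the closing fence.
CODEBLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)

def extract_code(response: str) -> str:
    """Return the body of the first fenced codeblock, or an empty string."""
    match = CODEBLOCK.search(response)
    return match.group(1).strip() if match else ""

response = (
    "Plan: 1) parse the task 2) print the result\n"
    + FENCE + "python\nprint('hello')\n" + FENCE
)
print(extract_code(response))  # the script, with the plan filtered out
```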


Single & Few-shot Prompting

Few-shot prompting is a method that guides LLM responses by offering a small set of examples, usually two to five. This approach is distinct from zero-shot prompting, which doesn't need examples, and single-shot prompting, which uses just one example.

These strategies aim to generate the desired outputs. Our tests indicate that most state-of-the-art models yield the best results with 1-2 examples, while the performance tends to decline with 3-5 examples. In some cases, 5-shot examples, if dynamically added to the prompt, will increase performance. Older LLMs (2022 or prior) or home-grown models often perform best with 3-5 examples until fine-tuned to your use case.

Few-shot Prompt Template

====== 
Detect if {{url}} is a news site 
Return site category 
Return boolean if news 
Return boolean if article 
====== 
examples: 

<example>
url = https://docs.klu.ai/ 
Category: Artificial Intelligence 
News: false 
Article: false 
</example>

<example>
url = https://www.iea.org/topics/russias-war-on-ukraine 
Category: Intergovernmental Energy Organization 
News: false 
Article: true 
</example>

======
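When examples are added dynamically, as mentioned above, the prompt can be assembled from a pool at request time. A minimal sketch, assuming hypothetical `render_example` and `few_shot_prompt` helpers; the example data is taken from the template above, and the default caps shots at the 1-2 examples that state-of-the-art models tend to prefer.

```python
# Example pool drawn from the few-shot template above.
EXAMPLES = [
    {"url": "https://docs.klu.ai/", "category": "Artificial Intelligence",
     "news": False, "article": False},
    {"url": "https://www.iea.org/topics/russias-war-on-ukraine",
     "category": "Intergovernmental Energy Organization",
     "news": False, "article": True},
]

def render_example(ex: dict) -> str:
    """Render one example inside <example> markers."""
    return (
        "<example>\n"
        f"url = {ex['url']}\n"
        f"Category: {ex['category']}\n"
        f"News: {str(ex['news']).lower()}\n"
        f"Article: {str(ex['article']).lower()}\n"
        "</example>"
    )

def few_shot_prompt(url: str, examples: list, max_shots: int = 2) -> str:
    """Assemble the instruction block plus up to max_shots rendered examples."""
    shots = "\n\n".join(render_example(ex) for ex in examples[:max_shots])
    return (
        "======\n"
        f"Detect if {url} is a news site\n"
        "Return site category\n"
        "Return boolean if news\n"
        "Return boolean if article\n"
        "======\n"
        "examples:\n\n" + shots + "\n======"
    )

print(few_shot_prompt("https://example.com", EXAMPLES))
```

Raising or lowering `max_shots` makes it easy to test the 1-2 versus 3-5 example trade-off described above for your own model and use case.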