Optimizing Actions

Large Language Models (LLMs) are incredibly powerful, but benefit greatly from optimization techniques depending on your use case. Klu enables you to easily optimize Actions over time.

This guide provides a high-level overview of the state-of-the-art techniques in LLM optimization, ranging from easy to more complex strategies. Each section contains practical examples and linked resources to provide a real-world perspective on these techniques. Whether you're new to working with LLMs or looking to explore new techniques, this guide offers valuable insights to help you optimize your LLMs effectively.


Prompt Engineering

Prompt engineering is the iterative, experimental process of steering a model toward desired outputs, and it is a crucial, relatively cheap method of optimizing LLMs. A well-crafted prompt can significantly improve the model's performance by clearly stating what the model should generate and how. When you talk with a friend or colleague, rich context is inferred from the setting, previous conversations, or the dynamic between you. LLMs have none of this, so you must provide context and expectation framing explicitly through prompts.

Prompt Engineering Tips

Experiment – It is unlikely that you will achieve desired prompt results on the first attempt. Prepare to iterate and experiment with different prompts to find the optimal prompt for your use case.

Specificity – Make your prompts as specific as possible. The more specific the prompt, the more likely the model is to generate the desired output.

Contextual Information – Providing contextual information in your prompts can help guide the model to generate more accurate and relevant responses. This can include examples of past generations, as you will see in the Single/Few-Shot Learning section.

Prompt Length – The length of the prompt can also impact the model's performance. Experiment with different prompt lengths to find the optimal length for your specific model.
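
To put the Experiment tip into practice, it helps to script the iteration loop. Below is a minimal Python sketch for comparing prompt variants side by side; call_model is a hypothetical placeholder for whichever client you use (the Klu SDK, OpenAI, etc.).

Prompt Iteration Sketch

def call_model(prompt: str) -> str:
    """Hypothetical placeholder: swap in your actual LLM client call."""
    return f"[model output for prompt: {prompt.splitlines()[0]}]"

variants = [
    "Summarize the article.",
    "Summarize the article in 3 bullet points.",
    "Summarize the article in 3 bullet points for a technical audience.",
]

article = "..."  # your input text

for prompt in variants:
    output = call_model(f"{prompt}\n\n{article}")
    print(f"--- {prompt}\n{output}\n")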

Separating Instructions and Examples

LLMs perform better when the distinct parts of a prompt are clearly delineated. When system direction, task instructions, and examples run together in one block of text, the model can confuse what is instruction and what is data. A consistent separator marks where each section begins and ends.

Creating Prompt Separation

Prompt Template

======
system:
this is a system direction
======
instruction:
this is what you should do
======
example:
this is how the output should look
======

We find that this format works well even for models that do not support explicit system messages or instructions. We use it for multiple reasons, including:

====== – Creates strong visual separation, does not naturally occur in most prompts or data extracts, and is only 1 token.

section – Lowercase section headers use fewer tokens than their capitalized or uppercase counterparts.

Note: Some prompts do not need all three sections or even section titles. Experiment with different combinations to find the optimal prompt for your use case and model config.
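
If you assemble prompts in code, a small helper keeps the separators consistent and makes it easy to drop sections you do not need. A minimal Python sketch (the function and section names are illustrative):

Prompt Assembly Sketch

from typing import Optional

SEPARATOR = "======"

def build_prompt(system: Optional[str] = None,
                 instruction: Optional[str] = None,
                 example: Optional[str] = None) -> str:
    """Assemble a separated prompt, omitting any empty section."""
    parts = []
    for name, body in (("system", system),
                       ("instruction", instruction),
                       ("example", example)):
        if body:
            parts.append(f"{SEPARATOR}\n{name}:\n{body}")
    parts.append(SEPARATOR)
    return "\n".join(parts)

print(build_prompt(system="this is a system direction",
                   instruction="this is what you should do"))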

Lowering Response Verbosity

LLMs are generalist models, and their default responses are biased toward lengthy, complete explanations. While great for personal assistant or education use cases, this is not always ideal, and you are paying for every output token. Try placing the following phrases at the end of your prompt to minimize verbose responses:

  • be concise
  • minimize verbosity
  • output only the answer and nothing else

Concise Prompt Template

======
Summarize the article
Be concise
======
{{article}}
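
To quantify what a verbosity instruction saves you, count the output tokens. The sketch below assumes the tiktoken library and uses its cl100k_base encoding as an approximation of your model's tokenizer:

Token Count Comparison

import tiktoken

# cl100k_base is an approximation; use your model's actual encoding if known.
enc = tiktoken.get_encoding("cl100k_base")

verbose = ("Certainly! Here is a detailed summary of the article, "
           "covering each of its main points in turn...")
concise = "The article covers X, Y, and Z."

for label, text in (("verbose", verbose), ("concise", concise)):
    print(f"{label}: {len(enc.encode(text))} tokens")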

Improving Reasoning

An emergent capability of LLMs is their ability to reason step by step: because each token is conditioned on the prompt and everything generated so far, asking the model to write out intermediate steps makes the final answer more likely to be consistent with them. Researchers refer to this as Chain-of-Thought (CoT) prompting. Try placing the following phrases at the end of your prompt to improve the reasoning capabilities of the model:

  • think aloud
  • think step by step
  • create a step-by-step plan before you continue

Reasoning Prompt Template

======
Create a python script for the following task
Create a step-by-step plan before you continue
Output a single MD code block after the plan
======
{{task}}

In many cases, you will want to filter out the reasoning steps, as they are likely not needed for your next steps.
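
Since the template above asks for a single Markdown code block after the plan, a small parser can discard the reasoning and keep only the code. A minimal Python sketch:

Reasoning Filter Sketch

import re

def extract_code_block(response: str) -> str:
    """Return the first fenced Markdown code block, or the full response if none is found."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()

response = "Plan:\n1. Print a greeting\n\n```python\nprint('hello')\n```"
print(extract_code_block(response))  # -> print('hello')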


Single & Few-shot Prompting

Few-shot prompting is a technique used to guide LLM responses by providing a limited number of examples, typically between two and five. It stands apart from zero-shot prompting, which provides no examples, and single-shot prompting, which provides only one. In our testing, most models perform best with 1-2 examples, and many prompts perform worse with 3-5 examples.

Few-shot Prompt Template

======
Detect if {{url}} is news site
Return site category
Return boolean if news
======
example:
url = https://docs.klu.ai/
Category: Artificial Intelligence
News: false
Article: false

url = https://www.iea.org/topics/russias-war-on-ukraine
Category: Intergovernmental Energy Organization
News: false
Article: true
======
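
Hand-maintaining example blocks gets unwieldy as they grow. You can render them from structured data instead; in this Python sketch the field names are illustrative:

Few-shot Assembly Sketch

SEPARATOR = "======"

EXAMPLES = [
    {"url": "https://docs.klu.ai/",
     "category": "Artificial Intelligence", "news": "false", "article": "false"},
    {"url": "https://www.iea.org/topics/russias-war-on-ukraine",
     "category": "Intergovernmental Energy Organization", "news": "false", "article": "true"},
]

def build_few_shot_prompt(url: str) -> str:
    """Render the instruction block plus each example in a consistent format."""
    shots = "\n\n".join(
        f"url = {e['url']}\nCategory: {e['category']}\n"
        f"News: {e['news']}\nArticle: {e['article']}"
        for e in EXAMPLES
    )
    return (f"{SEPARATOR}\nDetect if {url} is news site\n"
            f"Return site category\nReturn boolean if news\n"
            f"{SEPARATOR}\nexample:\n{shots}\n{SEPARATOR}")

print(build_few_shot_prompt("https://example.com/"))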

A/B Experiments

With Klu you can run two Actions side by side and compare the results. This is a great way to test out different prompts or different models.

Creating A/B Experiments in Klu is easy. Go to the Optimization section in the navigation, then click the "A/B Testing" tab and the Add Experiment button.

Select the Actions you want to compare in the Experiment and you're ready. Once data starts coming in, you can compare both user feedback and system data (tokens, response time, etc.).


Feedback

Gathering user Feedback is key to optimizing your Actions over time. We recommend gathering Feedback from users via your app using the Klu API or SDKs. You can also review and provide Feedback directly in Klu Studio.
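
As an illustration, a feedback call from your application might look like the sketch below. The package, client, and method names here are assumptions, not the confirmed Klu SDK interface; check the SDK docs for the exact calls.

Feedback Sketch

# Hypothetical sketch: package, client, and field names are assumptions,
# not the confirmed Klu SDK interface.
from klu import Klu  # assumed import

client = Klu(api_key="YOUR_API_KEY")

# After showing a generation to the user, record their reaction
# against the generation's data point.
client.feedback.create(
    data_guid="GENERATION_GUID",  # assumed identifier for the generation
    type="rating",                # assumed feedback type
    value="positive",             # e.g. a thumbs-up from your UI
)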

Within the Klu Studio Data section you will see rating icons for each data point. Klu uses Human Feedback to automatically find other good or bad generations, as well as for downstream model training or fine-tuning to generate better outputs.

Click the meatball menu next to any data point to access additional Feedback options, including setting user behavior, flagging generation issues, or setting a correction for a data point.

Within that modal you have various feedback options:

  • User Action - set a user behavior, including Saved, Copied, Shared, or Deleted.
  • Generation Issue - set a generation issue, including Hallucination, Inappropriate, or Repetition.
  • Response Correction - set a Response Correction based on the completion output for a data point.

Fine-tuning

Fine-tuning is one of the most powerful features of Klu Studio. It allows you to take an OpenAI base model and fine-tune it on your own data, which can significantly improve output quality for tasks unique to your business. A fine-tuned model can also match the style or tone of voice of your brand's writing. We recommend using this feature only after you have accumulated a good amount of data and provided feedback on the outputs. We recommend the following minimums:

  • Style Transfer – 20-30 examples for transferring brand voice and writing style
  • Familiar Task – 30-100 examples for a familiar task like generating JSON objects
  • New Task – 1000-2000 examples for a new task like executing a trust and safety policy

Fine-tuning an LLM is like refining a computer program so it understands your instructions and produces high-quality outputs more consistently, whether that means writing emails in your brand voice, generating JSON for your application, or performing an entirely new task like executing a trust and safety policy.
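
Klu assembles the training data from your Data Frame for you, but for reference, OpenAI-style fine-tuning data is typically JSONL with one chat-format example per line. A minimal Python sketch of producing such a file (the brand-voice content is illustrative):

Training Data Sketch

import json

# Assumed: the OpenAI chat fine-tuning format, one JSON object per line.
examples = [
    {"messages": [
        {"role": "system", "content": "Write in the Acme brand voice."},
        {"role": "user", "content": "Draft a welcome email."},
        {"role": "assistant", "content": "Hi there, welcome to Acme! ..."},
    ]},
]

with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")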

We recommend running a few experiments to find the optimal number of examples for your specific use case. Run an Action in production and gather feedback on the outputs; this serves as the training data for the fine-tuning process. Visit the Data section and filter to the datapoints you want to use for fine-tuning. You can save this filter as a Data Frame at any time.

Filter to your preferred data points, save to a Data Frame, and click on Optimize. Give the fine-tuned model a name, select your base model, and click "Optimize Model."

The job takes a few minutes to complete. Once it is done, your model will appear in the dropdown list of models, with a name that looks like this:

{base_model}:ft:{the name you gave it in klu}-datetime

You can use this model anywhere in Klu - either in the Playground or in Actions.