Managing Data in Klu
A data point in Klu contains a full input and output pair, the model used to generate it, the number of input and output tokens, and the latency.
Once you start using Large Language Models in production, you will quickly realize that you need a place to manage all of that data and put it to use.
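As a rough illustration of the fields listed above, a data point could be modeled like this. The class name, field names, and the millisecond unit are assumptions made for this sketch, not Klu's actual schema.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    """Illustrative shape only; field names are hypothetical, not Klu's schema."""

    input: str            # full input sent to the model
    output: str           # full output returned by the model
    model: str            # model used to generate the output
    input_tokens: int     # number of input tokens
    output_tokens: int    # number of output tokens
    latency_ms: float     # latency; milliseconds is an assumed unit

    @property
    def total_tokens(self) -> int:
        """Input and output tokens combined."""
        return self.input_tokens + self.output_tokens


example = DataPoint(
    input="Summarize this ticket.",
    output="The customer reports a login failure.",
    model="example-model",
    input_tokens=12,
    output_tokens=9,
    latency_ms=840.0,
)
```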
If you have existing prompt completion data, importing all of it into Klu is easy.
Importing via the SDK
With the Klu SDK, bringing in all of your data is a piece of cake. The workflow is as follows:
- Create a Klu account
- Create a new app
- Create an action for each prompt you are running
```python
from klu import Klu

klu = Klu("YOUR_API_KEY")

input = "What you sent to the LLM"
output = "What the LLM responded with"
action = "action_guid"
feedback = True  # or False if there is no feedback to attach

# Create the data point for this input/output pair
data = klu.data.create(input=input, output=output, action=action)

if feedback:
    feedback = klu.feedback.create(
        data_guid=data.guid,
        type="rating",  # or "action", "issue", or "correction"
        value="1",  # "1" for negative feedback, "2" for positive; any value for issue, action, or correction
        created_by="user_id",  # could be a Klu user or an external user
        source="backfill",  # or a specific source; "api" by default
    )

print(data)
```
With this, you can now import all of your data into Klu.
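To backfill many records at once, the same two calls can be wrapped in a loop. The following is a minimal sketch, assuming the client exposes `data.create` and `feedback.create` exactly as in the snippet above; the `backfill` helper and the record keys (`input`, `output`, `action`, `positive`) are hypothetical names chosen for illustration.

```python
from typing import Any, Iterable


def backfill(klu: Any, records: Iterable[dict], user_id: str = "user_id") -> list:
    """Create one data point per record and attach a rating when one is present.

    `klu` is a client object like the one created above. The record keys
    used here are hypothetical and should be adapted to your own data.
    """
    created = []
    for record in records:
        data = klu.data.create(
            input=record["input"],
            output=record["output"],
            action=record["action"],
        )
        positive = record.get("positive")  # True, False, or None (no feedback)
        if positive is not None:
            klu.feedback.create(
                data_guid=data.guid,
                type="rating",
                value="2" if positive else "1",  # "2" positive, "1" negative
                created_by=user_id,
                source="backfill",
            )
        created.append(data)
    return created
```

Passing the client in as an argument keeps the helper independent of how the API key is loaded and makes it easy to dry-run against a stand-in object before touching real data.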
When you are in an app, you can find the "Data" screen in the main menu on the left-hand side.
Once you click on it, you will see all the data points generated within that application. Filtering these down to a specific data set is easy. Klu currently supports the following filters:
- User - which user generated the output. For outputs generated via the API, we treat any output generated with a user's API key as that user's.
- Action - which action was used to generate the output.
- Source - where the output was generated. The Klu application tags everything as Klu by default. You can also modify the source via the API.
- Feedback - whether or not the output was marked as good or bad.
- Issue - if any issues are present (hallucinations, toxicity, etc.), you can filter by them here.
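To make the effect of combining these filters concrete, here is a small local sketch over plain dictionaries. It is not Klu's filter API: the `filter_data` helper, the record keys, and the assumption that filters combine with AND are all illustrative.

```python
def filter_data(points, user=None, action=None, source=None, feedback=None, issue=None):
    """Keep the points that match every filter that is set.

    A filter left as None is ignored. Record keys are hypothetical.
    """
    def matches(point):
        return (
            (user is None or point.get("user") == user)
            and (action is None or point.get("action") == action)
            and (source is None or point.get("source") == source)
            and (feedback is None or point.get("feedback") == feedback)
            and (issue is None or issue in point.get("issues", []))
        )

    return [point for point in points if matches(point)]


points = [
    {"user": "ana", "action": "summarize", "source": "Klu", "feedback": "good", "issues": []},
    {"user": "ana", "action": "summarize", "source": "api", "feedback": "bad", "issues": ["hallucination"]},
    {"user": "ben", "action": "translate", "source": "api", "feedback": None, "issues": []},
]

# Outputs generated by "ana" that were flagged as hallucinations
flagged = filter_data(points, user="ana", issue="hallucination")
```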
If you want to keep the filtered results for later use, you can save them as a data set. This will allow you to use them for training or evaluation.
With Klu, you can provide feedback on any output. This feedback is then used to train the model to generate better outputs. From the app, simply press the thumbs up or thumbs down to flag an output as good or bad.
Within the data section, you will see the thumbs up and thumbs down icons. That is the easiest way to mark an output as good or bad.
If you would like to edit the outputs, flag them as inappropriate, or add additional context, the easiest way is to click on the icon on the right-hand side of the output.
Within that modal, you have various options for adding context:
- User Action - set a user behavior, including Saved, Copied, Shared, or Deleted.
- Generation Issue - set a generation issue, including Hallucination, Inappropriate, or Repetition.
- Response Correction - set a correction based on the completion output for a data point.
All of these will help you later when it comes to fine-tuning and optimization.
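The same kinds of feedback can be attached programmatically through `klu.feedback.create`, whose `type` argument accepts `rating`, `action`, `issue`, or `correction` in the SDK example earlier on this page. The helper below is a hedged sketch that assembles and checks those arguments before sending them. `build_feedback` is a hypothetical name, and the idea that the modal's options map one-to-one onto these type strings is an assumption.

```python
FEEDBACK_TYPES = {"rating", "action", "issue", "correction"}


def build_feedback(data_guid, kind, value, created_by, source="api"):
    """Return keyword arguments for klu.feedback.create after basic checks.

    Ratings use "1" (negative) or "2" (positive); other types accept any value.
    """
    if kind not in FEEDBACK_TYPES:
        raise ValueError(f"unknown feedback type: {kind!r}")
    if kind == "rating" and value not in {"1", "2"}:
        raise ValueError('rating value must be "1" (negative) or "2" (positive)')
    return {
        "data_guid": data_guid,
        "type": kind,
        "value": value,
        "created_by": created_by,
        "source": source,
    }


# Example: record that a user flagged an output as a hallucination.
# Whether the API expects this exact value string is an assumption.
issue_feedback = build_feedback("data_guid", "issue", "Hallucination", "user_id")
```

The resulting dictionary can then be passed along as `klu.feedback.create(**issue_feedback)`.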
With Klu you are not locked in. All of your data is yours and you can export it at any time.
We currently provide two data export formats:
- CSV - easily import into a spreadsheet for further analysis
- JSONL - your data is ready to be used in a different system that accepts JSONL files (e.g. fine-tuning or evals).
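The two formats carry the same records in different layouts: JSONL stores one JSON object per line, while CSV stores one row per record under a shared header. As a sketch of how they relate, the following converts JSONL text to CSV using only the Python standard library. The field names in the sample are assumptions and may not match the columns in an actual Klu export.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert JSONL text (one JSON object per line) into CSV text."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

    # Collect column names in first-seen order across all rows
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


sample = (
    '{"input": "hi", "output": "hello"}\n'
    '{"input": "bye", "output": "goodbye"}\n'
)
csv_text = jsonl_to_csv(sample)
```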
You can also export data via the API or SDK.