Getting started with the API and SDKs

Three core concepts are key to working with the Klu API or SDKs: Actions, which are Klu's foundational primitive; Context, in which Klu grounds generations using Retrieval Augmented Generation (RAG); and authentication, since every interaction with Klu is authenticated with your Klu API Key.

Installing the Klu SDK

The recommended way to interact with the Klu API is by using one of our official SDKs. Klu offers Python and TypeScript libraries to make development easier and faster.

pip install klu

Installing the Klu CLI

The Klu CLI lets you configure your applications and actions with declarative YAML files, and interact with your Workspace through interactive and non-interactive commands.

Before installing the CLI, make sure you're running Node.js v20.9.0 or later. We recommend using Volta to manage your Node.js versions, and using volta install to install any global tool binaries.

npm install -g @kluai/cli

Getting started

Requirement: you'll need to sign up for a Klu account and create a workspace if you haven't already done so.

Once you've created a workspace and installed the CLI, you'll need to authenticate the CLI to that workspace using a Klu API key. You can run klu to do this. Once that's done, you'll be logged into the workspace and able to run commands against it.

If you're using this workspace for the first time, the klu init command can set up your workspace for you. Running it creates klu-app-<app-name>.yaml files in the current working directory of your local machine, describing the desired state of your remote Klu workspace, so change to a suitable directory before running it.

If you already have prompts defined within a TypeScript app, klu init can also help you migrate them into this local workspace. Run klu init --with-migrate <project-dir> and follow the instructions (or run klu migrate --include <dir|glob> if you only want to migrate prompts).

# Authenticate to a workspace

# Initialize the workspace locally, migrating any prompts found within the project
klu init --with-migrate <project-dir>

Working with a local workspace

You can run commands that directly alter your remote workspace; run klu help for a full list. However, the easiest way to manage your Klu workspace is locally, through declarative app files that you "push" to and "pull" from it.

Assuming you've already initialized your local workspace, below is a cheatsheet of how to add new apps and actions to it and how to upload your local changes to the remote workspace once you're happy with them.

# Pull the latest changes from the remote workspace
klu pull --yes

# Scaffold a new app
klu scaffold app --name <app-slug> --yes

# Scaffold a new action within the app you created
klu scaffold action --app <app-slug> --action-type chat --name <action-slug> --model gpt-4 --yes

# Edit a local copy of an app using your default editor
# Please ensure that your $VISUAL or $EDITOR env vars are set
klu app edit <app-slug> --no-push

# Check your local changes against the remote workspace
klu push --dry-run

# Push your local changes to the remote workspace
klu push --yes

Deploying and prompting actions

Once you've created your first action, you'll want to test that it works. First deploy the action to an environment; then you can prompt it directly from the command line to test it by hand.


# Deploy an action first before you use it
klu action deploy <action-slug> --environment staging

# Start an interactive session with an action
klu action prompt <action-slug> --interactive

# Prompt an action and stream its output
klu action prompt <action-slug> --input "Write a short simplistic lullaby for a little program that does their best."

# Prompt an action asynchronously in the background
klu action prompt <action-slug> --input "Write a haiku." --json --async


Authentication

Find your API Key in your Klu workspace: visit Settings and select the API Keys tab. Every API Key is linked to a specific User and Workspace. Here is a sample API Key and how it is used in practice.


LLM Engine

The LLM API Engine is the core of all generative functionality in Klu. It can be accessed directly via the API or through the SDK, and it powers the creation of Actions, which include a Prompt Template, Model Config, and Klu-specific configurations for Context, Skills, or Output Formatting. The LLM API Engine also supports advanced features such as caching and sessions.


Retries

Klu automatically retries LLM requests for you, so temporary issues don't disrupt your operations. Retries are enabled by default, and you can customize the retry behavior, including the number of retries and the delay between attempts.
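Klu performs these retries on its side, so you don't need to write them yourself. For intuition, or for wrapping your own network calls, the sketch below shows the general retry-with-exponential-backoff pattern. The helper name call_with_retries and its max_retries and base_delay parameters are hypothetical illustrations, not Klu SDK options.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying transient failures with exponential backoff.

    Hypothetical helper for illustration; not part of the Klu SDK.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles on each attempt (0.5s, 1s, 2s, ...) plus a little jitter
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

The jitter keeps many clients that failed at the same moment from retrying in lockstep.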


Timeouts

Klu automatically manages request timeouts for you. Timeouts are enabled by default so that long-running requests don't disrupt your operations, and you can customize the timeout duration to suit your needs.
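As with retries, Klu enforces timeouts for you. The sketch below shows a client-side analog using Python's standard concurrent.futures module, which stops waiting for a call after a fixed duration. The helper name call_with_timeout and its timeout_s parameter are hypothetical, not part of the Klu SDK.

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool; a timed-out call keeps running in its worker thread
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn: Callable[[], T], timeout_s: float) -> T:
    """Run fn, raising TimeoutError if it takes longer than timeout_s seconds.

    Hypothetical helper for illustration; not part of the Klu SDK.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # We stop waiting, but Python threads cannot be forcibly killed
        raise TimeoutError(f"call exceeded {timeout_s}s") from None
```

Note that this only abandons the wait; the underlying work still runs to completion in the background.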


Environments

Klu comes with three pre-configured environments to help you manage your API interactions: preview, staging, and production.


Deployments

A deployment in Klu moves an Action version into a specific environment. Depending on your needs, you can deploy Action versions to the preview, staging, or production environment.
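One way to picture environments and deployments is as a table mapping each environment to the Action version currently live in it; deploying replaces that entry, much like klu action deploy <action-slug> --environment staging does from the CLI. The sketch below is a toy in-memory model under that assumption; the DeploymentTable class and its methods are hypothetical and do not call Klu.

```python
from dataclasses import dataclass, field
from typing import Optional

# Klu's three pre-configured environments
ENVIRONMENTS = ("preview", "staging", "production")


@dataclass
class DeploymentTable:
    """Toy model: which Action version is live in each environment.

    Hypothetical illustration only; it does not call Klu.
    """

    live: dict[str, int] = field(default_factory=dict)

    def deploy(self, version: int, environment: str) -> None:
        if environment not in ENVIRONMENTS:
            raise ValueError(f"unknown environment: {environment!r}")
        self.live[environment] = version  # replaces whatever was live there

    def version_in(self, environment: str) -> Optional[int]:
        # None means nothing has been deployed to this environment yet
        return self.live.get(environment)
```

Keeping a separate entry per environment is what lets you test a new version in staging while production keeps serving the previous one.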


Actions

Actions are the foundation for all generative functionality in Klu. Whether created in the UI or via the API/SDK, Actions contain a Prompt Template, Model Config, and Klu-specific configurations for Context, Skills, or Output Formatting. The Actions API enables two additional powerful features: caching and sessions.
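To make the Prompt Template and Model Config pairing concrete, here is a toy stand-in for an Action in plain Python. The field names are hypothetical rather than the Klu SDK's schema, and the $variable placeholder syntax comes from Python's string.Template; Klu's own template syntax may differ.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Action:
    """Toy stand-in for a Klu Action: a prompt template plus model config.

    Field names are hypothetical; they are not the Klu SDK's schema.
    """

    name: str
    prompt_template: str  # $variable placeholders (string.Template syntax)
    model_config: dict = field(default_factory=dict)

    def render(self, **variables: str) -> str:
        # substitute() raises KeyError if a placeholder has no value
        return Template(self.prompt_template).substitute(**variables)
```

For example, Action(name="haiku", prompt_template="Write a haiku about $topic.").render(topic="autumn") fills in the template before it would be sent to the model named in model_config.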


Versions

Klu provides automatic version tracking for changes to your Actions' Prompt or Model Config. This feature is enabled by default, ensuring that all modifications are tracked and can be reverted if necessary. You can deploy previous versions to preview environments.
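A common way to make every change revertible is an append-only history: each save adds a new version, and reverting re-saves an old version as the newest one rather than deleting anything. The sketch below assumes that design for illustration; the VersionedConfig class is hypothetical and not Klu's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedConfig:
    """Toy append-only version history for an Action's prompt and model.

    Hypothetical illustration only; not Klu's implementation.
    """

    history: list[dict] = field(default_factory=list)

    def save(self, prompt: str, model: str) -> int:
        # Every change appends a new version instead of overwriting the old one
        self.history.append({"prompt": prompt, "model": model})
        return len(self.history)  # 1-based version number

    def get(self, version: int) -> dict:
        if not 1 <= version <= len(self.history):
            raise IndexError(f"no version {version}")
        return self.history[version - 1]

    def revert(self, version: int) -> int:
        # Reverting re-saves an old version as the newest, keeping history intact
        old = self.get(version)
        return self.save(old["prompt"], old["model"])
```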


Caching

Klu automatically caches Action generations for you; however, returning cached responses is disabled by default. When enabled, serving cached responses saves money and time. You can also manually clear the cache for a specific Action or turn off caching.
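The behavior described above, where generations are always stored but only served from the cache when you opt in, can be sketched as a small in-memory cache keyed by Action name and a hash of the inputs. Everything here (the GenerationCache class, serve_cached, and the model_call callback) is hypothetical and stands in for whatever the Klu SDK actually exposes.

```python
import hashlib
import json
from typing import Callable


class GenerationCache:
    """Toy cache: generations are always stored, but cached responses are
    only returned when serve_cached=True (off by default).

    Hypothetical illustration only; not the Klu SDK.
    """

    def __init__(self, serve_cached: bool = False) -> None:
        self.serve_cached = serve_cached
        # Keyed by (action name, hash of inputs) so one Action can be cleared alone
        self._store: dict[tuple[str, str], str] = {}

    @staticmethod
    def _inputs_hash(inputs: dict) -> str:
        # sort_keys makes the hash independent of dict key order
        payload = json.dumps(inputs, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def generate(self, action: str, inputs: dict, model_call: Callable[[dict], str]) -> str:
        key = (action, self._inputs_hash(inputs))
        if self.serve_cached and key in self._store:
            return self._store[key]  # cache hit: skip the model call
        output = model_call(inputs)
        self._store[key] = output  # every generation is cached
        return output

    def clear(self, action: str) -> None:
        # Remove cached generations for one Action only
        self._store = {k: v for k, v in self._store.items() if k[0] != action}
```

Only identical inputs produce a hit, which is why caching pays off most for repeated, deterministic requests.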


Sessions

Klu automatically saves conversation memory via Sessions. Sessions are great for multi-turn conversations and for saving state between requests.
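Conceptually, a session is an ordered list of conversation turns that is sent along with each new request, so the model can refer back to earlier turns. The toy class below illustrates that idea; the Session name, the role/content message shape, and the max_messages cap are hypothetical choices, since Klu manages Sessions server-side.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy conversation memory: later requests see earlier turns.

    Hypothetical illustration only; Klu manages Sessions server-side.
    """

    messages: list[dict[str, str]] = field(default_factory=list)
    max_messages: int = 20  # hypothetical cap to keep prompts small

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest turns once the cap is exceeded
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def context_for_next_request(self) -> list[dict[str, str]]:
        # Return a copy so callers can't mutate the stored history
        return list(self.messages)
```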


LLM Evals

LLM Evals uses GPT-4 to compare and evaluate new versions of an Action against previous ones before you deploy them to production. This helps you maintain the performance and accuracy of your Actions.
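The core of a pairwise eval is simple: run the old and new versions on the same inputs, ask a judge which output is better, and tally the verdicts. In Klu the judge is GPT-4; in the sketch below the judge is any function you pass in, and the compare_versions helper and its "old"/"new"/"tie" verdict format are hypothetical.

```python
from typing import Callable

# A judge takes (input, old_output, new_output) and returns "old", "new", or "tie".
# In Klu the judge is GPT-4; here it is any function you supply (hypothetical).
Judge = Callable[[str, str, str], str]


def compare_versions(
    inputs: list[str],
    old_version: Callable[[str], str],
    new_version: Callable[[str], str],
    judge: Judge,
) -> dict[str, int]:
    """Run both versions on the same inputs and tally the judge's verdicts."""
    tally = {"old": 0, "new": 0, "tie": 0}
    for text in inputs:
        verdict = judge(text, old_version(text), new_version(text))
        if verdict not in tally:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        tally[verdict] += 1
    return tally
```

Using the same inputs for both versions keeps the comparison fair; only the Action version differs between the two outputs the judge sees.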


Automatic labeling

Klu automatically labels generations for topic, sentiment, and helpfulness. These labels give you insight into the performance and effectiveness of your Actions.

A/B Experiments

Klu enables you to create A/B Experiments for two Actions. This is great for testing different models, Prompt Templates, or other configuration changes.
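A/B experiments usually need each user to see the same variant on every request. A standard way to do that is to hash a stable user id together with the experiment name and map the hash to a bucket. The sketch below shows this technique; assign_variant and its split parameter are hypothetical, and Klu may assign traffic differently.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically assign a user to variant "A" or "B".

    Hashing experiment + user id means the same user always lands in the
    same variant for a given experiment. Hypothetical sketch; Klu may differ.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # First 6 bytes -> float in [0, 1); 48 bits fit exactly in a float
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "A" if bucket < split else "B"
```

Each variant would then route the request to one of the two Actions in the experiment; including the experiment name in the hash keeps assignments independent across experiments.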


Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) combines the ability to generate text with the ability to ground that generation in relevant information retrieved from documents or a database. Klu handles this automatically when Context is connected to an Action.
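Klu does this for you once Context is connected, but the retrieve-then-generate idea fits in a few lines: score documents against the query, keep the best matches, and place them in the prompt. In the sketch below, simple word overlap stands in for the embedding similarity a production retriever would typically use, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, documents: list[str], k: int = 2) -> str:
    # Retrieved passages go into the prompt so the model answers from them
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

The resulting prompt is what would be sent to the model, so its answer is grounded in the retrieved text rather than only in what the model learned during training.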


Context

Context is the key to RAG in Klu. Actions link to a Context library: a collection of documents originating from files, integrations, or databases. Context libraries are automatically indexed and optimized for retrieval, and you can add or remove documents at any time.

Retrieval and Chunking

Retrieval and chunking are key aspects of RAG in Klu. These settings have a large effect on retrieval behavior and performance.

Retrieval settings:

  • Response mode: This can be set to 'search', 'refine', or 'tree summarize' depending on the specific requirements of your application.
  • Max response length: Maximum length of the response that the retrieval process can generate.
  • Similarity top k: The number of most similar documents to consider during retrieval.
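The "similarity top k" setting can be illustrated with cosine similarity: score every document vector against the query vector and keep the k highest. The toy 2-D vectors below stand in for real embeddings, and the cosine and top_k helpers are hypothetical; the details of Klu's retriever are not documented here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]
```

A larger k gives the model more context to draw on but also more chances to include irrelevant text and a longer prompt.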

Chunking settings:

  • Doc size (tokens): The size, in tokens, of the chunks that documents are divided into.
  • Overlap (tokens): This setting determines the number of tokens that can overlap between two consecutive chunks.
  • Text splitter: This can be set to 'tokens', 'sentence', 'character', or 'code' depending on how you want the text to be split into chunks.
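Doc size and overlap interact as a sliding window: each chunk holds doc_size tokens, and each new chunk starts doc_size minus overlap tokens after the previous one, so neighboring chunks share overlap tokens. The sketch below shows this with whitespace-split words standing in for tokens; a real tokenizer, or Klu's sentence, character, or code splitters, would split differently, and chunk_tokens is a hypothetical helper.

```python
def chunk_tokens(tokens: list[str], doc_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into chunks of doc_size, sharing overlap tokens between neighbors."""
    if doc_size <= 0 or not 0 <= overlap < doc_size:
        raise ValueError("need doc_size > 0 and 0 <= overlap < doc_size")
    step = doc_size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + doc_size])
        if start + doc_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks
```

For example, ten words with doc_size=4 and overlap=1 yield three chunks, each starting with the last word of the previous one. Overlap helps keep a sentence that straddles a chunk boundary retrievable from at least one chunk.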

Retrieval Versions

Klu provides automatic version tracking for changes to your Context libraries, including document chunking and retrieval settings. This feature is enabled by default, ensuring that all modifications are tracked and can be reverted if necessary.

Filter Context with Metadata

Klu enables you to add metadata to your Context documents. Metadata enables powerful filtering of Context before performing RAG. This is great for multi-tenant data scenarios or for filtering out data that is not relevant to your generation. Filtering also enables Q&A on a specific document contained within a Context library.
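Klu's metadata filter syntax isn't shown here, so the sketch below illustrates the idea in plain Python: keep only documents whose metadata matches every required key and value, then run retrieval over what remains. The metadata dict layout and the tenant and doc keys are hypothetical.

```python
from typing import Any


def filter_by_metadata(documents: list[dict[str, Any]], **required: Any) -> list[dict[str, Any]]:
    """Keep only documents whose metadata matches every required key/value.

    Running this before retrieval limits RAG to, e.g., one tenant's documents.
    Hypothetical sketch; not Klu's filter syntax.
    """
    return [
        doc
        for doc in documents
        if all(doc.get("metadata", {}).get(key) == value for key, value in required.items())
    ]
```

Filtering by a tenant key covers the multi-tenant case, and filtering by a document key narrows Q&A to a single document in the library.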

SDK Exports