Getting started with the API and SDKs
Three core concepts underpin the Klu API and SDKs. The Action is the foundational primitive of Klu. Klu performs Retrieval Augmented Generation (RAG) to ground generations in Context. All interactions with Klu are authenticated with your Klu API Key.
Installing the Klu SDK
The recommended way to interact with the Klu API is by using one of our official SDKs. Klu offers Python and TypeScript libraries to make development easier and faster.
pip install klu
Authentication
Find your API Key in your Klu workspace: visit Settings and select the API Keys tab. Every API Key is linked to a specific User and Workspace. Here is a sample API Key and how it is used in practice.
sh9FuZoy8LpE2y6s3t9fv7mhlD3Dr0+1dZtHG1njsThOrc=
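A common way to use a key like this in practice is to keep it out of source code and read it from the environment. The sketch below is illustrative only: the `KLU_API_KEY` variable name, the helper functions, and the Bearer-style header are assumptions, not confirmed details of the Klu API or SDK.

```python
import os


def load_api_key(env_var="KLU_API_KEY"):
    """Read the API key from an environment variable (name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Klu API Key")
    return key


def auth_headers(api_key):
    """Build request headers for an authenticated call.

    Assumption: a Bearer-token Authorization header. Check the API
    reference for the scheme Klu actually expects.
    """
    return {"Authorization": f"Bearer {api_key}"}
```

Because each key is tied to a User and Workspace, treat it like a password: do not commit it to version control or embed it in client-side code.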
LLM Engine
The LLM API Engine is the core of all generative functionality in Klu. It can be accessed directly via the API or through the SDK, and it powers the creation of Actions, which include a Prompt Template, Model Config, and Klu-specific configurations for Context, Skills, or Output Formatting. The LLM API Engine also supports advanced features such as caching and sessions.
Retries
Klu automatically retries LLM requests for you, ensuring that temporary issues do not disrupt your operations. This feature is enabled by default, providing robustness and reliability to your API interactions. You can customize the retry behavior according to your needs, including the number of retries and the delay between attempts.
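To make the two knobs mentioned above concrete (number of retries and delay between attempts), here is a minimal, generic sketch of retry with exponential backoff. It is not Klu's implementation; `call_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


def call_with_retries(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff and jitter."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            # The delay doubles on each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
            attempt += 1
```

With `max_retries=3`, the function makes at most four calls in total: the initial attempt plus three retries.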
Timeout
Klu automatically manages request timeouts for you. This feature is enabled by default to ensure that your operations are not disrupted by long-running requests. You can customize the timeout duration according to your needs, providing flexibility and control over your API interactions.
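As a rough illustration of what a request timeout does, the sketch below bounds how long a caller waits for a function to return. This is a generic standard-library pattern, not Klu's mechanism; `call_with_timeout` and `slow_call` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_seconds):
    """Run fn() in a worker thread and stop waiting after timeout_seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        # Raises FutureTimeout if fn() has not finished in time.
        return future.result(timeout=timeout_seconds)
    finally:
        # Do not block on the worker; a timed-out call keeps running in the background.
        pool.shutdown(wait=False)


def slow_call():
    """Stand-in for a long-running request."""
    time.sleep(0.3)
    return "done"
```

Calling `call_with_timeout(slow_call, 0.05)` raises `FutureTimeout`, while a generous limit such as `1.0` returns `"done"`. Note that this pattern only stops the caller from waiting; it does not cancel the underlying work.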
Environments
Klu comes with pre-configured environments - preview, staging, and production - to help you manage your API interactions effectively.
Deployments
Deployments in Klu are the process of moving Action versions to specific environments. This is a crucial step in managing your API interactions effectively. You can deploy action versions to preview, staging, or production environments based on your needs.
Actions
Actions are the foundation for all generative functionality in Klu. Whether created in the UI or via the API/SDK, Actions contain a Prompt Template, Model Config, and Klu-specific configurations for Context, Skills, or Output Formatting. The Actions API enables two additional powerful features: caching and sessions.
Versions
Klu provides automatic version tracking for changes to your Actions' Prompt or Model Config. This feature is enabled by default, ensuring that all modifications are tracked and can be reverted if necessary. You can deploy previous versions to preview environments.
Caching
Klu automatically caches Action generations for you; however, returning cached responses is disabled by default. When enabled, cached responses save both money and time. You can also manually clear the cache for a specific Action or turn off caching.
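The behavior described above (generations are always stored, but cached reads are opt-in) can be sketched with a small in-memory cache. This is an illustration under assumptions, not Klu's implementation: `GenerationCache`, `use_cache`, and `llm_call` are hypothetical names.

```python
import hashlib
import json


class GenerationCache:
    """In-memory cache of generations, grouped by Action (illustrative only)."""

    def __init__(self):
        self._store = {}  # action_id -> {input_hash: generation}

    @staticmethod
    def _input_key(variables):
        # Sort keys so logically equal inputs hash to the same value.
        payload = json.dumps(variables, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def generate(self, action_id, variables, llm_call, use_cache=False):
        key = self._input_key(variables)
        cached = self._store.get(action_id, {})
        if use_cache and key in cached:
            return cached[key]  # cache hit: no model call is made
        result = llm_call(variables)
        # Always store the result, even when cached reads are turned off.
        self._store.setdefault(action_id, {})[key] = result
        return result

    def clear(self, action_id):
        """Drop every cached generation for one Action."""
        self._store.pop(action_id, None)
```

Keying the cache on the Action plus a hash of the input variables means an identical request can be served without a model call, while clearing one Action leaves the others untouched.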
Sessions
Klu automatically saves conversation memory via Sessions. Sessions are great for multi-turn conversations and for saving state between requests.
Evaluations
LLM Evals is a powerful feature that utilizes GPT-4 to compare and evaluate new versions of Actions against the old ones before deploying them to production. This ensures optimal performance and accuracy of your Actions.
Insights
Klu automatically labels generations for topic, sentiment, and helpfulness. This feature provides valuable insights into the performance and effectiveness of your Actions.
A/B Experiments
Klu enables you to create A/B Experiments for two Actions. This is great for testing different models, Prompt Templates, or other configuration changes.
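One common way to split traffic between two variants is deterministic hashing, so the same user always sees the same Action. The sketch below shows that idea only; it is not how Klu assigns traffic, and `assign_variant`, `experiment`, and `split` are hypothetical names.

```python
import hashlib


def assign_variant(user_id, experiment="experiment-1", split=0.5):
    """Deterministically map a user to variant "A" or "B"."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Convert the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "A" if bucket < split else "B"
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user placed in variant A of one test is not systematically placed in variant A of the next.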
RAG
Retrieval Augmented Generation (RAG) is a powerful technique that combines the best of both worlds: the ability to generate text from scratch and the ability to ground generation in the right information from a document or database. Klu automatically handles this on your behalf when Context is connected to an Action.
Context
Context is the key to RAG in Klu. Actions link to a Context library, which is a collection of documents originating from files, integrations, or databases. Context libraries are automatically indexed and optimized for retrieval. You can add documents to, or remove them from, your Context library at any time.
Retrieval and Chunking
Retrieval and chunking are key aspects of RAG in Klu. The settings below significantly affect retrieval behavior and performance.
Retrieval settings:
- Response mode: This can be set to 'search', 'refine', or 'tree summarize' depending on the specific requirements of your application.
- Max response length: Maximum length of the response that the retrieval process can generate.
- Similarity top k: Specify the number of top similar documents to consider during the retrieval process.
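To show what a "similarity top k" setting controls, here is a minimal sketch that ranks chunks by cosine similarity to a query vector and keeps the best `k`. The embedding vectors and function names are illustrative assumptions; Klu's actual retrieval pipeline is not described in this document.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=3):
    """Return the text of the k chunks most similar to the query.

    chunks is a list of (text, embedding_vector) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

A larger `k` gives the model more material to draw on but also more tokens to process, which is why this setting affects both answer quality and cost.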
Chunking settings:
- Doc size (tokens): This determines the size of the chunks into which the documents are divided. The size is specified in terms of the number of tokens.
- Overlap (tokens): This setting determines the number of tokens that can overlap between two consecutive chunks.
- Text splitter: This can be set to 'tokens', 'sentence', 'character', or 'code' depending on how you want the text to be split into chunks.
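The interaction between doc size and overlap is easiest to see in code. The sketch below splits a token list into fixed-size chunks that share `overlap` tokens with their predecessor. It uses whitespace-separated words as stand-in tokens, which is an assumption; a real tokenizer would produce different boundaries, and `chunk_tokens` is a hypothetical helper rather than a Klu function.

```python
def chunk_tokens(tokens, doc_size, overlap):
    """Split tokens into chunks of doc_size, each sharing overlap tokens with the previous one."""
    if doc_size <= 0:
        raise ValueError("doc_size must be positive")
    if not 0 <= overlap < doc_size:
        raise ValueError("overlap must be at least 0 and smaller than doc_size")
    step = doc_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + doc_size])
        if start + doc_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
words = "the quick brown fox jumps over the lazy dog today".split()
chunks = chunk_tokens(words, doc_size=4, overlap=1)
```

With ten tokens, a doc size of 4, and an overlap of 1, the window advances three tokens at a time and yields three chunks, each beginning with the last token of the one before it. Overlap helps preserve sentences that would otherwise be cut at a chunk boundary.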
Retrieval Versions
Klu provides automatic version tracking for changes to your Context libraries, including document chunking and retrieval settings. This feature is enabled by default, ensuring that all modifications are tracked and can be reverted if necessary.
Filter Context with Metadata
Klu enables you to add metadata to your Context documents. Metadata enables powerful filtering of Context before performing RAG. This is great for multi-tenant data scenarios or for filtering out data that is not relevant to your generation. Filtering also enables Q&A on a specific document contained within a Context library.
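The multi-tenant scenario above can be sketched as a pre-retrieval filter: only documents whose metadata matches every requested key are passed on to retrieval. The document shape and the `filter_by_metadata` helper are assumptions made for illustration, not Klu's data model or API.

```python
def filter_by_metadata(documents, filters):
    """Keep documents whose metadata matches every key/value pair in filters."""
    return [
        doc
        for doc in documents
        if all(doc.get("metadata", {}).get(key) == value for key, value in filters.items())
    ]


# Hypothetical documents from a shared Context library.
documents = [
    {"text": "Acme refund policy", "metadata": {"tenant": "acme", "doc_id": "policy-1"}},
    {"text": "Acme onboarding guide", "metadata": {"tenant": "acme", "doc_id": "guide-1"}},
    {"text": "Globex refund policy", "metadata": {"tenant": "globex", "doc_id": "policy-2"}},
]

acme_only = filter_by_metadata(documents, {"tenant": "acme"})
single_doc = filter_by_metadata(documents, {"tenant": "acme", "doc_id": "policy-1"})
```

Filtering on a tenant key keeps one customer's data out of another customer's generations, and filtering on a document identifier narrows Q&A to a single document.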
SDK Exports