Getting started with the API and SDKs
There are three core concepts that are key to working with the Klu API or SDKs. The foundational primitive of Klu is the Action, Klu performs Retrieval Augmented Generation (RAG) to ground generations in Context, and all interactions with Klu are Authenticated via your Klu API Key.
pip install klu
Find your API Key in your Klu workspace. Visit Settings and select the API Keys tab. All API Keys link to a specific User and Workspace. Here is a sample API Key and how it used in practice.
The LLM API Engine is the core of all generative functionality in Klu. It can be accessed directly via the API or through the SDK, and it powers the creation of Actions, which include a Prompt Template, Model Config, and Klu-specific configurations for Context, Skills, or Output Formatting. The LLM API Engine also supports advanced features such as caching and sessions.
Klu automatically retries LLM requests for you, ensuring that temporary issues do not disrupt your operations. This feature is enabled by default, providing robustness and reliability to your API interactions. You can customize the retry behavior according to your needs, including the number of retries and the delay between attempts.
Klu automatically manages request timeouts for you. This feature is enabled by default to ensure that your operations are not disrupted by long-running requests. You can customize the timeout duration according to your needs, providing flexibility and control over your API interactions.
Klu comes with pre-configured environments - preview, staging, and production - to help you manage your API interactions effectively.
Deployments in Klu are the process of moving Action versions to specific environments. This is a crucial step in managing your API interactions effectively. You can deploy action versions to preview, staging, or production environments based on your needs.
Actions are the foundation for all generative functionality in Klu. Whether created in the UI or via the API/SDK, Actions contain a Prompt Template, Model Config, and Klu-specific configurations for Context, Skills, or Output Formatting. The Actions API enables two additional powerful features: caching and sessions.
Klu provides automatic version tracking for changes to your Actions' Prompt or Model Config. This feature is enabled by default, ensuring that all modifications are tracked and can be reverted if necessary. You can deploy previous versions to preview environments.
Klu automatically caches Action generations for you, however returning cached responses is disabled by default. This is great for saving money and time. You can also manually clear the cache for a specific Action or turn off caching.
Klu automatically saves conversation memory via Sessions. Sessions are great for multi-turn conversations and for saving state between requests.
LLM Evals is a powerful feature that utilizes GPT-4 to compare and evaluate new versions of Actions against the old ones before deploying them to production. This ensures optimal performance and accuracy of your Actions.
Klu automatically labels generations for topic, sentiment, and helpfulness. This feature provides valuable insights into the performance and effectiveness of your Actions.
Klu enables you to create A/B Experiments for two Actions. This is great for testing different models, Prompt Templates, or other configuration changes.
Retrieval Augmented Generation (RAG) is a powerful technique that combines the best of both worlds: the ability to generate text from scratch and the ability to ground generation in the right information from a document or database. Klu automatically handles this on your behalf when Context is connected to an Action.
Context is the key to RAG in Klu. Actions link to a Context library, which is a collection of documents originating from files, integrations, or databases. Context libraries are automatically indexed and optimized for retrieval. You can add and remove additional documents to your Context library at any time.
Retrieval and Chunking
Retrieval and chunking are key aspects of RAG in Klu. These settings greatly change retrieval behavior and performance
- Response mode: This can be set to 'search', 'refine', or 'tree summarize' depending on the specific requirements of your application.
- Max response length: Maximum length of the response that the retrieval process can generate.
- Similarity top k: Specify the number of top similar documents to consider during the retrieval process.
- Doc size (tokens): This determines the size of the chunks into which the documents are divided. The size is specified in terms of the number of tokens.
- Overlap (tokens): This setting determines the number of tokens that can overlap between two consecutive chunks.
- Text splitter: This can be set to 'tokens', 'sentence', 'character', or 'code' depending on how you want the text to be split into chunks.
Klu provides automatic version tracking for changes to your Context libraries, including document chunking and retrieval settings. This feature is enabled by default, ensuring that all modifications are tracked and can be reverted if necessary.
Filter Context with Metadata
Klu enables you to add metadata to your Context documents. Metadata enables powerful filtering of Context before performing RAG. This is great for multi-tenant data scenarios or for filtering out data that is not relevant to your generation. Filtering also enables Q&A on a specific document contained within a Context library.