
Conversation

dlqqq (Contributor) commented Dec 4, 2025

Description

  • Closes #27 (Excessive token use causes rate limit exceptions on Anthropic).
  • Enables ephemeral prompt caching by passing the required arguments to litellm.acompletion(); see the sketch after this list.
  • Decreases token usage by 85–95% on a small tool-calling invocation with ~10 messages in the chat history.
  • Makes the ChatLiteLLM provider significantly more type-safe; the _astream() method now annotates the type of every object it creates and uses.
  • Implements the LiteLLM <=> LangChain metrics integration required to surface cache metrics in LangSmith.
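
For context, here is a minimal sketch of what "passing the required arguments" can look like. It assumes LiteLLM's Anthropic-style prompt caching convention, where a `cache_control` marker of type `"ephemeral"` is attached to a large, stable content block (e.g. the system prompt). The model ID, messages, and `stream_options` usage below are placeholders and assumptions, not the exact code in this PR.

```python
import asyncio

import litellm


async def main() -> None:
    messages = [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are Jupyternaut, a helpful assistant in JupyterLab.",
                    # Mark the large, stable prefix as cacheable (ephemeral cache).
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize the current notebook."},
    ]
    response = await litellm.acompletion(
        model="anthropic/claude-3-5-haiku-latest",  # placeholder model ID
        messages=messages,
        stream=True,
        # Assumption: request usage blocks in the stream so cache hit/write
        # token counts can be read back and forwarded to LangSmith.
        stream_options={"include_usage": True},
    )
    async for chunk in response:
        print(chunk)


asyncio.run(main())
```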

Demo

[Screenshot 2025-12-03 at 6:17:09 PM]

(low-resolution video because of GitHub's 10 MB file upload limit)

[Video: Screen.Recording.2025-12-03.at.5.53.59.PM-2.mov]

Minor "breaking" changes to the ChatLiteLLM provider

  • I have removed the _stream() method implementation to avoid code duplication. This can be easily re-implemented (without duplication) if needed in the future; the code comment there details how.

  • I needed to change the API of the _create_usage_metadata() helper function to provide the cache metrics in LangSmith and to improve its type safety. This means that every other "invocation" method except astream() (e.g. generate()) is likely broken, since each of them eventually calls this function. This should not have any impact on Jupyternaut, since we always call astream() anyway. A sketch of the cache-aware usage metadata follows below this list.
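
For context on the second bullet, here is a hedged sketch of how cache metrics can be surfaced through langchain-core's `UsageMetadata`, which is what LangSmith reads when rendering token and cache usage. The helper name and signature are illustrative assumptions, not the actual `_create_usage_metadata()` API in this PR.

```python
from langchain_core.messages.ai import UsageMetadata


def build_usage_metadata(
    prompt_tokens: int,
    completion_tokens: int,
    cache_read_tokens: int = 0,
    cache_creation_tokens: int = 0,
) -> UsageMetadata:
    """Map provider token counts onto LangChain's usage metadata shape.

    `input_token_details` carries the prompt-cache counters that LangSmith
    displays as cache reads/writes.
    """
    return UsageMetadata(
        input_tokens=prompt_tokens,
        output_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
        input_token_details={
            "cache_read": cache_read_tokens,
            "cache_creation": cache_creation_tokens,
        },
    )
```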

dlqqq added the enhancement label Dec 4, 2025
dlqqq changed the title from "Enable ephemeral prompt caching with LangServe metrics" to "Enable ephemeral prompt caching with LangSmith metrics" Dec 4, 2025
dlqqq (Contributor, Author) commented Dec 4, 2025

Correction: LangSmith is being used to provide the dashboard, not LangServe.

dlqqq requested a review from 3coins on December 4, 2025 at 19:48
dlqqq force-pushed the impl-prompt-caching branch from c3cd673 to 353850c on December 5, 2025 at 18:49
dlqqq (Contributor, Author) commented Dec 5, 2025

Rebased to include #29.

3coins (Contributor) commented Dec 5, 2025

@dlqqq
Thanks for submitting this change; prompt caching should be a huge improvement. I see the following errors while using Bedrock and Haiku 4.5. Here is the model ID I used: bedrock/global.anthropic.claude-haiku-4-5-20251001-v1:0

 litellm.exceptions.MidStreamFallbackError: litellm.ServiceUnavailableError: litellm.MidStreamFallbackError: litellm.BadRequestError: BedrockException - serviceUnavailableException {"message":"Bedrock is unable to process your request."} Original exception: BadRequestError: litellm.BadRequestError: BedrockException - serviceUnavailableException {"message":"Bedrock is unable to process your request."}
    During task with name 'model' and id '71a7d956-8d16-5ffa-e0d6-127818e2a03c'

I don't see this issue when I switch to main or when using Anthropic directly.

3coins (Contributor) commented Dec 5, 2025

It seems like this might be related to the newly added prompt caching args, which are either not being passed correctly or are missing some other config for Bedrock. Once I removed the prompt caching block, things seem to work.

3coins (Contributor) commented Dec 5, 2025

@dlqqq
Bedrock Converse with model ID bedrock/converse/global.anthropic.claude-haiku-4-5-20251001-v1:0 seems to work without any errors, so this is an issue with the invoke API only.

dlqqq (Contributor, Author) commented Dec 5, 2025

Thanks for catching this. Would it be sufficient to disable this feature if the model ID starts with bedrock/, but does not start with bedrock/converse/?
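
A minimal sketch of such a guard (hypothetical helper name; not part of the PR as submitted):

```python
def prompt_caching_supported(model_id: str) -> bool:
    # Disable prompt caching for Bedrock "invoke" model IDs (which failed with
    # the caching args in the testing above), while still allowing the Bedrock
    # Converse API and every other provider.
    if model_id.startswith("bedrock/") and not model_id.startswith("bedrock/converse/"):
        return False
    return True
```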

3coins (Contributor) left a review comment

Looks good!

dlqqq merged commit 1404b38 into jupyter-ai-contrib:main Dec 6, 2025
5 checks passed

Labels: enhancement (New feature or request)

Projects: None yet

Linked issues (may be closed by merging this pull request): Excessive token use causes rate limit exceptions on Anthropic (#27)

2 participants