Prompt Caching in Serverless?
Drummer
Is it feasible to get prompt caching working for the serverless options? For example, through a flag that tries its best to keep you connected to one vLLM instance, or by letting you specify the instance yourself? I've also heard you can store the KV cache in a database, allowing consistent prompt caching from any instance.
Z T
Prompt caching should work now on most serverless models, and cached tokens generally cost 1/10 of input tokens.
Khue
This request is over a year old and there's no response?
Edit: I do see some caching behavior, but I'm not sure if there are any special headers, e.g. Fireworks does it via the "x-session-affinity" header.
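For anyone else trying this, here's a minimal sketch of what a Fireworks-style session-affinity request might look like. The endpoint URL, API key, and model name are placeholders (not real values), and whether any given provider honors this header is an assumption based on the comment above:

```python
def build_request(messages, session_id, model="<MODEL>"):
    """Build headers and JSON body for a chat completion request that
    asks the router to pin us to one backend instance.

    Hypothetical sketch: the endpoint, key, and model are placeholders.
    """
    headers = {
        "Authorization": "Bearer <API_KEY>",
        # Reusing the same session id across calls nudges the router
        # toward the same backend instance, so its prompt/KV cache
        # stays warm for the shared prefix of the conversation.
        "x-session-affinity": session_id,
    }
    body = {"model": model, "messages": messages}
    return headers, body

# Usage: send with your HTTP client of choice, e.g.
#   headers, body = build_request(msgs, "my-session-123")
#   requests.post("https://<PROVIDER>/v1/chat/completions",
#                 headers=headers, json=body)
```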