Prompt Caching in Serverless?
Drummer
Is it feasible to get prompt caching working for the serverless options? For example, through a flag that tries its best to keep you connected to one vLLM instance, or by letting you specify the instance yourself? I've also heard you can store the KV cache in a database, allowing consistent prompt caching from any instance.
Z T
Prompt caching should work now on most serverless models, and cached tokens generally cost 1/10 of input tokens.
Khue
This request is over a year old and there's no response?
Edit: I do see some caching behavior, but I'm not sure if there are any special headers, e.g. Fireworks does it via the "x-session-affinity" header.
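For anyone else trying this, here's a minimal sketch of what a Fireworks-style session-affinity request might look like. The endpoint URL, API key, and model name are placeholders (not real values), and whether any given provider honors this header is an assumption based on the comment above:

```python
def build_request(messages, session_id, model="<MODEL>"):
    """Build headers and JSON body for a chat completion request that
    asks the router to pin us to one backend instance.

    Hypothetical sketch: the endpoint, key, and model are placeholders.
    """
    headers = {
        "Authorization": "Bearer <API_KEY>",
        # Reusing the same session id across calls nudges the router
        # toward the same backend instance, so its prompt/KV cache
        # stays warm for the shared prefix of the conversation.
        "x-session-affinity": session_id,
    }
    body = {"model": model, "messages": messages}
    return headers, body

# Usage: send with your HTTP client of choice, e.g.
#   headers, body = build_request(msgs, "my-session-123")
#   requests.post("https://<PROVIDER>/v1/chat/completions",
#                 headers=headers, json=body)
```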