Hello, I am really expanding now that I can use a vLLM-compatible AI service; it's amazing to be able to use DeepSeek and pass vLLM parameters. One thing I am missing is the `/tokenize` endpoint from vLLM. Would it be possible to expose this endpoint?

Here is my use case. I am running in a browser environment, so I can't easily run a tokenizer locally.

1. I tokenize a `'<<BREAK>>'` string, or find some token like `<unk>` or a special reserved token for that model that won't normally show up in text generated by the AI or during inference.
2. I then add that break string at the end of every chat message:
   https://github.com/guspuffygit/sentient-sims-app/blob/main/src/main/sentient-sims/services/VLLMAIService.ts#L198-L206
3. After that, I can truncate the oldest user/assistant messages to fit a certain token length, because I can count how many tokens are in each chat message by finding the break-string token. Here is the code where I use the tokenized output to truncate chat messages to a certain context length:
   https://github.com/guspuffygit/sentient-sims-app/blob/main/src/main/sentient-sims/util/tokenTruncate.ts#L17-L69
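
In case it helps explain the request, here is a rough sketch of how I would use the endpoint from the browser. The `/tokenize` path and the `{ tokens: number[] }` response shape are based on vLLM's HTTP server, but the exact schema can vary by version, and the helper names (`tokenize`, `truncateToContext`, `ChatMessage`) are just illustrative here, not the actual code from the repo linked above. It also assumes the break marker maps to a single token ID (e.g. a reserved token); a multi-token marker would need a subsequence search instead.

```ts
// Sketch only: how a /tokenize endpoint would let me truncate chat history
// to a token budget without running a tokenizer in the browser.

type ChatMessage = { role: 'user' | 'assistant' | 'system'; content: string };

const BREAK = '<<BREAK>>';

// Ask the vLLM-compatible server to tokenize a prompt and return token IDs.
// Assumes a POST /tokenize route returning { tokens: number[] }.
async function tokenize(baseUrl: string, model: string, prompt: string): Promise<number[]> {
  const response = await fetch(`${baseUrl}/tokenize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt }),
  });
  const data = (await response.json()) as { tokens: number[] };
  return data.tokens;
}

// Append the break string to every message, tokenize once, then drop the
// oldest messages until the total token count fits within maxTokens.
async function truncateToContext(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  maxTokens: number,
): Promise<ChatMessage[]> {
  // Assumption: the break marker tokenizes to exactly one token ID.
  const [breakTokenId] = await tokenize(baseUrl, model, BREAK);

  const prompt = messages.map((m) => m.content + BREAK).join('');
  const tokens = await tokenize(baseUrl, model, prompt);

  // Split the token stream on the break token to get a per-message count.
  const counts: number[] = [];
  let current = 0;
  for (const token of tokens) {
    if (token === breakTokenId) {
      counts.push(current);
      current = 0;
    } else {
      current += 1;
    }
  }

  // Drop the oldest messages until the remaining ones fit the budget.
  let total = counts.reduce((sum, n) => sum + n, 0);
  let start = 0;
  while (total > maxTokens && start < messages.length) {
    total -= counts[start];
    start += 1;
  }
  return messages.slice(start);
}
```

With that, one round trip to `/tokenize` per request is enough to keep the prompt inside the model's context window, which is exactly what the linked `tokenTruncate.ts` does against a self-hosted vLLM today.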