Hello, I am really expanding what I can build now that I can use a vLLM-compatible AI service; it's amazing to be able to use DeepSeek and pass vLLM parameters. One thing I am missing is the /tokenize endpoint from vLLM. Would it be possible to expose this endpoint? Here is my use case.
I am running in a browser environment, so I can't easily run a tokenizer locally.
I tokenize a '<<BREAK>>' string, or find some token like <unk> or a special reserved token for that model, that won't normally show up in text generated by the AI during inference.
I then append that break string to the end of every chat message.
After that I can truncate the oldest user/assistant messages to a target token length, because I can count how many tokens each chat message contains by finding the break-string token in the tokenized output.
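For reference, this is roughly the request/response shape I have in mind, modeled on vLLM's /tokenize; the exact field names here are my assumption and may differ from the real API:

```ts
// Hypothetical request/response shapes, modeled on vLLM's /tokenize endpoint.
// Field names are assumptions on my part and may not match the real API exactly.
interface TokenizeRequest {
  model: string;   // model to tokenize with
  prompt: string;  // text to tokenize, e.g. "<<BREAK>>"
}

interface TokenizeResponse {
  tokens: number[]; // token IDs for the prompt
  count: number;    // number of tokens in the prompt
}
```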
Here is a sketch of the code where I use the tokenized output to truncate chat messages to a target context length; the base URL, model name, and response field names are placeholders for my setup.
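```ts
// Sketch of how I'd use a /tokenize endpoint to keep chat history under a token budget.
// BASE_URL, MODEL, and the response fields are placeholders / assumptions for my setup.
const BASE_URL = "https://my-vllm-host/v1";
const MODEL = "deepseek-chat";
const BREAK = "<<BREAK>>";
const MAX_CONTEXT_TOKENS = 8000;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Ask the server to tokenize a string and return its token IDs.
async function tokenize(text: string): Promise<number[]> {
  const res = await fetch(`${BASE_URL}/tokenize`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: MODEL, prompt: text }),
  });
  const data = await res.json();
  return data.tokens; // assumed response field
}

// Append the break string to every message, tokenize the whole history once,
// then split the token stream on the break-token sequence to get a per-message
// token count. Drop the oldest non-system messages until the total fits.
async function truncateHistory(messages: ChatMessage[]): Promise<ChatMessage[]> {
  const breakTokens = await tokenize(BREAK);
  const allTokens = await tokenize(messages.map((m) => m.content + BREAK).join(""));

  // Count tokens per message by scanning for the break-token sequence.
  const counts: number[] = [];
  let start = 0;
  for (let i = 0; i <= allTokens.length - breakTokens.length; i++) {
    if (breakTokens.every((t, j) => allTokens[i + j] === t)) {
      counts.push(i - start);          // tokens in this message, excluding the break
      start = i + breakTokens.length;
      i = start - 1;
    }
  }

  let total = counts.reduce((a, b) => a + b, 0);
  const kept = [...messages];
  // Remove the oldest user/assistant messages until we are under the budget.
  while (total > MAX_CONTEXT_TOKENS && kept.length > 1) {
    const idx = kept.findIndex((m) => m.role !== "system");
    if (idx === -1) break;
    total -= counts[idx];
    kept.splice(idx, 1);
    counts.splice(idx, 1);
  }
  return kept;
}
```

The key point is that all token counting happens server-side, so the browser never needs to ship a tokenizer for each model.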