Inference hanging
I'm using the H8-6.0BPW quantization and I'm finding that tabbyAPI is hanging at the end of inference. I'm not sure if the EOS token got muddled somewhere or if there's another weird problem, but I did a full reinstall and checked the file hashes of the model, and everything checked out.
Hey!
Alright, I will check it today so we can pinpoint what exactly is wrong with it.
It was an issue with the configuration files.
The following changes made the model stop correctly (and also prevented it from spitting out <think> / <s> tokens in between and at the end of sentences):
| File | Change |
|---|---|
| config.json | `eos_token_id` set to `131072` (single value instead of array) |
| tokenizer_config.json | `eos_token: "<think>"`, `add_eos_token: true` |
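For reference, a minimal sketch of the relevant fragments after the fix (only the changed keys are shown; everything else in each file stays as-is):

```
// config.json
"eos_token_id": 131072

// tokenizer_config.json
"eos_token": "<think>",
"add_eos_token": true
```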
I'm using Text Generation Web UI by Oobabooga. These changes should work for TabbyAPI as well, since they are model configuration files, but your mileage may vary due to differences in how TabbyAPI handles inference with the ExLlamaV3 loader.
Uploaded new configuration files with the fix.
Awesome. Thank you for the prompt fix.
You are welcome!