Inference hanging
I'm using the H8-6.0BPW quantization and I'm finding that tabbyAPI is hanging at the end of inference. I'm not sure if the EOS token got muddled somewhere or if there's another weird problem, but I did a full reinstall and checked the file hashes of the model, and everything checked out.
Hey!
Alright, I will check it today so we can pinpoint what exactly is wrong with it.
It was an issue with the configuration files.
The following changes made the model stop correctly (and also prevented it from spitting out <think> / <s> tokens in between and at the end of sentences):
| File | Change |
|---|---|
| config.json | `eos_token_id` set to `131072` (single value instead of array) |
| tokenizer_config.json | `eos_token: "<think>"`, `add_eos_token: true` |
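For reference, a minimal sketch of the relevant fragments after the fix (only the changed keys are shown; everything else in each file stays as-is):

```
// config.json
"eos_token_id": 131072

// tokenizer_config.json
"eos_token": "<think>",
"add_eos_token": true
```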
I'm using Text Generation Web UI by Oobabooga. These changes should work for TabbyAPI as well, since they are model configuration files, but your mileage may vary due to differences in how TabbyAPI handles inference with the ExLlamaV3 loader.
Uploaded new configuration files with the fix.
Awesome. Thank you for the prompt fix.
You are welcome!