Tokens are off

#1
by lazyDataScientist - opened

Thought tokens <|channel|>5analysis<|message|> The location of the number 5 appears to be random with every run. Then, all too often, the model gets stuck in an infinite loop.
Are there optimal settings we should have set?

This is likely a chat template or sampler issue in your inference engine, not an issue with the model itself. Make sure you're using the correct chat template for gpt-oss models. What inference software are you using (LM Studio, KoboldCpp, etc.)? The gpt-oss architecture uses <|channel|>analysis<|message|> and <|channel|>final<|message|> tokens for its thinking/output chain and your software needs to handle these correctly.
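One way to sanity-check whether your inference software is rendering the template correctly is to inspect the raw prompt it produces. A minimal, illustrative Python sketch (the `VALID_CHANNELS` set and the helper name are my own assumptions, not part of any official API) that flags malformed channel names like the `5analysis` seen above:

```python
import re

# gpt-oss routes text through named channels; a well-formed assistant
# turn looks like:
#   <|channel|>analysis<|message|>...thinking...<|end|>
#   <|channel|>final<|message|>...answer...
# A broken chat template can leave stray characters between
# <|channel|> and the channel name (e.g. the "5" reported above).

VALID_CHANNELS = {"analysis", "commentary", "final"}

def check_channels(rendered: str) -> list[str]:
    """Return any malformed channel names found in a rendered prompt."""
    names = re.findall(r"<\|channel\|>(.*?)<\|message\|>", rendered)
    return [n for n in names if n not in VALID_CHANNELS]

good = "<|channel|>analysis<|message|>think<|end|><|channel|>final<|message|>hi"
bad = "<|channel|>5analysis<|message|>think"

print(check_channels(good))  # []
print(check_channels(bad))   # ['5analysis']
```

If the check flags names on the raw prompt your engine sends to the model, the problem is in the template rendering, not the weights.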

If you continue to experience issues, try this variant instead:

https://huggingface.co/llmfan46/gpt-oss-120b-heretic-v2-GGUF

See if that fixes the issue.

I am using LM Studio. LM Studio v4 removed the direct Jinja edit feature, so I am not sure how to fix the chat template.

Have you tried:

https://huggingface.co/llmfan46/gpt-oss-120b-heretic-v2-GGUF

?

Yeah, that version worked. Although this model does still show signs of refusal during its thought process.

I just tested the model on LM Studio 0.4.8 and was unable to reproduce either issue: there were no random numbers in the thinking tokens and no infinite loops. The model responded correctly across multiple prompts, including creative writing tasks. NSFW adult fiction writing works correctly as well, with no refusals in either the thought process or the output.

Could you share your exact LM Studio version and any custom settings you may have changed (sampler settings, context length, chat template preset, etc.)? That would help narrow down the cause.

Also make sure you're using the latest version of LM Studio, as older versions may not handle gpt-oss's special tokens correctly.

I ran additional tests on both versions using LM Studio 0.4.8 and here are my findings:

Token issue: Could not reproduce on either version. No random numbers in thinking tokens, no infinite loops. Both models responded correctly across multiple prompts.

NSFW in thinking process: I tested both versions with explicit content prompts.

  • The Ultra Heretic (3/100 refusals) version deliberates minimally in the thinking process and produces content readily. The thinking output is brief and direct.
  • The Heretic v2 (9/100 refusals) version deliberates more in the thinking process, considers whether the content is within policy before complying, but still produces the content without refusing.

Both versions handle NSFW content without issues. The Ultra Heretic is simply more direct about it due to its lower refusal count. If you're seeing actual refusals in the thinking process that prevent content from being generated, that would again point to a configuration issue rather than the model itself.

Let me know your exact LM Studio version and settings, and I may be able to help troubleshoot further.
