Blank responses with Ollama

#2
by ymgenesis - opened

Getting completely blank responses when using this model with Ollama.
Funnily enough I can get some response when using curl with tags:

"prompt": "<|start|>user<|message|>Hello<|end|>\n<|start|>assistant<|channel|>final<|message|>",
"raw": true,
"stream": false
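For anyone who prefers a script to curl, the same raw request can be sketched in Python. This assumes a local Ollama server on its default address (http://localhost:11434) and the /api/generate endpoint; the helper names build_raw_request and generate are made up for illustration, and the model tag is whatever you pulled.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_raw_request(model: str, user_text: str) -> dict:
    """Build a raw-mode request with hand-written Harmony tags (hypothetical helper)."""
    prompt = (
        f"<|start|>user<|message|>{user_text}<|end|>\n"
        "<|start|>assistant<|channel|>final<|message|>"
    )
    # raw=True asks Ollama to skip the model's template and send the prompt as-is.
    return {"model": model, "prompt": prompt, "raw": True, "stream": False}


def generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload and return the generated text (needs a running server)."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling generate(build_raw_request("your-model-tag", "Hello")) should return only the text of the final channel, because the prompt already opens that channel for the model.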

Otherwise, this model doesn't "chat" using Ollama, seemingly due to faulty modelfile tags or the way Ollama handles them, or something similar.

EDIT:
I was able to fix this by making a new modelfile:

FROM hf.co/DavidAU/OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf:Q5_1

PARAMETER temperature 0.7
PARAMETER num_predict 512

# Critical: DO NOT stop on <|channel|> or <|message|> or <|start|>
# Only stop on return (or omit stop entirely).
PARAMETER stop <|return|>

TEMPLATE """<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.

# Valid channels: analysis, final.<|end|>
{{- range .Messages }}
{{- if eq .Role "user" }}
<|start|>user<|message|>{{ .Content }}<|end|>
{{- else if eq .Role "assistant" }}
<|start|>assistant<|channel|>final<|message|>{{ .Content }}<|end|>
{{- end }}
{{- end }}
<|start|>assistant"""
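To see what prompt this TEMPLATE produces, here is a rough Python re-implementation. It is only a sketch of my reading of the template (each "{{-" trims the whitespace before it), not Ollama's own renderer, and render_prompt is a made-up name.

```python
# System block copied from the TEMPLATE above.
SYSTEM = (
    "<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\n"
    "\n"
    "# Valid channels: analysis, final.<|end|>"
)


def render_prompt(messages: list[dict]) -> str:
    """Approximate the prompt the TEMPLATE builds from a chat history."""
    parts = [SYSTEM]
    for m in messages:
        if m["role"] == "user":
            parts.append(f"\n<|start|>user<|message|>{m['content']}<|end|>")
        elif m["role"] == "assistant":
            parts.append(
                f"\n<|start|>assistant<|channel|>final<|message|>{m['content']}<|end|>"
            )
        # Any other role is skipped, as in the template's if / else-if chain.
    # The prompt ends with an open assistant turn and no channel chosen.
    parts.append("\n<|start|>assistant")
    return "".join(parts)


print(render_prompt([{"role": "user", "content": "Hello"}]))
```

Note that the prompt ends at "<|start|>assistant" without a channel, so the model picks the channel itself. That differs from the curl request above, which forces the final channel.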

Even if you recreate the model with these parameters and start in a fresh, empty window, with or without the system prompt, you only get fragments of text that break off in different places. The model does not work with Ollama.

Yes, I could not get this working either. The show output includes a stop parameter that seems out of place, the one starting with "tart":
show hf.co/DavidAU/OpenAi-GPT-oss-20b-HERETIC-uncensored-NEO-Imatrix-gguf:Q8_0
Model
architecture gpt-oss
parameters 20.9B
context length 131072
embedding length 2880
quantization unknown

Capabilities
completion
thinking

Parameters
stop "<|start|>"
stop "<|message|>"
stop "<|end|>"
stop "<|channel|>"
stop "<|return|>"
stop "tart|>user<|message|>"
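One plausible reading of why these default stop parameters give blank replies: if generation is cut at the first stop string, a Harmony-style reply is cut before any text appears. The sketch below is a simplified model of stop handling, not Ollama's actual implementation, and the sample output is invented for illustration.

```python
def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop string (simplified model)."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Stop strings from the show output above (the odd "tart|>..." entry left out).
DEFAULT_STOPS = ["<|start|>", "<|message|>", "<|end|>", "<|channel|>", "<|return|>"]

# Hypothetical raw model output in Harmony format.
raw_output = (
    "<|channel|>analysis<|message|>User says hello.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>Hi there!<|return|>"
)

print(repr(truncate_at_stop(raw_output, DEFAULT_STOPS)))   # ''
print(repr(truncate_at_stop(raw_output, ["<|return|>"])))  # everything up to <|return|>
```

With the default list the very first token is already a stop string, so nothing is left. Stopping only on <|return|>, as in the Modelfile earlier in the thread, keeps both the analysis and the final text.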

Ollama requires OpenAI's "Harmony" response format to be installed/configured.

Otherwise use LM Studio or another AI app.
The quants were tested in LM Studio, which uses llama.cpp.
llama-server.exe will also work...

@ymgenesis

I was having the same issue as you, not only with this model, but most uncensored GGUF models. Your solution got it to at least begin thinking, but for me it just thinks indefinitely until it runs out of tokens. Rarely it will actually get into a response but it will always cut itself off because it was thinking for SO long. Is there a solution to this? Sorry I'm new.

Edit: The model works perfectly on LM Studio. It seems something is definitely broken with Ollama.

Rarely it will actually get into a response but it will always cut itself off because it was thinking for SO long. Is there a solution to this? Sorry I'm new.

I confirm that I have the same issue. It responds, but then it loops. I wonder if it is possible to resolve this issue with Ollama.

Me too, it keeps looping with "??????"

It is really unusable in that state.
Or is it something with LM Studio?
I made sure the parameters are set as instructed, though.


Has anyone managed to get this model working with Ollama? I converted it using ollama create, taking the original modelfile from gpt-oss:20b via show >> modelfile, then adjusted the FROM line and created it. It runs fine with ollama run, but when I connect the model to Claude Code and a request triggers tools, it either gets stuck thinking indefinitely or returns empty responses.

I had a similar issue with models in vLLM. Sometimes it comes down to the tool-use parameters in the engine: I had to turn tool use on, and after that everything worked.
