Ollama Support

#15, opened by yqchen-sci

Thank you for providing the quantized model files. The official Ollama repository already includes this model, but I noticed that the UD-Q4_K_XL version you provided occupies less space. I would like to download your GGUF file and run it on Ollama. Could you please provide the corresponding Ollama model template and relevant parameter settings?

Unsloth AI org

It does not fully work in Ollama at the moment because of potential chat template incompatibilities. If you are using Ollama, just use Ollama's own version.

OK, thanks.

I was able to make it work with this Modelfile:

FROM hf.co/unsloth/GLM-4.7-Flash-GGUF:Q4_K_XL

RENDERER glm-4.7
PARSER glm-4.7

TEMPLATE "[gMASK]<sop>{{ if .System }}<|system|>
{{ .System }}{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}{{ end }}<|assistant|>
{{ .Response }}"

PARAMETER stop <|user|>

PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER min_p 0.01
PARAMETER repeat_penalty 1.0

And then execute something like this:
ollama create GLM-4.7-Flash-GGUF:Q4_K_XL -f glm47-flash.modelfile

I think the important part is adding the RENDERER glm-4.7 and PARSER glm-4.7 lines.
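To see what prompt the TEMPLATE above actually produces, here is a small Python sketch that expands its Go-template logic by hand. render_glm_prompt and apply_stop are hypothetical helpers written for illustration, not Ollama APIs. The sketch assumes Ollama applies this TEMPLATE as written; with RENDERER set, Ollama may format prompts with its built-in glm-4.7 renderer instead, which this sketch does not model.

```python
# Hypothetical helper: hand-expands the single-turn TEMPLATE from the Modelfile.
# Mirrors: [gMASK]<sop>{{ if .System }}<|system|>\n{{ .System }}{{ end }}
#          {{ if .Prompt }}<|user|>\n{{ .Prompt }}{{ end }}<|assistant|>\n{{ .Response }}
def render_glm_prompt(prompt: str = "", system: str = "", response: str = "") -> str:
    out = "[gMASK]<sop>"
    if system:  # {{ if .System }} is false for an empty string
        out += "<|system|>\n" + system
    if prompt:  # {{ if .Prompt }}
        out += "<|user|>\n" + prompt
    out += "<|assistant|>\n" + response  # always emitted
    return out


# Hypothetical helper: imitates what PARAMETER stop <|user|> does,
# i.e. cut the generated text at the first stop sequence.
def apply_stop(generated: str, stops: tuple = ("<|user|>",)) -> str:
    cut = len(generated)
    for s in stops:
        i = generated.find(s)
        if i != -1:
            cut = min(cut, i)
    return generated[:cut]


if __name__ == "__main__":
    print(repr(render_glm_prompt(prompt="Hello", system="You are helpful.")))
    print(repr(apply_stop("Hi there<|user|>\nmore")))
```

For example, render_glm_prompt(prompt="Hi") gives "[gMASK]<sop><|user|>\nHi<|assistant|>\n", and the model's continuation is then truncated at the first <|user|> so it does not start writing the next user turn itself.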

Does this template support tool use? This model has strong capabilities for multi-turn tool use, so it would be helpful to provide an appropriate Ollama template.

I have no clue. Try it out. It's basically the original template of this Unsloth GGUF with the RENDERER, PARSER, temperature, top_p, min_p, and repeat_penalty settings added.
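One way to try tool use is to send a chat request with a tools list to Ollama's /api/chat endpoint and check whether the reply's message carries tool_calls. This is a sketch using only the Python standard library; it assumes the server runs at the default http://localhost:11434 and that the model was created under the name from the ollama create command above. The get_current_weather tool is a made-up example. Note that the TEMPLATE itself only references .System, .Prompt and .Response, so any tool handling would have to come from the RENDERER/PARSER directives; that is an assumption this test is meant to check, not a confirmed behavior.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local Ollama endpoint
MODEL = "GLM-4.7-Flash-GGUF:Q4_K_XL"  # name used with `ollama create` above


def build_tool_request(question: str) -> dict:
    # Hypothetical tool, only used to see whether the model emits a tool call.
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string", "description": "City name"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
        "stream": False,  # ask for a single JSON response instead of a stream
    }


def send(payload: dict) -> dict:
    # POST the JSON payload and decode the JSON reply.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a running Ollama server with the model created):
#   reply = send(build_tool_request("What is the weather in Paris?"))
#   print(reply["message"].get("tool_calls"))  # None means no tool call was made
```

If tool_calls comes back populated with a get_current_weather call, the next step would be appending a message with role "tool" containing the result and sending the conversation again to test the multi-turn part.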