render_message_to_json: Neither string content nor typed content is supported by the template. This is unexpected and may lead to issues.

#7
by whoisjeremylam - opened

Is anyone else getting this message in their ik_llama.cpp log as well? Is there a specific chat template I should be using? I'm just using --jinja, so it's the template in the metadata.

PS: Although smol-IQ2_KS is pretty sluggish, it's visibly a step change better than Qwen 3.5-397B, which is my daily driver. This is on a medium-sized backend code base with pretty lengthy specs (Node and Python).

PPS: Anecdotally, GLM 5.1 seems to make mistakes and then has to fix minor issues it introduces at 100k context. This might be due to the lack of DSA, to the pretty aggressive quant, or it may just have been working in an area where the model didn't get as much RL training.

On the last point, other models like Qwen 3.5-397B and MiniMax 2.7 all seem to have similar troubles (on Python CRUD against a pgvector database).

Is anyone else getting this message in their ik_llama.cpp log as well? Is there a specific chat template I should be using? I'm just using --jinja, so it's the template in the metadata.

I don't think it causes a problem, but you can modify the chat template and pass it in on the command line to remove it; details from reading through this thread: https://huggingface.co/ubergarm/GLM-5.1-GGUF/discussions/6#69dcf8c8333560d95ecac5ca
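FWIW, a rough sketch of what that workflow could look like, assuming ik_llama.cpp supports the same --chat-template-file flag as mainline llama-server. The template file contents, file names, and the exact line matched below are all made up for illustration, not the real GLM-5.1 template:

```shell
# Hypothetical sketch: save a local copy of the chat template with the
# warning-emitting line stripped, then point the server at the edited
# file instead of the template embedded in the GGUF metadata.
cat > chat_template.jinja <<'EOF'
{{- message.content -}}
{{- raise_exception("Neither string content nor typed content") -}}
EOF

# Remove the offending line (grep -v keeps everything that doesn't match):
grep -v 'Neither string content' chat_template.jinja > chat_template.fixed.jinja

# Then launch with the edited copy (model path is illustrative):
# llama-server -m GLM-5.1-IQ2_KS.gguf --jinja --chat-template-file chat_template.fixed.jinja
```

The point is just that --jinja on its own uses the metadata template, while --chat-template-file overrides it with whatever edited copy you supply.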

all seem to have similar troubles

How hard are you compressing your kv-cache and what coding client harness are you using?

Poor models can never use the write tool with JSON haha... they probably have to double-escape everything, so I tell it to use a cat EOF heredoc in bash when writing .json files etc...
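For anyone who hasn't seen the trick, a minimal sketch of what I mean (file name and contents are just an example): with a quoted heredoc delimiter, the body is written verbatim, so the model never has to escape quotes or backslashes for a write tool.

```shell
# Quoting the delimiter ('EOF' instead of EOF) disables all shell
# expansion, so the JSON body lands on disk exactly as written:
cat > settings.json <<'EOF'
{
  "name": "example",
  "message": "he said \"hi\"",
  "path": "C:\\temp"
}
EOF
```

With an unquoted EOF the shell would still interpolate $vars and backslashes, which defeats the purpose for JSON.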

I don't think it causes a problem, but you can modify the chat template and pass it in on the command line to remove it; details from reading through this thread: https://huggingface.co/ubergarm/GLM-5.1-GGUF/discussions/6#69dcf8c8333560d95ecac5ca

Oh, thanks for the link. I totally missed it. It is annoying having that message pop up in the log, so it would be good to get rid of it, haha.

all seem to have similar troubles

How hard are you compressing your kv-cache and what coding client harness are you using?

No KV-cache quantization, actually! It's pretty memory-efficient, so I thought I'd use that as a way of offsetting any compounding effects (KV-cache quant + lack of DSA).

I'm using pi.dev (now Earendil). It's fantastic - try it if you have a chance!

I like being able to define my own guardrails (instead of someone else's arbitrary plan/execute modes) and to build my own tools and skills as I require. The fully batteries-included model of a harness wasn't fitting the way that I work.

@whoisjeremylam

I'm using pi.dev (now Earendil). It's fantastic - try it if you have a chance!

I've heard a few recommendations for pi now. I have been using opencode, but honestly it is quite opaque, and even after much faffing about to force all the config and state into a single .opencode/ directory (to encapsulate everything inside a Docker bind mount), I'm not happy with it. I've added custom markdown subagents, but the config is spread out across many files in both JSON and markdown formats... even with custom system prompts it chews up 7k tokens just to say "hello world", which is such a PITA using GLM-5.1 on the CPU backend lmao...

I've heard of kon as well, but it seems fairly new. I started vibe-coding my own like so many do, but perhaps pi is worth a shot despite being Node lol...

Also, yeah, good idea to keep the GLM-5.1 KV cache at f16 given it is already latent-compressed by design.

Cheers and thanks!
