Tool Calling?
Didn’t wanna bother IK before I had a chance to do more triage, but have you had any luck getting this model working with tool calling in ik_lcpp? I’ve been experimenting with some IK quants myself but it seems to fall flat with Continue in agent mode, abruptly terminating with a ton of parser errors. Wondering if there’s just something I’m missing besides the classic jinja flag.
I've been using ik_llama.cpp with this opencode.json in the working directory when starting opencode:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "share": "disabled",
  "autoupdate": false,
  "experimental": {
    "openTelemetry": false
  },
  "tools": {
    "websearch": true,
    "write": false,
    "todos": false
  },
  "disabled_providers": ["exa"],
  "provider": {
    "LMstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "ik_llama.cpp (local)",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "timeout": 99999999999
      },
      "models": {
        "Kimi-K2.5-Q4_X": {
          "name": "Kimi-K2.5-Q4_X",
          "limit": { "context": 1000000, "output": 32000 },
          "cost": { "input": 5.0, "output": 25.0 },
          "temperature": true,
          "reasoning": true,
          "tool_call": true
        }
      }
    }
  }
}
```
The Kimi name there doesn't seem to matter; it just uses whatever model you load. I'm starting the server like so:
```bash
model=/mnt/raid/models/ubergarm/Qwen3.5-27B-GGUF/Qwen3.5-27B-smol-IQ4_NL.gguf
_GLIBCXX_REGEX_STATE_LIMIT=1000000 \
CUDA_VISIBLE_DEVICES="0,1" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3.5-27B \
    -c 262144 \
    -fa on \
    -ger \
    --merge-qkv \
    -sm graph \
    -ngl 99 \
    -ub 4096 -b 4096 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --parallel 1 \
    --jinja \
    --no-mmap
```
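If you want to rule the client out while triaging parser errors, you can hit the server's OpenAI-compatible endpoint directly with a tool-calling request. This is just a sketch: the `get_weather` tool is made up for illustration, the alias matches the `--alias` above, and the `curl` line is left commented so the snippet only validates the payload locally until you have the server running.

```shell
# Illustrative tool-calling payload, shaped like what an agent client
# (opencode, Continue, etc.) sends when tool calls are enabled.
payload='{
  "model": "Qwen3.5-27B",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'

# Sanity-check that the payload is valid JSON before sending it anywhere.
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"

# Uncomment once llama-server is up; a healthy response should contain a
# "tool_calls" array rather than a template/parser error.
# curl -s http://127.0.0.1:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$payload" | python3 -m json.tool
```

If the raw response here is clean but the agent client still blows up, the problem is likely on the client's parsing side rather than in the server's chat template.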
The files aren't perfect, but they're close enough that it seems to be working well for me, building against the tip of main today with the latest chunked delta-net fixes.
OK great, thank you so much for the lead on this. I've been using mainline, but obviously I prefer high-quality IK quants when I can get 'em! I'll run some tests on this tonight, and thanks again for being the lighthouse of the ik_lcpp community haha.