Tool Calling?
Didn’t wanna bother IK before I had a chance to do more triage, but have you had any luck getting this model working with tool calling in ik_lcpp? I’ve been experimenting with some IK quants myself but it seems to fall flat with Continue in agent mode, abruptly terminating with a ton of parser errors. Wondering if there’s just something I’m missing besides the classic jinja flag.
I've been using ik_llama.cpp with this opencode.json in the working directory when starting opencode:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "share": "disabled",
  "autoupdate": false,
  "experimental": {
    "openTelemetry": false
  },
  "tools": {
    "websearch": true,
    "write": false,
    "todos": false
  },
  "disabled_providers": ["exa"],
  "provider": {
    "LMstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "ik_llama.cpp (local)",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "timeout": 99999999999
      },
      "models": {
        "Kimi-K2.5-Q4_X": {
          "name": "Kimi-K2.5-Q4_X",
          "limit": { "context": 1000000, "output": 32000 },
          "cost": { "input": 5.0, "output": 25.0 },
          "temperature": true,
          "reasoning": true,
          "tool_call": true
        }
      }
    }
  }
}
```
The Kimi name there doesn't seem to matter; it just uses whatever model you load. I'm starting the server like so:
```bash
model=/mnt/raid/models/ubergarm/Qwen3.5-27B-GGUF/Qwen3.5-27B-smol-IQ4_NL.gguf
_GLIBCXX_REGEX_STATE_LIMIT=1000000 \
CUDA_VISIBLE_DEVICES="0,1" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3.5-27B \
    -c 262144 \
    -fa on \
    -ger \
    --merge-qkv \
    -sm graph \
    -ngl 99 \
    -ub 4096 -b 4096 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --parallel 1 \
    --jinja \
    --no-mmap
```
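If you want to rule the client out while triaging parser errors, you can hit the server's OpenAI-compatible endpoint directly with a tool-calling request. This is just a sketch: the `get_weather` tool is made up for illustration, the alias matches the `--alias` above, and the `curl` line is left commented so the snippet only validates the payload locally until you have the server running.

```shell
# Illustrative tool-calling payload, shaped like what an agent client
# (opencode, Continue, etc.) sends when tool calls are enabled.
payload='{
  "model": "Qwen3.5-27B",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'

# Sanity-check that the payload is valid JSON before sending it anywhere.
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload OK"

# Uncomment once llama-server is up; a healthy response should contain a
# "tool_calls" array rather than a template/parser error.
# curl -s http://127.0.0.1:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$payload" | python3 -m json.tool
```

If the raw response here is clean but the agent client still blows up, the problem is likely on the client's parsing side rather than in the server's chat template.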
The files aren't perfect, but they're close enough that it seems to be working well for me, building against the tip of main today with the latest chunked delta-net fixes.
OK great, thank you so much for the lead on this. I've been using mainline, but obviously I prefer high-quality IK quants when I can get 'em! I'll run some tests on this tonight, and thanks again for being the lighthouse of the ik_lcpp community haha.