Tool-call argument JSON malformations on long-content payloads (KIMI 2.6 on VLLM 0.19.1)

#31
by prakharprakash - opened

Reporting a reproducible failure mode in K2.6 tool-call generation for the Moonshot team.

###ENV
args: [ 'serve', 'moonshotai/Kimi-K2.6', '--host', '0.0.0.0', '--port', '8000', '--tensor-parallel-size', '8', '--max-model-len', '262144', '--kv-cache-dtype', 'auto', '--gpu-memory-utilization', '0.9', '--max-num-batched-tokens', '16384', '--chat-template', './chat_template.jinja', '--enable-prefix-caching', '--override-generation-config', '{"temperature": 1.0, "top_p": 0.95, "max_new_tokens": 131000, "repetition_penalty": 1.05}', '--served-model-name', '', '', '--decode-context-parallel-size', '8', '--mm-encoder-tp-mode', 'data', '--mm-processor-cache-type', 'shm', '--trust-remote-code', '--tokenizer-mode', 'hf', '--default-chat-template-kwargs', '{"thinking": true}', '--enable-chunked-prefill', '--max-num-seqs', '64', '--max-cudagraph-capture-size', '128', '--compilation-config', '{"cudagraph_mode": "FULL_AND_PIECEWISE"}', '--enable-auto-tool-choice', '--tool-call-parser', 'kimi_k2', '--reasoning-parser', 'kimi_k2', '--kv-cache-metrics', '--kv-cache-metrics-sample', '0.05' ]

What we observe

The model produces structurally-malformed JSON in <|tool_call_argument_begin|>...<|tool_call_end|>. Two patterns:

Pattern A β€” long string + missing outer close: create_google_doc-style tool calls with ~20-25KB of markdown content in a single string field consistently miss the final } of the outer object. Model emits the closing " of the string and the closing } of the inner object but then stops, treating the call as complete.

Concrete sample (truncated): {"title": "...", "content": "## Section 1\n\n- bullet\n- bullet\npython\ndef x():\n pass\n\n\n...7KB later...\n## Conclusion\n\nReady for review.\n" ^ EOF here, missing }

Pattern B β€” doubly-stringified JSON: Tools whose schemas accept a stringified-JSON input field (anti-pattern, but common in agent frameworks). Model produces {"input": "{\"k\": \"v\", \"k2\": ...}"} and at ~500-700 chars in, drops a delimiter inside the inner JSON. Outer parse succeeds up to that point, then fails.

We've also seen high rates of the model omitting required schema fields (e.g. dropping a required graph field on tools where path alone "feels sufficient" β€” but that's a separate report.

Happy to share more telemetry if useful.

Sign up or log in to comment