Hallucination when parsing JSON responses containing numerical data such as 1234.00

#10
by jeisonvendetta - opened

While testing the model, I've noticed that it has trouble parsing JSON responses containing numerical data such as
1234.00, but only when that data appears in a tool_response.

Hey @jeisonvendetta ,

Thank you for reporting this issue. Floating-point numbers with trailing zeros can sometimes behave unpredictably, but it's unusual for this to occur strictly within a tool_response block. This might be related to tokenisation quirks or to how the reasoning parser handles JSON data.
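One thing worth checking on your side: many JSON serializers already normalise trailing zeros before the model ever sees them, so the exact string reaching the context may differ from what your tool produced. A minimal sketch (the "price" field name is just an illustrative assumption) showing the normalisation and a common string-based workaround:

```python
import json

# Python's json module renders the float 1234.00 as 1234.0 (trailing zero dropped)
payload = {"price": 1234.00}
print(json.dumps(payload))  # {"price": 1234.0}

# Workaround sketch: serialize numeric fields as fixed-precision strings,
# so the model sees an unambiguous token sequence like "1234.00"
safe_payload = {"price": f"{1234.00:.2f}"}
print(json.dumps(safe_payload))  # {"price": "1234.00"}
```

Comparing the raw tool_response string against what your framework actually injects can rule this layer out quickly.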

To help us reproduce and diagnose the behavior, could you please provide a bit more detail about your setup? Specifically, it would be helpful to know:

  1. The exact system prompt and tool schema you are using.
  2. The raw tool_response string being injected back into the context.
  3. The inference framework you’re using (Hugging Face Transformers, LiteRT, vLLM).
  4. Your generation parameters, such as temperature, top-p, and whether skip_special_tokens is enabled.

With these details, we can run a reproduction on our side and identify whether this is a model-side issue or an artifact of JSON parsing.

I wasn't able to extract the conversation from the conversation flow, but I've already found the cause:
Most output errors occur on the GPU when Think is enabled. The model generates tokens like
1.0,293 or 5,90000, and wrong dates like 202026 20333.

But this behavior doesn't occur on the CPU. I suspect it's due to quantization.
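Until the GPU path is fixed, one cheap guardrail is to cross-check every number the model emits against the numbers present in the tool_response, since corrupted tokens like 1.0,293 fragment into values that never appeared in the source. A rough sketch (the field names and the corrupted output string are hypothetical, modelled on the examples above):

```python
import json
import re

def numbers_in(text: str) -> set:
    # Extract numeric tokens: optional sign, digits, optional decimal part
    return set(re.findall(r"-?\d+(?:\.\d+)?", text))

# Hypothetical tool_response and a corrupted model answer
source = json.dumps({"close": 1234.00, "volume": 5.9})
output = "The close was 1.0,293 and the volume 5,90000"

src_nums = numbers_in(source)
out_nums = numbers_in(output)

# Numbers in the answer that never appeared in the tool_response are suspect
suspect = out_nums - src_nums
print(sorted(suspect))
```

This won't catch every failure (derived values like sums are legitimately new numbers), but it flags the fragmented-digit pattern reliably.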

We are currently using LiteRT-LM with the Conversation API.

We have pushed the model to its maximum capacity on the CPU using chained skill instructions and by instructing the model to think harder, and it's actually performing very well, genuinely impressive.

In this video, we ask the model to load a skill that contains instructions for using tools from other skills.

First video.

  • get historical data (OHLCV data)
  • web search (Tavily)
  • file manager (a custom command-line tool that allows you to use grep, regex, and edit lines in files, as well as read, write, and modify them)

Second video.
We ask it to do exactly the same thing, but in a different language and with a different set of symbols, this time with temperature set to 0 and topK set to 1. At these values, it performs better and follows instructions without thinking, which takes less time.
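The improvement at those settings makes sense: temperature 0 and topK 1 both collapse token selection to a deterministic argmax over the logits (greedy decoding), which removes sampling noise from numeric tokens. A toy illustration of that collapse (not LiteRT-LM's actual implementation):

```python
def pick_token(logits, temperature, top_k):
    # temperature -> 0 or top_k = 1 reduces sampling to greedy decoding:
    # always return the index of the single most likely token
    if temperature == 0 or top_k == 1:
        return max(range(len(logits)), key=logits.__getitem__)
    # Full stochastic sampling path omitted in this sketch
    raise NotImplementedError("sampling path not shown")

print(pick_token([0.1, 2.5, -1.0], temperature=0, top_k=1))  # index of the max logit
```

The trade-off is determinism over diversity, which is usually the right call when the output must reproduce exact numbers from tool data.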

If we could achieve the same level of precision on the GPU as on the CPU, we could do much more with more complex instructions.

jeisonvendetta changed discussion status to closed
jeisonvendetta changed discussion status to open
