Add logprobs workaround for harmony channel tokens

#3
No description provided.
kndtran changed pull request status to open
kndtran changed pull request title from Increase max tokens for answerability to Add logprobs workaround for harmony channel tokens

Summary

  • Enable logprobs_workaround: true in all 4 io.yaml files (citations, hallucination_detection, query_rewrite, answerability)
  • Increase max_completion_tokens for answerability to account for harmony channel token overhead

Details

gpt-oss models use the Harmony response format, which wraps output in channel tokens (<|channel|>final<|message|>...<|end|>). When the inference server fails to strip these tokens from message.content, downstream JSON parsing breaks.
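To illustrate the failure mode, here is a minimal sketch (not granite-common's actual code; the helper name and regex are assumptions) of what stripping the leaked Harmony wrapper from message.content would look like:

```python
import re

# Match the payload of the final channel when the server leaks the
# Harmony wrapper tokens into message.content.
FINAL_RE = re.compile(r"<\|channel\|>final<\|message\|>(.*?)(?:<\|end\|>|$)", re.DOTALL)

def strip_harmony_tokens(content: str) -> str:
    """Return the bare final-channel payload, or the input unchanged
    if no Harmony wrapper tokens are present."""
    match = FINAL_RE.search(content)
    return match.group(1) if match else content

# Hypothetical raw output leaked by the inference server:
raw = '<|channel|>final<|message|>{"answerable": true}<|end|>'
print(strip_harmony_tokens(raw))  # prints: {"answerable": true}
```

String-level stripping like this is fragile, which is why the PR prefers the logprobs-based approach described below.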

The logprobs_workaround flag (added in granite-common#127) derives model output content from the logprob token sequence instead of trusting message.content, since logprobs are the authoritative sequence the model actually produced.
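The core idea can be sketched as follows (a simplified illustration, assuming an OpenAI-style choice payload; this is not granite-common's actual implementation):

```python
def content_from_logprobs(choice: dict) -> str:
    """Rebuild the model's output by concatenating the logprobs token
    sequence, which reflects what the model actually produced, instead
    of trusting a possibly mangled message.content."""
    tokens = [entry["token"] for entry in choice["logprobs"]["content"]]
    return "".join(tokens)

# Illustrative choice object with hypothetical values:
choice = {
    "message": {"content": '<|channel|>final<|message|>{"ok": true}'},
    "logprobs": {"content": [
        {"token": '{"ok"', "logprob": -0.1},
        {"token": ": true}", "logprob": -0.2},
    ]},
}
print(content_from_logprobs(choice))  # prints: {"ok": true}
```

Because the logprob entries are reported per generated token, concatenating them recovers the authoritative output sequence even when the server's post-processing of message.content is broken.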

frreiss changed pull request status to merged
