Add logprobs workaround for harmony channel tokens
#3
by kndtran - opened
No description provided.
kndtran changed pull request status to open
kndtran changed pull request title from Increase max tokens for answerability to Add logprobs workaround for harmony channel tokens
Summary
- Enable `logprobs_workaround: true` in all 4 io.yaml files (citations, hallucination_detection, query_rewrite, answerability)
- Increase `max_completion_tokens` for answerability to account for harmony channel token overhead
Details
gpt-oss models use the Harmony response format, which wraps output in channel tokens (`<|channel|>final<|message|>...<|end|>`). When the inference server fails to strip these tokens from `message.content`, downstream JSON parsing breaks.
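A minimal illustration of the failure mode, assuming a hypothetical response payload whose Harmony framing was not stripped:

```python
import json

# Hypothetical example: the inference server returned the Harmony-wrapped
# output verbatim instead of just the message body.
raw_content = '<|channel|>final<|message|>{"answerable": true}<|end|>'

try:
    json.loads(raw_content)
except json.JSONDecodeError:
    # The leading <|channel|> token makes the string invalid JSON,
    # so any downstream structured parsing fails.
    print("downstream JSON parsing breaks")
```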
The `logprobs_workaround` flag (added in granite-common#127) derives the model's output content from the logprob token sequence instead of trusting `message.content`, since the logprobs are the authoritative record of the tokens the model actually produced.
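The idea can be sketched as follows. This is not the granite-common implementation; the `content_from_logprobs` helper, the token dictionaries, and the regex over channel markers are all illustrative assumptions about how a logprobs-based reconstruction might look:

```python
import json
import re

def content_from_logprobs(logprob_tokens):
    """Rebuild output text from the logprob token sequence, then keep
    only the body of the Harmony 'final' channel if one is present."""
    text = "".join(t["token"] for t in logprob_tokens)
    m = re.search(r"<\|channel\|>final<\|message\|>(.*?)<\|end\|>", text, re.DOTALL)
    return m.group(1) if m else text

# Hypothetical logprob entries for a Harmony-wrapped JSON answer.
tokens = [
    {"token": "<|channel|>"}, {"token": "final"}, {"token": "<|message|>"},
    {"token": '{"answerable"'}, {"token": ": true}"}, {"token": "<|end|>"},
]

print(json.loads(content_from_logprobs(tokens)))  # {'answerable': True}
```

Because the reconstruction starts from the tokens themselves, it does not depend on the inference server having stripped the channel framing from `message.content`.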
frreiss changed pull request status to merged