Chat template differs from OpenAI's. Is it expected?

by gentry1337 - opened Aug 28, 2025

Aug 28, 2025

Repro:

from transformers import AutoTokenizer

chat = [
    {"role": "system", "content": "Let's sing!"},
    {"role": "user", "content": "Because maybe"},
    {"role": "assistant", "content": "You're gonna be the one that saves me"},
    {"role": "user", "content": "And after all"},
    {"role": "assistant", "content": "You're my wonderwall"},
]

openai_tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
unsloth_tokenizer = AutoTokenizer.from_pretrained("unsloth/gpt-oss-20b-BF16")

openai_result = openai_tokenizer.apply_chat_template(chat, tokenize=False)
unsloth_result = unsloth_tokenizer.apply_chat_template(chat, tokenize=False)
if openai_result != unsloth_result:
    print("Results differ:")
    print("OpenAI result: ", openai_result)
    print("Unsloth result: ", unsloth_result)

Output:

Results differ:
OpenAI result:  <|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-28

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

Let's sing!

<|end|><|start|>user<|message|>Because maybe<|end|><|start|>assistant<|channel|>final<|message|>You're gonna be the one that saves me<|end|><|start|>user<|message|>And after all<|end|><|start|>assistant<|channel|>final<|message|>You're my wonderwall<|return|>
Unsloth result:  <|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-28

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

Let's sing!<|end|><|start|>user<|message|>Because maybe<|end|><|start|>assistant<|message|>You're gonna be the one that saves me<|end|><|start|>user<|message|>And after all<|end|><|start|>assistant<|message|>You're my wonderwall<|return|>

gentry1337

Aug 28, 2025

I've also run this test for unsloth 20b vs unsloth 120b and openai 20b vs openai 120b - unsloth tokenization even differs between 20B and 120B tokenizers while OpenAI's matches between different model sizes.

gentry1337

Aug 28, 2025

•

edited Aug 28, 2025

Looks like I've found some explanations of difference of chat templates:
https://docs.unsloth.ai/basics/gpt-oss-how-to-run-and-fine-tune#unsloth-fixes-for-gpt-oss
While this indeed explains the difference between OpenAI's chat template and Unsloth's one, it doesn't explain the difference between Unsloth 20B and Unsloth 120B GPT-OSS:

Results differ:
Unsloth 20B result:  <|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-28

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

Let's sing!<|end|><|start|>user<|message|>Because maybe<|end|><|start|>assistant<|message|>You're gonna be the one that saves me<|end|><|start|>user<|message|>And after all<|end|><|start|>assistant<|message|>You're my wonderwall<|return|>
Unsloth 120B result:  <|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-28

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>developer<|message|># Instructions

Let's sing!<|end|><|start|>user<|message|>Because maybe<|end|><|start|>assistant<|channel|>final<|message|>You're gonna be the one that saves me<|end|><|start|>user<|message|>And after all<|end|><|start|>assistant<|channel|>final<|message|>You're my wonderwall<|return|>

It seems like GPT-OSS 120B misses Calls to these tools must go to the commentary channel: instruction

GauravEA

Jan 29

I am currently fine-tuning the GPT-OSS 20B model using Unsloth with HuggingFace TRL (SFTTrainer).

Long-term goal

Serve the model in production using Triton with either vLLM or TensorRT-LLM as the backend

Short-term / initial deployment using Ollama (GGUF)

Current challenge
GPT-OSS uses a Harmony-style chat template, which includes:

developer role

Explicit EOS handling

thinking / analysis channels

Tool / function calling structure

When converting the fine-tuned model to GGUF and deploying it in Ollama using the default GPT-OSS Modelfile, I am running into ambiguity around:

Whether the default Jinja chat template provided by GPT-OSS should be modified for Ollama compatibility

How to correctly handle:

EOS token behavior

Internal reasoning / analysis channels

Developer role alignment

How to do this without degrading the model’s default performance or alignment

Constraints / Intent

I already have training data prepared strictly in system / user / assistant format

I want to:

Preserve GPT-OSS’s native behavior as much as possible

Perform accurate, non-destructive fine-tuning

Avoid hacks that work short-term but break compatibility with vLLM / TensorRT-LLM later

What I’m looking for

Has anyone successfully:

Fine-tuned GPT-OSS

Converted it to GGUF

Deployed it with Ollama

While preserving the Harmony template behavior?

If yes:

Did you modify the chat template / Modelfile?

How did you handle EOS + reasoning channels?

Any pitfalls to avoid to keep it production-ready for Triton later?

Any concrete guidance, references, or proven setups would be extremely helpful.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment