fix: set `clean_up_tokenization_spaces` to `false`
tokenizer_config.json has "clean_up_tokenization_spaces": true, which causes tokenizer.decode() to silently corrupt text. This affects every Llama 3.x model on the Hub and every fine-tune or downstream model that inherits their tokenizer config. Both Llama 2 and Llama 4 ship with false.
The fix is a one-line change: "clean_up_tokenization_spaces": true → "clean_up_tokenization_spaces": false.
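Until a repo's config is updated on the Hub, the same fix can be applied to a local snapshot. A minimal sketch (the helper name and path are illustrative, not part of any library):

```python
import json
from pathlib import Path

def patch_config(path: str) -> None:
    """Flip clean_up_tokenization_spaces to false in a tokenizer_config.json,
    leaving every other key untouched."""
    cfg_path = Path(path)
    cfg = json.loads(cfg_path.read_text())
    cfg["clean_up_tokenization_spaces"] = False
    cfg_path.write_text(json.dumps(cfg, indent=2) + "\n")

# Usage (hypothetical local path):
# patch_config("path/to/tokenizer_config.json")
```

Alternatively, passing clean_up_tokenization_spaces=False to tokenizer.decode() overrides the config per call without touching the file.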
What clean_up_tokenization_spaces does
When true, tokenizer.decode() strips spaces before punctuation marks during decoding. Specifically, it applies these string replacements to the decoded text:
text.replace(" .", ".").replace(" ?", "?").replace(" !", "!")
.replace(" ,", ",").replace(" ' ", "'")
.replace(" n't", "n't").replace(" 'm", "'m")
.replace(" 's", "'s").replace(" 've", "'ve").replace(" 're", "'re")
This was designed for BERT-era WordPiece tokenizers (2019) where decoding produced artifacts like "Hello , world .". Llama 3's BPE tokenizer encodes spaces as part of tokens and does not produce these artifacts. The cleanup is actively destructive — it strips legitimate spaces from the decoded text.
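The chain can be reproduced standalone, with no tokenizer involved, to see both the BERT-era artifact it was built for and the corruption it causes on already-correct text. The function below simply mirrors the replacements listed above:

```python
def clean_up(text: str) -> str:
    # Same replacement chain that decode() applies when
    # clean_up_tokenization_spaces is true.
    return (
        text.replace(" .", ".").replace(" ?", "?").replace(" !", "!")
        .replace(" ,", ",").replace(" ' ", "'")
        .replace(" n't", "n't").replace(" 'm", "'m")
        .replace(" 's", "'s").replace(" 've", "'ve").replace(" 're", "'re")
    )

print(clean_up("Hello , world ."))  # WordPiece artifact fixed: "Hello, world."
print(clean_up("x != y"))           # correct text corrupted: "x!= y"
```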
Minimal reproduction
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

text = "x != y and a.b == c"
ids = tokenizer.encode(text, add_special_tokens=False)

# Default decode applies the cleanup and corrupts the text
decoded = tokenizer.decode(ids)
print(repr(decoded))

# Opting out of the cleanup round-trips correctly
decoded_fixed = tokenizer.decode(ids, clean_up_tokenization_spaces=False)
print(repr(decoded_fixed))
Output:
'x!= y and a.b == c' ← space before != silently dropped
'x != y and a.b == c' ← correct
Impact
- The bug is specific to HuggingFace's tokenizer implementation. Other implementations (Meta's own tiktoken-based tokenizer, vLLM, SGLang, OLMo, etc.) do not exhibit this behavior.
- Every model that uses a Llama 3 tokenizer from HuggingFace has been decoding text incorrectly since release: not just the official meta-llama repos, but every fine-tune and derivative that inherited the tokenizer config.
How Llama 3 got clean_up_tokenization_spaces=True
This was never an intentional choice by Meta:
- Llama 2 explicitly set it to False in LlamaTokenizer.__init__
- Llama 3 switched to PreTrainedTokenizerFast via a new Llama3Converter (PR #30334). The converter didn't pass clean_up_tokenization_spaces, so it inherited the HuggingFace transformers library default of True
- The uploaded tokenizer_config.json files on the Hub baked in True
- PR #33778 (Llama 3.2 support, Oct 2024) then hardcoded True in the conversion script for backward compatibility, without discussion of whether the value was correct
- The library default was changed to False in Sep 2024 (PR #31938), but the Llama 3 configs already had True frozen
This has been flagged multiple times:
- Discussion 44 on Meta-Llama-3-70B-Instruct (May 2024)
- transformers issue #35175
- transformers issue #31187
- transformers issue #32575
@ArthurZucker acknowledged in #35175: "It should be set to False!"
Both Llama 2 and Llama 4 ship with false, confirming this is recognized as a bug.
Affected models
All 17 Llama 3.x text model repos on the Hub have "clean_up_tokenization_spaces": true:
- Llama 3.0: Meta-Llama-3-8B, -8B-Instruct, -70B, -70B-Instruct
- Llama 3.1: Llama-3.1-8B, -8B-Instruct, -70B, -70B-Instruct, -405B, -405B-FP8, -405B-Instruct, -405B-Instruct-FP8
- Llama 3.2: Llama-3.2-1B, -1B-Instruct, -3B, -3B-Instruct
- Llama 3.3: Llama-3.3-70B-Instruct
Companion PRs have been opened on each of these repos. Downstream models (fine-tunes and derivatives) that inherited the tokenizer config are not covered by these PRs and will need to be fixed independently.
Companion PRs
The same one-line fix has been opened on all 24 meta-llama repos that have clean_up_tokenization_spaces=true in their tokenizer_config.json. Tested across every version of transformers from 4.40.0 (first Llama 3 support, April 2024) through 5.3.0 (latest, March 2026) — all produce incorrect decoded text.
Llama 3.0:
- Meta-Llama-3-8B
- Meta-Llama-3-8B-Instruct
- Meta-Llama-3-70B
- Meta-Llama-3-70B-Instruct
Llama 3.1:
- Llama-3.1-8B
- Llama-3.1-8B-Instruct — this PR
- Llama-3.1-70B
- Llama-3.1-70B-Instruct
- Llama-3.1-405B
- Llama-3.1-405B-FP8
- Llama-3.1-405B-Instruct
- Llama-3.1-405B-Instruct-FP8
Llama 3.2:
- Llama-3.2-1B
- Llama-3.2-1B-Instruct
- Llama-3.2-3B
- Llama-3.2-3B-Instruct
Llama 3.3:
- Llama-3.3-70B-Instruct
Llama Guard:
Prompt Guard:
The remaining 46 meta-llama repos either have false already (Llama 4, Llama-Guard-4) or don't have their own tokenizer_config.json (CodeLlama, Llama 2, quantized/vision/Original-format variants).
High-download descendant PRs
I surveyed the top Llama 3 derivative models by download count on the Hub and opened fix PRs on the 13 highest-download non-meta-llama models that ship their own tokenizer_config.json with clean_up_tokenization_spaces=true. Together with the 24 official meta-llama PRs above, these cover ~90% of total downloads across all affected models found.
RedHatAI (quantizations):
- Llama-3.2-1B-Instruct-FP8-dynamic — 1.56M downloads
- Llama-3.2-1B-Instruct-FP8 — 836K
- Meta-Llama-3.1-8B-Instruct-FP8 — 531K
- Meta-Llama-3.1-8B-FP8 — 226K
AWQ quantizations:
unsloth (mirrors/quantizations):
- Meta-Llama-3.1-8B-Instruct — 381K
- Llama-3.1-8B-Instruct — 229K
Other:
- fixie-ai/ultravox-v0_5-llama-3_2-1b — 767K
- IlyaGusev/saiga_llama3_8b — 397K
- NousResearch/Hermes-3-Llama-3.1-8B — 382K
- nvidia/Llama-3.1-Nemotron-Nano-8B-v1 — 294K
- llamafactory/tiny-random-Llama-3 — 900K (test model)
Total PRs filed: 37 (24 official meta-llama + 13 high-download descendants). There are ~170 more affected models on the Hub with lower download counts not covered here.
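For anyone who wants to check the remaining lower-download models, a survey like this can be sketched with huggingface_hub. The query parameters below (filter tag, limit) are illustrative assumptions, not the exact query used here:

```python
import json
from huggingface_hub import HfApi, hf_hub_download

def is_affected(config: dict) -> bool:
    # A repo needs its own fix only if its own config carries the bad value.
    return config.get("clean_up_tokenization_spaces") is True

def survey(limit: int = 50) -> list:
    api = HfApi()
    affected = []
    # Illustrative query: highest-download models tagged "llama-3"
    for m in api.list_models(filter="llama-3", sort="downloads",
                             direction=-1, limit=limit):
        try:
            path = hf_hub_download(m.id, "tokenizer_config.json")
        except Exception:
            continue  # repo does not ship its own tokenizer_config.json
        with open(path) as f:
            if is_affected(json.load(f)):
                affected.append(m.id)
    return affected
```

Repos that inherit their tokenizer at load time from a base model (no own tokenizer_config.json) are skipped, since fixing the base repo fixes them too.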