Why is the vocab size 32032? (Extra 30 tokens)
#3
by theodotus - opened
Is this a typo?
Shouldn't it be 32002?
theodotus changed discussion title from "Why is the vocab size 32032" to "Why is the vocab size 32032? (Extra 30 tokens)"
We use the ChatML format, so we have to add 2 extra special tokens (`<|im_start|>` and `<|im_end|>`) to the tokenizer:
https://huggingface.co/wandb/mistral-7b-zephyr-dpo/blob/main/tokenizer_config.json
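For reference, here is a minimal sketch of how a tokenizer and model could be extended this way, assuming the standard `transformers` API and the base `mistralai/Mistral-7B-v0.1` checkpoint (both assumptions, not confirmed in this thread). The two ChatML tokens bring the vocabulary to 32002; padding the embedding matrix up to a multiple of 32, a common efficiency practice and likewise an assumption here, would account for the remaining 30 slots:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint is an assumption for illustration purposes.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# ChatML wraps each conversation turn in <|im_start|> ... <|im_end|>,
# so both markers are registered as special tokens.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)
print(len(tokenizer))  # 32002: base vocab of 32000 plus the 2 ChatML tokens

# Rounding the embedding matrix up to a multiple of 32 (assumed here,
# not stated in the thread) gives 32002 -> 32032; the extra 30 rows are
# padding only and are never emitted by the tokenizer.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)
print(model.get_input_embeddings().weight.shape[0])  # 32032
```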
tcapelle changed discussion status to closed