# Multistral Tokenizer

Training completed successfully!

## Configuration

- Vocabulary size: 127,989
- Special tokens: 13
- Min frequency: 32
- Training samples: up to 1,000,000

## Dataset

- Source: `dataset/`

## Special Tokens

`<|begin|>`, `<|return|>`, `<|pad|>`, `<|start|>`, `<|channel|>`, `<|end|>`, `<|message|>`, `<|image|>`, `<|video|>`, `<|audio|>`, `<|call|>`, `<|constrain|>`, `<|unknown|>`

## Enforced Vocabulary

analysis, assistant, commentary, developer, final, json, system, tool, toon, user, yaml

## Usage

```python
from multistral.multistraltokenizer import MultistralTokenizer

tokenizer = MultistralTokenizer.from_pretrained("models/multistral-tokenizer")
tokens = tokenizer.encode("Your text here")
text = tokenizer.decode(tokens)
```
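
A quick sanity check after loading can be sketched as follows. This is a minimal sketch, not part of the `multistral` package: `round_trip_failures`, `TokenizerLike`, and `ByteTokenizer` are hypothetical names introduced here for illustration. It assumes only that `encode` returns a list of integer ids and that `decode` accepts that list, as in the usage snippet; whether the real tokenizer round-trips text exactly (for example, whether it normalizes whitespace or adds special tokens on encode) is an assumption to verify.

```python
from typing import List, Protocol

# Token lists copied from this README.
SPECIAL_TOKENS = [
    "<|begin|>", "<|return|>", "<|pad|>", "<|start|>", "<|channel|>",
    "<|end|>", "<|message|>", "<|image|>", "<|video|>", "<|audio|>",
    "<|call|>", "<|constrain|>", "<|unknown|>",
]
ENFORCED_VOCABULARY = [
    "analysis", "assistant", "commentary", "developer", "final", "json",
    "system", "tool", "toon", "user", "yaml",
]


class TokenizerLike(Protocol):
    """Hypothetical interface: any object exposing encode() and decode()."""

    def encode(self, text: str) -> List[int]: ...
    def decode(self, tokens: List[int]) -> str: ...


def round_trip_failures(tokenizer: TokenizerLike, samples: List[str]) -> List[str]:
    """Return the samples whose decode(encode(sample)) differs from the sample."""
    return [s for s in samples if tokenizer.decode(tokenizer.encode(s)) != s]


class ByteTokenizer:
    """Stand-in tokenizer (UTF-8 bytes) so the sketch runs without the trained model."""

    def encode(self, text: str) -> List[int]:
        return list(text.encode("utf-8"))

    def decode(self, tokens: List[int]) -> str:
        return bytes(tokens).decode("utf-8")


if __name__ == "__main__":
    # Swap ByteTokenizer() for the loaded MultistralTokenizer to check the real model.
    failures = round_trip_failures(ByteTokenizer(), SPECIAL_TOKENS + ENFORCED_VOCABULARY)
    print("round-trip failures:", failures)
```

To run the check against the trained model, pass the `tokenizer` object from the usage snippet in place of `ByteTokenizer()`. A non-empty result lists the strings that did not survive the round trip.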