# Multistral Tokenizer

Training completed successfully. The settings used for this run are listed below.

## Configuration
- Vocabulary size: 127,989
- Special tokens: 13
- Min frequency: 32
- Training samples: up to 1,000,000
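
To illustrate what a minimum-frequency setting typically does, here is a minimal sketch. It assumes a BPE-style trainer in which a symbol pair must occur at least `Min frequency` times to be considered for merging; this README does not state which training algorithm was used, so treat the sketch as an illustration of the threshold, not as the actual training code. The function name `frequent_pairs` and the toy corpus are hypothetical.

```python
from collections import Counter

MIN_FREQUENCY = 32  # value taken from the Configuration list


def frequent_pairs(words, min_frequency=MIN_FREQUENCY):
    """Return adjacent character pairs seen at least `min_frequency` times."""
    counts = Counter()
    for word in words:
        # Count every adjacent pair of characters in the word.
        counts.update(zip(word, word[1:]))
    # Keep only pairs that reach the threshold (assumed BPE-style rule).
    return {pair: n for pair, n in counts.items() if n >= min_frequency}


# Toy corpus: the pair ("t", "h") occurs 40 times, ("o", "x") only 3 times.
corpus = ["th"] * 40 + ["ox"] * 3
print(frequent_pairs(corpus))
```

With this toy corpus only the pair that reaches the threshold survives; the rarer pair is dropped.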

## Dataset
- Source: `dataset/`

## Special Tokens
`<|begin|>`, `<|return|>`, `<|pad|>`, `<|start|>`, `<|channel|>`, `<|end|>`, `<|message|>`, `<|image|>`, `<|video|>`, `<|audio|>`, `<|call|>`, `<|constrain|>`, `<|unknown|>`
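
Special tokens are usually kept atomic, so they are separated from ordinary text before any subword splitting happens. The sketch below shows one common way to do that with a regular expression. It is an assumption about how such tokens can be handled, not a description of how `MultistralTokenizer` handles them internally; the helper `split_on_special_tokens` and the sample string are hypothetical.

```python
import re

SPECIAL_TOKENS = [
    "<|begin|>", "<|return|>", "<|pad|>", "<|start|>", "<|channel|>",
    "<|end|>", "<|message|>", "<|image|>", "<|video|>", "<|audio|>",
    "<|call|>", "<|constrain|>", "<|unknown|>",
]

# Escape each token so "|" is matched literally. The capturing group
# makes re.split keep the matched tokens in the result.
_SPECIAL_RE = re.compile(
    "(" + "|".join(re.escape(token) for token in SPECIAL_TOKENS) + ")"
)


def split_on_special_tokens(text):
    """Split text into special tokens and the plain-text spans between them."""
    return [piece for piece in _SPECIAL_RE.split(text) if piece]


# Made-up sample string, used only to exercise the splitter.
print(split_on_special_tokens("<|start|>user<|message|>Hello<|end|>"))
```

Each special token comes back as its own list element, and the text between tokens is left untouched for the subword model to process.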

## Enforced Vocabulary
analysis, assistant, commentary, developer, final, json, system, tool, toon, user, yaml

## Usage

```python
from multistral.multistraltokenizer import MultistralTokenizer

# Load the trained tokenizer.
tokenizer = MultistralTokenizer.from_pretrained("models/multistral-tokenizer")

# Encode text to tokens, then decode the tokens back to text.
tokens = tokenizer.encode("Your text here")
text = tokenizer.decode(tokens)
```
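
A quick sanity check after loading is to confirm that encoding and then decoding returns the original text. The helper below is a minimal sketch that relies only on the `encode` and `decode` methods shown above. `roundtrip_ok` and the `ByteTokenizer` stand-in are hypothetical and exist so the example runs without the trained model. Whether the real tokenizer round-trips exactly is not stated here; it may normalize text or add special tokens, in which case the check would need adjusting.

```python
def roundtrip_ok(tokenizer, text):
    """Return True if decoding the encoded text reproduces it exactly."""
    return tokenizer.decode(tokenizer.encode(text)) == text


class ByteTokenizer:
    """Hypothetical stand-in so the sketch runs without the trained model."""

    def encode(self, text):
        # One token per UTF-8 byte.
        return list(text.encode("utf-8"))

    def decode(self, tokens):
        return bytes(tokens).decode("utf-8")


print(roundtrip_ok(ByteTokenizer(), "Your text here"))
```

To run the check against the trained tokenizer, pass the `tokenizer` object from the usage example in place of `ByteTokenizer()`.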