CRITICAL FIX: Switch from ByteLevel to Whitespace pre-tokenizer β fixes 42% UNK rate on domain token sequences a9c4a62 verified rtferraz commited on 3 days ago
Add domain_tokenizer.py β DomainTokenizerBuilder (core assembler, HF integration) 818a2e9 verified rtferraz commited on 9 days ago
Add field_tokenizers.py β Sign, MagnitudeBucket, Calendar, Categorical, DiscreteNumerical tokenizers 511f3aa verified rtferraz commited on 9 days ago