Commit History

CRITICAL FIX: Switch from ByteLevel to Whitespace pre-tokenizer β€” fixes 42% UNK rate on domain token sequences
a9c4a62
verified

rtferraz commited on

Add domain_tokenizer.py β€” DomainTokenizerBuilder (core assembler, HF integration)
818a2e9
verified

rtferraz commited on

Add field_tokenizers.py β€” Sign, MagnitudeBucket, Calendar, Categorical, DiscreteNumerical tokenizers
511f3aa
verified

rtferraz commited on

Add tokenizers package init
0b06df3
verified

rtferraz commited on