Commit History

CRITICAL FIX: Switch from ByteLevel to Whitespace pre-tokenizer — fixes 42% UNK rate on domain token sequences
a9c4a62
verified

rtferraz commited on

Add domain_tokenizer.py — DomainTokenizerBuilder (core assembler, HF integration)
818a2e9
verified

rtferraz commited on