Add DomainTransformerForCausalLM: GPT-style NoPE model with SDPA attention, weight tying, HF Trainer compatible
Commit 0dec8e4 (verified), rtferraz committed 9 days ago