rtferraz committed · verified
Commit 2f5969e · 1 Parent(s): 8efa945

Phase 2B: Model architecture — DomainTransformerForCausalLM (NoPE, GPT-style), PLR embeddings, DCNv2 + JointFusion, 105 passing tests


Implements the full model architecture following Nubank nuFormer patterns:
- configuration.py: DomainTransformerConfig with presets (24M/85M/330M)
- modeling.py: GPT-style causal decoder with NoPE, SDPA attention, pre-norm, weight tying
- plr_embeddings.py: PeriodicLinearReLU numerical embeddings (Gorishniy et al. 2022)
- joint_fusion.py: DCNv2 + PLR + Transformer joint fusion (nuFormer-style)
- test_model.py: 33 tests covering config, model, PLR, DCNv2, joint fusion, integration
- All 105 tests passing (72 tokenizer + 33 model)
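As a reference for the PLR component named above, here is a minimal sketch of a PeriodicLinearReLU numerical embedding in the spirit of Gorishniy et al. 2022: each scalar feature is expanded into learned-frequency sin/cos features, then passed through a linear layer and ReLU. The constructor arguments and initialization scale here are assumptions for illustration; the actual `plr_embeddings.py` signature may differ.

```python
import math

import torch
import torch.nn as nn


class PeriodicLinearReLU(nn.Module):
    """PLR numerical embedding (after Gorishniy et al. 2022):
    scalar -> [sin, cos] periodic features -> Linear -> ReLU.
    Hyperparameter names/defaults here are illustrative assumptions."""

    def __init__(self, num_features: int, n_frequencies: int = 8,
                 d_embedding: int = 16, sigma: float = 0.1):
        super().__init__()
        # One set of learned frequencies per feature, init ~ N(0, sigma^2)
        self.frequencies = nn.Parameter(
            torch.normal(0.0, sigma, (num_features, n_frequencies))
        )
        self.linear = nn.Linear(2 * n_frequencies, d_embedding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features) -> (batch, num_features, d_embedding)
        v = 2 * math.pi * self.frequencies[None] * x[..., None]
        periodic = torch.cat([torch.sin(v), torch.cos(v)], dim=-1)
        return torch.relu(self.linear(periodic))
```

Each numeric column gets its own embedding vector, so the output can be concatenated with token embeddings before entering the Transformer.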

src/domain_tokenizer/models/__init__.py ADDED
@@ -0,0 +1,20 @@
+ """
+ Model components for domainTokenizer.
+
+ - DomainTransformerConfig: HF-compatible configuration
+ - DomainTransformerForCausalLM: GPT-style causal decoder (NoPE)
+ - PeriodicLinearReLU: PLR numerical embeddings (Gorishniy et al. 2022)
+ - JointFusionModel: nuFormer-style Transformer + DCNv2 fusion
+ """
+
+ from .configuration import DomainTransformerConfig
+ from .modeling import (
+     DomainTransformerPreTrainedModel,
+     DomainTransformerModel,
+     DomainTransformerForCausalLM,
+     DomainTransformerAttention,
+     DomainTransformerMLP,
+     DomainTransformerBlock,
+ )
+ from .plr_embeddings import PeriodicLinearReLU
+ from .joint_fusion import DCNv2CrossLayer, DCNv2, JointFusionModel
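
For context on the `DCNv2CrossLayer` export, a minimal sketch of the standard DCNv2 cross layer (Wang et al., "DCN V2", 2021) is below: x_{l+1} = x_0 ⊙ (W x_l + b) + x_l, i.e. an explicit feature interaction with a residual connection. This is the textbook formulation; the repo's `joint_fusion.py` implementation may add low-rank factorization or other variants.

```python
import torch
import torch.nn as nn


class DCNv2CrossLayer(nn.Module):
    """One DCNv2 cross layer: x_{l+1} = x_0 * (W x_l + b) + x_l,
    where * is the elementwise product. A sketch of the standard
    formulation, not necessarily this repo's exact implementation."""

    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # x0: the original input; xl: output of the previous cross layer
        return x0 * self.linear(xl) + xl
```

Stacking L such layers models bounded-degree feature crosses explicitly, which is why nuFormer-style designs fuse a DCNv2 branch with the Transformer branch.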