Phase 2B: Model architecture — DomainTransformerForCausalLM (NoPE, GPT-style), PLR embeddings, DCNv2 + JointFusion, 105 passing tests
Implements the full model architecture following Nubank nuFormer patterns:
- configuration.py: DomainTransformerConfig with presets (24M/85M/330M)
- modeling.py: GPT-style causal decoder with NoPE, SDPA attention, pre-norm, weight tying
- plr_embeddings.py: PeriodicLinearReLU numerical embeddings (Gorishniy et al. 2022)
- joint_fusion.py: DCNv2 + PLR + Transformer joint fusion (nuFormer-style)
- test_model.py: 33 tests covering config, model, PLR, DCNv2, joint fusion, integration
- All 105 tests passing (72 tokenizer + 33 model)
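The PLR (Periodic-Linear-ReLU) scheme from Gorishniy et al. (2022) embeds each scalar feature by projecting it onto learned frequencies, taking sine/cosine, then applying a linear layer and ReLU. A minimal sketch of the idea (class name, dimensions, and init scale here are illustrative assumptions, not the repo's actual `PeriodicLinearReLU` API):

```python
import torch
import torch.nn as nn


class PLREmbedding(nn.Module):
    """Periodic-Linear-ReLU embedding for a single numerical feature.

    Sketch of Gorishniy et al. (2022): x -> [cos(2*pi*c*x), sin(2*pi*c*x)]
    with learned frequencies c, followed by Linear + ReLU.
    """

    def __init__(self, n_frequencies: int = 48, d_embedding: int = 64, sigma: float = 0.01):
        super().__init__()
        # Learned frequencies, initialized from N(0, sigma^2) as in the paper
        self.coefficients = nn.Parameter(torch.randn(n_frequencies) * sigma)
        self.linear = nn.Linear(2 * n_frequencies, d_embedding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch,) scalar feature -> (batch, d_embedding)
        angles = 2 * torch.pi * self.coefficients * x.unsqueeze(-1)
        periodic = torch.cat([torch.cos(angles), torch.sin(angles)], dim=-1)
        return torch.relu(self.linear(periodic))
```

The periodic projection lets the model resolve fine-grained numeric structure that a plain linear embedding smooths over, which is why PLR is the default numeric encoder here.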
src/domain_tokenizer/models/__init__.py (ADDED, 20 lines):

```python
"""
Model components for domainTokenizer.

- DomainTransformerConfig: HF-compatible configuration
- DomainTransformerForCausalLM: GPT-style causal decoder (NoPE)
- PeriodicLinearReLU: PLR numerical embeddings (Gorishniy et al. 2022)
- JointFusionModel: nuFormer-style Transformer + DCNv2 fusion
"""

from .configuration import DomainTransformerConfig
from .modeling import (
    DomainTransformerPreTrainedModel,
    DomainTransformerModel,
    DomainTransformerForCausalLM,
    DomainTransformerAttention,
    DomainTransformerMLP,
    DomainTransformerBlock,
)
from .plr_embeddings import PeriodicLinearReLU
from .joint_fusion import DCNv2CrossLayer, DCNv2, JointFusionModel
```
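For context on the `DCNv2CrossLayer` export: a DCNv2 cross layer computes `x_{l+1} = x_0 * (W x_l + b) + x_l`, so each layer raises the degree of the explicit feature crosses by one while the residual term keeps the lower-degree interactions. A hedged sketch of one such layer (class and parameter names are assumptions, not the repo's implementation):

```python
import torch
import torch.nn as nn


class CrossLayerV2(nn.Module):
    """One DCNv2 cross layer: x_{l+1} = x0 * (W @ x_l + b) + x_l."""

    def __init__(self, d: int):
        super().__init__()
        self.linear = nn.Linear(d, d)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        # Elementwise product with the input x0 builds explicit crosses;
        # the additive xl term is a residual connection.
        return x0 * self.linear(xl) + xl
```

In a nuFormer-style joint fusion, a small stack of these cross layers runs alongside the Transformer trunk, and their outputs are combined before the language-model head.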