---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---
# DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, trained on general text corpora during the foundational pre-training phase.

## Summary
DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel, avoiding the sequential bottleneck of standard autoregressive generation.

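The parallel decoding idea can be sketched as follows. This is a minimal illustration, not the model's actual sampler: `toy_denoiser` is a random stand-in for the Transformer, the toy sizes are arbitrary, and the confidence-based unmasking schedule is one common choice among several.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN, MASK = 32, 8, -1  # toy sizes; the real model uses a 1024-token window


def toy_denoiser(tokens):
    """Stand-in for the Transformer encoder: returns logits for every
    position in a single parallel forward pass."""
    return rng.normal(size=(len(tokens), VOCAB))


def generate(steps=4):
    tokens = np.full(SEQ_LEN, MASK)            # start fully masked
    for _ in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break                              # everything decoded
        logits = toy_denoiser(tokens)[masked]  # all masked slots predicted at once
        conf = logits.max(axis=-1)             # crude per-position confidence
        preds = logits.argmax(axis=-1)
        k = max(1, masked.size // 2)           # commit the most confident half per step
        chosen = np.argsort(-conf)[:k]
        tokens[masked[chosen]] = preds[chosen]
    return tokens
```

Each iteration fills in several positions at once, so a sequence is produced in a handful of denoising steps rather than one forward pass per token.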
## Technical Details
- **Architecture:** 24-Layer Transformer Encoder
- **Embedding Dim:** 2048
- **Heads:** 16
- **Parameters:** ~1.5 Billion
- **Hardware:** 1x NVIDIA A100 (80GB VRAM)
- **Objective:** Markovian Discrete Denoising (Continuous Embedding Space)
- **Precision:** Mixed BF16
- **Context Window:** 1024 Tokens

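As a sanity check on the stated size, here is a back-of-envelope parameter count for this configuration. The FFN width (4×), vocabulary size (~50k), tied embeddings, and ignored biases/norms are assumptions not stated on this card.

```python
# Rough parameter count for a 24-layer, d=2048 Transformer encoder.
# Assumptions: FFN hidden size 4*d, ~50k-token vocabulary, tied
# input/output embeddings, biases and layer norms ignored.
d, layers, vocab = 2048, 24, 50_000

attn = 4 * d * d           # Q, K, V, and output projections
ffn = 2 * d * (4 * d)      # up- and down-projection
per_layer = attn + ffn     # ~50M per layer
total = layers * per_layer + vocab * d

print(f"{total / 1e9:.2f}B parameters")  # prints "1.31B parameters"
```

The estimate lands near 1.3B; the remaining gap to the quoted ~1.5B would be closed by a larger vocabulary, wider FFN, or untied embeddings.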
## Current Status
Phase 2 (logic training) complete. Domain-specific training (code) to be applied post-convergence.