Echo-DSRN
Echo-DSRN(N) (Dual-State Recurrent Neural Network; short name Echo-DSRN, also known as echo) is a novel architecture designed as a viable alternative for low-resource tasks that are currently handled inefficiently by excessively large Large Language Models (LLMs).
This is a research prototype and demo model.
| Property | Value |
|---|---|
| Model Type | echo_dsrn |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |

| Component | Parameters | % of Total |
|---|---|---|
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |

| Sub-Component | Parameters | Description |
|---|---|---|
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
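
The headline figures in the parameter tables can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check and rests on two assumptions that the tables do not state explicitly: the embedding matrix and the LM head are each a bias-free `vocab_size x hidden_dim` matrix counted separately (untied), and all 8 DSRN blocks have the same size. The variable and function names are illustrative.

```python
# Values copied from the configuration and parameter tables above.
VOCAB_SIZE = 32011
HIDDEN_DIM = 512
NUM_LAYERS = 8
TOTAL_PARAMS = 114_687_488

# Assumption: embeddings and LM head are separate, bias-free
# vocab_size x hidden_dim matrices (i.e. weights are not tied).
embedding_params = VOCAB_SIZE * HIDDEN_DIM
lm_head_params = VOCAB_SIZE * HIDDEN_DIM

# Everything that is not embeddings or LM head belongs to the DSRN blocks.
block_params_total = TOTAL_PARAMS - embedding_params - lm_head_params

# Assumption: the blocks are identical; a non-zero remainder would
# indicate that this assumption does not hold.
params_per_block, remainder = divmod(block_params_total, NUM_LAYERS)


def share(part, whole=TOTAL_PARAMS):
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(f"Embeddings:  {embedding_params:,} ({share(embedding_params)}%)")
print(f"LM head:     {lm_head_params:,} ({share(lm_head_params)}%)")
print(f"DSRN blocks: {block_params_total:,} ({share(block_params_total)}%)")
print(f"Per block:   {params_per_block:,} (remainder {remainder})")
```

Under these assumptions the computed values agree with the rounded figures in the component table (16.39M, 81.91M, 14.29%, 71.42%), and the block total divides evenly across the 8 layers.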
Trained for 1 epoch on a single AMD Instinct MI300X (192 GB).

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| piqa | 1 | none | 0 | acc | ↑ | 0.5789 | ± | 0.0115 |
| | | none | 0 | acc_norm | ↑ | 0.5718 | ± | 0.0115 |
| sciq | 1 | none | 0 | acc | ↑ | 0.5830 | ± | 0.0156 |
| | | none | 0 | acc_norm | ↑ | 0.5250 | ± | 0.0158 |

| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| piqa | 1 | none | 5 | acc | ↑ | 0.5773 | ± | 0.0115 |
| | | none | 5 | acc_norm | ↑ | 0.5729 | ± | 0.0115 |
| sciq | 1 | none | 5 | acc | ↑ | 0.5700 | ± | 0.0157 |
| | | none | 5 | acc_norm | ↑ | 0.5140 | ± | 0.0158 |
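
One way to read the reported standard errors is to turn each score into an approximate 95% interval and compare it with the chance level of the task. The sketch below does that with the values from the tables above. The chance levels are an assumption based on the public task formats (two candidate solutions per PIQA item, four answer options per SciQ item), and the overlap test is a rough, conservative heuristic rather than a formal significance test, since both runs score the same evaluation items.

```python
# (value, stderr) per n-shot setting, copied from the evaluation tables.
RESULTS = {
    ("piqa", "acc"): {0: (0.5789, 0.0115), 5: (0.5773, 0.0115)},
    ("piqa", "acc_norm"): {0: (0.5718, 0.0115), 5: (0.5729, 0.0115)},
    ("sciq", "acc"): {0: (0.5830, 0.0156), 5: (0.5700, 0.0157)},
    ("sciq", "acc_norm"): {0: (0.5250, 0.0158), 5: (0.5140, 0.0158)},
}

# Assumption: random-guess accuracy implied by the number of answer options.
CHANCE = {"piqa": 0.5, "sciq": 0.25}

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def interval(value, stderr, z=Z_95):
    """Return the (lower, upper) bounds of value +/- z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


def beats_chance(task, value, stderr):
    """True if the lower interval bound lies above the task's chance level."""
    lower, _ = interval(value, stderr)
    return lower > CHANCE[task]


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


for (task, metric), shots in RESULTS.items():
    zero_shot = interval(*shots[0])
    five_shot = interval(*shots[5])
    print(
        f"{task:4s} {metric:8s} "
        f"0-shot [{zero_shot[0]:.4f}, {zero_shot[1]:.4f}] "
        f"5-shot [{five_shot[0]:.4f}, {five_shot[1]:.4f}] "
        f"above chance: {beats_chance(task, *shots[0])} "
        f"overlap: {intervals_overlap(zero_shot, five_shot)}"
    )
```

With these inputs every interval sits above the assumed chance level, and the 0-shot and 5-shot intervals overlap for each metric, so the tables alone do not indicate a clear few-shot effect.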