Model Card for ethicalabs/Echo-DSRN-114M-v0.1.2-Base

Echo-DSRN(N) (Dual State Recurrent Neural Network; short name: Echo-DSRN, also known as echo) is a novel architecture designed as a viable alternative for low-resource tasks that are currently handled inefficiently by oversized Large Language Models (LLMs) 🌱

⚠️ Important Notice

This is a research prototype and demo model.

  • Not production-ready
  • Will hallucinate and give incorrect answers
  • Do not use for any real-world decisions
  • Intended for architecture experimentation only

What Works

  • Text generation is fluent
  • Memory usage is constant O(1)
  • Runs on CPUs, NPUs, and GPUs (tested on AMD ROCm and Apple MPS)
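
To make the constant-memory claim concrete, the sketch below compares the size of a fixed recurrent state with a Transformer-style key/value cache that grows with every token. It is back-of-the-envelope arithmetic, not a measurement of this model: the layer count (8), hidden size (512) and F32 storage come from this card, while the function names and the assumption of two state vectors per layer (one fast, one slow) are hypothetical. The card also lists Hybrid Attention, whose memory cost is not described and is not modelled here.

```python
BYTES_F32 = 4  # the card lists F32 tensors


def recurrent_state_bytes(layers: int, hidden: int, states_per_layer: int = 2) -> int:
    """Size of a fixed recurrent state; independent of sequence length.

    states_per_layer=2 is an assumption (one fast and one slow state vector).
    """
    return layers * states_per_layer * hidden * BYTES_F32


def kv_cache_bytes(seq_len: int, layers: int, hidden: int) -> int:
    """Size of a standard attention KV cache: one key and one value per token per layer."""
    return 2 * layers * seq_len * hidden * BYTES_F32


# Compare both quantities at a few sequence lengths.
for seq_len in (128, 1024, 8192):
    state = recurrent_state_bytes(layers=8, hidden=512)
    cache = kv_cache_bytes(seq_len, layers=8, hidden=512)
    print(f"seq_len={seq_len:5d}  recurrent state={state:,} B  KV cache={cache:,} B")
```

Under these assumptions the recurrent state stays at the same size for every sequence length, while the cache grows linearly.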

What Doesn't Work

  • Factual accuracy
  • Instruction following
  • Common sense reasoning

🏗️ Architecture Details

| Property | Value |
|---|---|
| Model Type | echo_dsrn |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |
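
The properties listed above can be collected into a small configuration object, which also makes two derived sizes explicit. This is only a sketch: the class name and field names are hypothetical and are not the model's actual configuration keys, and the derived values assume that the hidden dimension is split evenly across attention heads and that MLP Ratio multiplies the hidden dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoDSRNConfigSketch:
    # Values copied from the card; the field names are hypothetical.
    model_type: str = "echo_dsrn"
    num_layers: int = 8
    hidden_dim: int = 512
    num_attention_heads: int = 4
    mlp_ratio: float = 8.0
    vocab_size: int = 32011
    hybrid_attention: bool = True
    rms_norm: bool = True

    @property
    def head_dim(self) -> int:
        # Assumes the hidden dimension is split evenly across heads.
        if self.hidden_dim % self.num_attention_heads != 0:
            raise ValueError("hidden_dim must be divisible by num_attention_heads")
        return self.hidden_dim // self.num_attention_heads

    @property
    def mlp_inner_dim(self) -> int:
        # Assumes MLP Ratio is the multiplier applied to the hidden dimension.
        return int(self.hidden_dim * self.mlp_ratio)


cfg = EchoDSRNConfigSketch()
print(f"head_dim={cfg.head_dim}, mlp_inner_dim={cfg.mlp_inner_dim}")
```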

📊 Parameter Breakdown

| Component | Parameters | % of Total |
|---|---|---|
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |
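
The breakdown above can be cross-checked with a few lines of arithmetic. The sketch assumes that the embedding table is vocab size × hidden dim and that the LM head is a separate (untied), bias-free matrix of the same shape, which is consistent with both being listed at 16.39M; the block total is obtained by subtraction rather than counted directly.

```python
VOCAB_SIZE = 32011
HIDDEN_DIM = 512
TOTAL_PARAMS = 114_687_488  # total reported in the card

# Assumption: one weight per (token, hidden unit) pair, no bias.
embedding_params = VOCAB_SIZE * HIDDEN_DIM
# Assumption: untied LM head with the same shape as the embedding table.
lm_head_params = VOCAB_SIZE * HIDDEN_DIM
# Everything that is neither embedding nor LM head.
block_params = TOTAL_PARAMS - embedding_params - lm_head_params


def share(n: int) -> float:
    """Percentage of the total parameter count, rounded to two decimals."""
    return round(100 * n / TOTAL_PARAMS, 2)


for name, n in [("Embeddings", embedding_params),
                ("DSRN Blocks", block_params),
                ("LM Head", lm_head_params)]:
    print(f"{name:12s} {n:>12,} {share(n):6.2f}%")
```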

🧩 Internal Block Structure (Per Layer)

| Sub-Component | Parameters | Description |
|---|---|---|
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |
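
Some of the per-layer figures above can be approximated from the hidden size (512) and MLP ratio (8.0). The formulas below are assumptions rather than the model's documented layout: a two-layer MLP with biases, a GRU with the usual three gates and two bias vectors per gate, and two weight-only norms per layer. They are included because they land on the reported 4.20M, 1.58M and 1,024 after rounding; the slow-state and surprise-gating counts are not reconstructed here.

```python
def mlp_params(hidden: int, ratio: float) -> int:
    """Two linear layers (hidden -> inner -> hidden), each with a bias (assumed)."""
    inner = int(hidden * ratio)
    return (hidden * inner + inner) + (inner * hidden + hidden)


def gru_params(hidden: int) -> int:
    """Three gates, each with input and recurrent weights plus two biases (assumed)."""
    return 3 * (hidden * hidden + hidden * hidden + 2 * hidden)


def norm_params(hidden: int, norms_per_layer: int = 2) -> int:
    """Weight-only norms; two per layer is an assumption."""
    return norms_per_layer * hidden


HIDDEN_DIM = 512
MLP_RATIO = 8.0

print(f"MLP : {mlp_params(HIDDEN_DIM, MLP_RATIO):,}")
print(f"GRU : {gru_params(HIDDEN_DIM):,}")
print(f"Norm: {norm_params(HIDDEN_DIM):,}")
```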

Pre-Training

Trained with Truncated Backpropagation Through Time (TBPTT) on Fineweb-EDU + Smoltalk2 (no think), 6BT (about 6 billion tokens).

1 epoch on a single AMD Instinct MI300X (192 GB).
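
TBPTT processes a long token stream in fixed-length windows: the recurrent state is carried from one window to the next, but gradients are only propagated inside the current window. The sketch below shows just the windowing side of that scheme for next-token prediction. The function name and the window length are hypothetical, the card does not state the window size used, and the gradient truncation itself (detaching the carried state) is framework-specific and only indicated in comments.

```python
from typing import Iterator, List, Tuple


def tbptt_windows(tokens: List[int], window: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (inputs, targets) pairs covering the stream in consecutive windows.

    Targets are the inputs shifted by one position (next-token prediction).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    for start in range(0, len(tokens) - 1, window):
        targets = tokens[start + 1:start + window + 1]
        inputs = tokens[start:start + len(targets)]
        yield inputs, targets


# Toy token stream; real training would use tokenizer output.
windows = list(tbptt_windows(list(range(10)), window=4))
for inputs, targets in windows:
    # 1. Run the model on `inputs`, starting from the state carried over
    #    from the previous window.
    # 2. Compute the loss against `targets` and backpropagate within this window.
    # 3. Detach the final state before the next window so that gradients
    #    do not flow across the window boundary.
    print(inputs, "->", targets)
```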

Evaluation

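0-shot:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| piqa | 1 | none | 0 | acc ↑ | 0.5789 | ± 0.0115 |
| piqa | 1 | none | 0 | acc_norm ↑ | 0.5718 | ± 0.0115 |
| sciq | 1 | none | 0 | acc ↑ | 0.5830 | ± 0.0156 |
| sciq | 1 | none | 0 | acc_norm ↑ | 0.5250 | ± 0.0158 |

5-shot:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| piqa | 1 | none | 5 | acc ↑ | 0.5773 | ± 0.0115 |
| piqa | 1 | none | 5 | acc_norm ↑ | 0.5729 | ± 0.0115 |
| sciq | 1 | none | 5 | acc ↑ | 0.5700 | ± 0.0157 |
| sciq | 1 | none | 5 | acc_norm ↑ | 0.5140 | ± 0.0158 |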
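
The Stderr column can be related to the accuracy values with a short calculation. The sketch assumes the reported figure is the standard error of the mean over per-example 0/1 scores (using the n - 1 sample variance), and that PIQA was scored on 1,838 examples and SciQ on 1,000. Neither the formula nor the set sizes are stated in this card, so treat both as assumptions.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) outcomes with mean `acc`."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Assumed evaluation set sizes (not stated in the card).
PIQA_N = 1838
SCIQ_N = 1000

piqa_se = accuracy_stderr(0.5789, PIQA_N)  # 0-shot piqa acc from the card
sciq_se = accuracy_stderr(0.5830, SCIQ_N)  # 0-shot sciq acc from the card
print(f"piqa stderr ~ {piqa_se:.4f}")
print(f"sciq stderr ~ {sciq_se:.4f}")
```

Under these assumptions the recomputed values round to the reported 0.0115 and 0.0156.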

Gradio App - Next Word Prediction

Echo-DSRN-114M-Base - Next Word Prediction

Safetensors · Model size: 0.1B params · Tensor type: F32