A newer version of this model is available: ethicalabs/Echo-DSRN-114M-v0.1.2-Base

Model Card for ethicalabs/Echo-DSRN-114M-Base

Echo-DSRN(N) (Dual State Recurrent Neural Network; short name Echo-DSRN, also known as echo) is a novel architecture designed as a viable alternative for low-resource tasks that are currently handled, inefficiently, by Large Language Models (LLMs) whose scale far exceeds what those tasks require 🌱

⚠️ Important Notice

This is a research prototype and demo model.

Not production-ready
Will hallucinate and give incorrect answers
Do not use for any real-world decisions
Intended for architecture experimentation only

What Works

Text generation is fluent
Memory usage is constant O(1)
Runs on CPUs, NPUs, and GPUs (tested on AMD ROCm and Apple MPS)

What Doesn't Work

Factual accuracy
Instruction following
Common sense reasoning

🏗️ Architecture Details

Property	Value
Model Type	echo_dsrn
Layers	8
Hidden Dim	512
Attention Heads	4
MLP Ratio	8.0
Vocab Size	32011
Hybrid Attention	True
RMSNorm	True

📊 Parameter Breakdown

Component	Parameters	% of Total
Total	114.69M (114,687,488)	100%
Embeddings	16.39M	14.29%
DSRN Blocks (Aggregate)	81.91M	71.42%
LM Head	16.39M	14.29%

🧩 Internal Block Structure (Per Layer)

Sub-Component	Parameters	Description
MLP (Feed-Forward)	4.20M	Expanded hidden layer (MLP ratio 8.0)
DSRN Slow State	3.15M	Constant-time memory gates
GRU Fast State	1.58M	Recurrent fast path
Surprise Gating	264,192	Dynamic focus mechanism
Normalization	1,024	LayerNorm / RMSNorm
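The parameter counts can be cross-checked against the architecture table with a little arithmetic. This sketch assumes the input embedding and the LM head are separate (untied), bias-free matrices of shape vocab × hidden, which is what the two equal 16.39M rows suggest. The constant names and `breakdown` are illustrative and do not come from the model's code.

```python
# Figures copied from the Architecture Details and Parameter Breakdown tables.
VOCAB_SIZE = 32_011
HIDDEN_DIM = 512
NUM_LAYERS = 8
TOTAL_PARAMS = 114_687_488

# Per-layer sub-components as listed (the first three are rounded in the table).
LISTED_PER_LAYER = {
    "mlp": 4_200_000,
    "dsrn_slow_state": 3_150_000,
    "gru_fast_state": 1_580_000,
    "surprise_gating": 264_192,
    "normalization": 1_024,
}


def breakdown():
    # Assumption: embedding and LM head are untied (VOCAB_SIZE x HIDDEN_DIM) matrices.
    embeddings = VOCAB_SIZE * HIDDEN_DIM
    lm_head = VOCAB_SIZE * HIDDEN_DIM
    blocks = TOTAL_PARAMS - embeddings - lm_head  # everything inside the DSRN blocks
    per_layer = blocks // NUM_LAYERS              # average size of one block
    return {
        "embeddings": embeddings,
        "lm_head": lm_head,
        "blocks": blocks,
        "per_layer": per_layer,
        "listed_per_layer": sum(LISTED_PER_LAYER.values()),
        "pct_embeddings": 100 * embeddings / TOTAL_PARAMS,
        "pct_blocks": 100 * blocks / TOTAL_PARAMS,
    }


if __name__ == "__main__":
    for name, value in breakdown().items():
        print(f"{name}: {value:,.2f}")
```

Under that assumption, the Embeddings, LM Head, and DSRN Blocks rows match exactly (16,389,632 + 16,389,632 + 81,908,224 = 114,687,488), which gives an average of 10,238,528 parameters per layer. The itemized per-layer sub-components add up to about 9.2M, with some values rounded, so roughly 1M parameters per layer are not broken down in the table.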

Pre-Training

Trained with Truncated Backpropagation Through Time (TBPTT) on FineWeb-Edu (10BT sample)

1 epoch on a single AMD Instinct MI300X (192 GB)
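The card does not describe the training loop, so the following is only a generic illustration of TBPTT and not the code used to train Echo-DSRN. It uses a hypothetical toy scalar linear RNN (h_t = w · h_{t-1} + x_t, squared-error loss) that reads a long sequence in fixed-length chunks. The hidden state carries over from one chunk to the next, but gradients are backpropagated only within each chunk. The names `tbptt_train` and `chunk_len` are hypothetical.

```python
def tbptt_train(xs, ys, w=0.5, chunk_len=4, lr=0.01):
    """Toy TBPTT on a hypothetical scalar linear RNN: h_t = w * h_{t-1} + x_t.

    The per-step loss is (h_t - y_t) ** 2. The hidden state h is carried
    across chunks, but the backward pass stops at each chunk boundary,
    and w is updated once per chunk.
    Returns the final weight and the gradient computed for each chunk.
    """
    h = 0.0  # recurrent state, carried over from chunk to chunk
    chunk_grads = []
    for start in range(0, len(xs), chunk_len):
        chunk_x = xs[start:start + chunk_len]
        chunk_y = ys[start:start + chunk_len]

        # Forward pass. states[i] is the state entering step i; states[0]
        # comes from the previous chunk and is treated as a constant
        # (the equivalent of calling .detach() on it in PyTorch).
        states = [h]
        for x in chunk_x:
            h = w * h + x
            states.append(h)

        # Backward pass through this chunk only.
        grad_w = 0.0
        grad_h = 0.0  # dLoss/dh_t flowing backwards in time
        for i in reversed(range(len(chunk_x))):
            grad_h += 2.0 * (states[i + 1] - chunk_y[i])  # local loss term
            grad_w += grad_h * states[i]                  # dh_t/dw = h_{t-1}
            grad_h *= w                                   # dh_t/dh_{t-1} = w
        # grad_h is discarded here: no gradient reaches earlier chunks.

        w -= lr * grad_w  # one SGD step per chunk
        chunk_grads.append(grad_w)
    return w, chunk_grads


if __name__ == "__main__":
    xs, ys = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    # lr=0.0 keeps w fixed so the gradients can be compared directly.
    print(tbptt_train(xs, ys, chunk_len=3, lr=0.0))  # one chunk: full BPTT
    print(tbptt_train(xs, ys, chunk_len=1, lr=0.0))  # maximally truncated
```

When chunk_len equals the sequence length, this reduces to full BPTT (a gradient of 10.0 in the demo). Shorter chunks keep the state flowing forward while bounding the memory and compute of each backward pass. The cost is that they drop longer-range gradient terms (the per-chunk gradients sum to 8.25 instead).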

Continued Pre-Training (SFTTrainer)

1 epoch on a single AMD Radeon AI PRO R9700 (32 GB)

Evaluation

uv run lm_eval --model hf \
  --model_args pretrained=ethicalabs/Echo-DSRN-114M-Base,trust_remote_code=True,device_map="auto" \
  --tasks sciq,piqa \
  --output_path ./results_Echo-DSRN-114M-Base \
  --batch_size 16 \
  --num_fewshot 5

Task	Version	Filter	n-shot	Metric	Value	Stderr
piqa	1	none	5	acc ↑	0.6055	±0.0114
piqa	1	none	5	acc_norm ↑	0.6012	±0.0114
sciq	1	none	5	acc ↑	0.6200	±0.0154
sciq	1	none	5	acc_norm ↑	0.5480	±0.0157

uv run lm_eval --model hf \
  --model_args pretrained=ethicalabs/Echo-DSRN-114M-Base,trust_remote_code=True,device_map="auto" \
  --tasks sciq,piqa \
  --output_path ./results_Echo-DSRN-114M-Base \
  --batch_size 16 \
  --num_fewshot 10

Task	Version	Filter	n-shot	Metric	Value	Stderr
piqa	1	none	10	acc ↑	0.6083	±0.0114
piqa	1	none	10	acc_norm ↑	0.6066	±0.0114
sciq	1	none	10	acc ↑	0.6150	±0.0154
sciq	1	none	10	acc_norm ↑	0.5600	±0.0157
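One rough way to read the 5-shot and 10-shot tables together is to put each score next to its normal-approximation 95% interval and chance level, and to ask whether the change between the two runs exceeds the reported standard errors. This sketch makes two assumptions. First, it treats the two runs as independent, which is a simplification because both runs score the same test items. Second, it uses random-guess baselines of 0.5 for PIQA (two candidate solutions) and 0.25 for SciQ (four answer choices). The names `ci95` and `delta_z` are illustrative.

```python
import math

# (task, metric) -> (value, stderr), copied from the 5-shot and 10-shot tables.
FIVE_SHOT = {
    ("piqa", "acc"): (0.6055, 0.0114),
    ("piqa", "acc_norm"): (0.6012, 0.0114),
    ("sciq", "acc"): (0.6200, 0.0154),
    ("sciq", "acc_norm"): (0.5480, 0.0157),
}
TEN_SHOT = {
    ("piqa", "acc"): (0.6083, 0.0114),
    ("piqa", "acc_norm"): (0.6066, 0.0114),
    ("sciq", "acc"): (0.6150, 0.0154),
    ("sciq", "acc_norm"): (0.5600, 0.0157),
}
# Random-guess accuracy: PIQA offers 2 candidate solutions, SciQ 4 answer choices.
CHANCE = {"piqa": 0.5, "sciq": 0.25}


def ci95(value, stderr):
    """Normal-approximation 95% confidence interval."""
    return value - 1.96 * stderr, value + 1.96 * stderr


def delta_z(before, after):
    """z-score of the change between two runs, treating them as independent.

    This is a simplification: both runs score the same test items, so the
    true standard error of the difference is typically smaller than assumed here.
    """
    (v0, s0), (v1, s1) = before, after
    return (v1 - v0) / math.sqrt(s0 ** 2 + s1 ** 2)


if __name__ == "__main__":
    for (task, metric), five in FIVE_SHOT.items():
        ten = TEN_SHOT[(task, metric)]
        low, high = ci95(*ten)
        print(f"{task}/{metric}: 10-shot 95% CI [{low:.3f}, {high:.3f}], "
              f"chance {CHANCE[task]:.2f}, 5->10-shot z = {delta_z(five, ten):+.2f}")
```

For these figures, every 5-shot to 10-shot change is smaller than one combined standard error. Every interval also lies well above chance.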

Gradio App - Next Word Prediction

Echo-DSRN-114M-Base - Next Word Prediction
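Next-word prediction can also be tried locally. This is a hedged sketch, not the app's code: it loads the model with the Hugging Face transformers Auto classes and returns the most probable next tokens for a prompt. It assumes that torch and transformers are installed, and that the checkpoint loads through `AutoModelForCausalLM` with `trust_remote_code=True`, as the lm_eval command in the Evaluation section implies. `top_next_tokens` is a hypothetical helper name, not part of the repository.

```python
def top_next_tokens(prompt, k=5, model_id="ethicalabs/Echo-DSRN-114M-Base"):
    """Return the k most likely next tokens for `prompt` with their probabilities."""
    # Imported inside the function so this file can be loaded without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # echo_dsrn is a custom architecture, so its modeling code is loaded
    # from the Hub repository (trust_remote_code=True, as in the lm_eval call).
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model.eval()

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        # Assumed output: logits of shape (batch, seq_len, vocab_size).
        logits = model(input_ids).logits
    probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token
    top_probs, top_ids = torch.topk(probs, k)
    return [(tokenizer.decode([int(i)]), float(p)) for p, i in zip(top_probs, top_ids)]
```

For example, `top_next_tokens("The capital of France is")` returns five (token, probability) pairs. Given the limitations listed at the top of this card, the top-ranked token may well be wrong.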


Weights: Safetensors format, 0.1B params, tensor type F32
