A newer version of this model is available: ethicalabs/Echo-DSRN-114M-v0.1.2-Base

Model Card for ethicalabs/Echo-DSRN-114M-Base

Echo-DSRN (Dual State Recurrent Neural Network, also known as *echo*) is a novel architecture designed as a viable alternative for low-resource tasks that Large Language Models (LLMs) currently handle inefficiently due to their excessive scale. 🌱

⚠️ Important Notice

This is a research prototype and demo model.

  • Not production-ready
  • Will hallucinate and give incorrect answers
  • Do not use for any real-world decisions
  • Intended for architecture experimentation only

What Works

  • Text generation is fluent
  • Memory usage is constant O(1)
  • Runs on CPUs, NPUs, GPUs (Tested on AMD's ROCm and Apple's MPS)
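The constant-memory claim follows from the recurrent design: the model carries a fixed-size state across tokens instead of a key/value cache that grows with sequence length. A toy illustration of that recurrent part only (hidden dim and layer count taken from the architecture table below; the factor of two for the fast/slow state pair is an assumption, and this is not Echo-DSRN's actual code):

```python
# Memory of a fixed recurrent state vs. a Transformer KV cache,
# counted in float values. Illustrative only.
hidden_dim, n_layers = 512, 8

def recurrent_state_floats(seq_len: int) -> int:
    # One fast + one slow state per layer; independent of seq_len.
    return n_layers * 2 * hidden_dim

def kv_cache_floats(seq_len: int) -> int:
    # Keys + values per layer, one entry per past token.
    return n_layers * 2 * hidden_dim * seq_len

print(recurrent_state_floats(1_000), recurrent_state_floats(100_000))  # constant
print(kv_cache_floats(1_000), kv_cache_floats(100_000))                # grows linearly
```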

What Doesn't Work

  • Factual accuracy
  • Instruction following
  • Common sense reasoning

πŸ—οΈ Architecture Details

| Property | Value |
|---|---|
| Model Type | `echo_dsrn` |
| Layers | 8 |
| Hidden Dim | 512 |
| Attention Heads | 4 |
| MLP Ratio | 8.0 |
| Vocab Size | 32011 |
| Hybrid Attention | True |
| RMSNorm | True |

📊 Parameter Breakdown

| Component | Parameters | % of Total |
|---|---|---|
| Total | 114.69M (114,687,488) | 100% |
| Embeddings | 16.39M | 14.29% |
| DSRN Blocks (Aggregate) | 81.91M | 71.42% |
| LM Head | 16.39M | 14.29% |
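The embedding and LM-head rows can be reproduced from the configuration above: 32011 × 512 ≈ 16.39M parameters each (the matching sizes suggest the embedding and head are untied, which this sketch assumes), and the block aggregate is the remainder:

```python
# Sanity-check of the parameter breakdown table, using numbers from this card.
vocab_size = 32011
hidden_dim = 512
total_params = 114_687_488

embeddings = vocab_size * hidden_dim   # input embedding matrix
lm_head = vocab_size * hidden_dim      # output projection (assumed untied)
dsrn_blocks = total_params - embeddings - lm_head

print(f"Embeddings:  {embeddings:>11,}  {embeddings / total_params:.2%}")
print(f"DSRN blocks: {dsrn_blocks:>11,}  {dsrn_blocks / total_params:.2%}")
print(f"LM head:     {lm_head:>11,}  {lm_head / total_params:.2%}")
```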

🧩 Internal Block Structure (Per Layer)

| Sub-Component | Parameters | Description |
|---|---|---|
| MLP (Feed-Forward) | 4.20M | Upscaled hidden layers |
| DSRN Slow State | 3.15M | Constant-time memory gates |
| GRU Fast State | 1.58M | Recurrent fast path |
| Surprise Gating | 264,192 | Dynamic focus mechanism |
| Normalization | 1,024 | LayerNorm / RMSNorm |

Pre-Training

Truncated Backpropagation Through Time (TBPTT) on FineWeb-Edu (10BT sample).

1 epoch on a single AMD Instinct MI300X (192 GB).
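TBPTT trains a recurrent model on long sequences by backpropagating only within fixed-length windows, detaching the state between them. A minimal sketch with a hypothetical toy GRU (not the actual Echo-DSRN training code):

```python
# Truncated Backpropagation Through Time: gradients flow inside each
# chunk; the recurrent state is detached at chunk boundaries.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 8)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()))

seq = torch.randn(2, 41, 8)  # (batch, time, features); next-step prediction
chunk = 10                   # truncation window
hidden = None
for t0 in range(0, seq.size(1) - 1, chunk):
    x = seq[:, t0 : t0 + chunk]
    y = seq[:, t0 + 1 : t0 + chunk + 1]
    out, hidden = model(x, hidden)
    loss = nn.functional.mse_loss(head(out), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    hidden = hidden.detach()  # cut the graph: no gradient beyond this chunk
```

Keeping the (detached) state between chunks preserves long-range information at inference-like cost, while bounding the backward graph to `chunk` steps.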


Continued Pre-Training (SFTTrainer)

1 epoch on a single AMD Radeon AI PRO R9700 32 GB


Evaluation

```shell
uv run lm_eval --model hf \
  --model_args pretrained=ethicalabs/Echo-DSRN-114M-Base,trust_remote_code=True,device_map="auto" \
  --tasks sciq,piqa --output_path ./results_Echo-DSRN-114M-Base --batch_size 16 --num_fewshot 5
```
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| piqa | 1 | none | 5 | acc ↑ | 0.6055 | ± 0.0114 |
| | | none | 5 | acc_norm ↑ | 0.6012 | ± 0.0114 |
| sciq | 1 | none | 5 | acc ↑ | 0.6200 | ± 0.0154 |
| | | none | 5 | acc_norm ↑ | 0.5480 | ± 0.0157 |
```shell
uv run lm_eval --model hf \
  --model_args pretrained=ethicalabs/Echo-DSRN-114M-Base,trust_remote_code=True,device_map="auto" \
  --tasks sciq,piqa --output_path ./results_Echo-DSRN-114M-Base --batch_size 16 --num_fewshot 10
```
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| piqa | 1 | none | 10 | acc ↑ | 0.6083 | ± 0.0114 |
| | | none | 10 | acc_norm ↑ | 0.6066 | ± 0.0114 |
| sciq | 1 | none | 10 | acc ↑ | 0.6150 | ± 0.0154 |
| | | none | 10 | acc_norm ↑ | 0.5600 | ± 0.0157 |

Gradio App - Next Word Prediction

Echo-DSRN-114M-Base - Next Word Prediction
