Does replacing y = ReLU(Wx + b) with richer per-neuron computation let a network store more information per parameter?

Answer: yes, up to 168,327× more at matched parameter budgets. But every adaptive mechanism we tried failed at this small scale.
We replaced the standard neural network neuron with 15 progressively more complex architectures. At matched parameter budgets (3K-8K params):
SinGLU — sin(ω·W₁x) ⊙ (W₂x), a multiplicative periodic neuron inspired by the GLU literature — wins 5/6 standard tasks against vanilla ReLU MLPs. But every adaptive mechanism failed: routing gates, learnable frequencies, phase predictors, and multi-scale decompositions all either matched or underperformed SinGLU at this scale.
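A minimal sketch of the SinGLU unit, in NumPy rather than the repo's PyTorch, with illustrative weight names (the real implementations live in the `benchmark_*.py` files):

```python
import numpy as np

def singlu(x, W1, W2, omega=1.0):
    """SinGLU sketch: a periodic gate sin(omega * W1 x) multiplied
    elementwise by a linear value path W2 x.
    W1, W2 have shape (d_out, d_in); x has shape (d_in,)."""
    return np.sin(omega * (W1 @ x)) * (W2 @ x)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))
W2 = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y = singlu(x, W1, W2, omega=2.0)
print(y.shape)  # (3,)
```

The multiplicative gate is what supplies cross-terms between the two projections; the periodic nonlinearity supplies the oscillatory basis.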
| Version | Hypothesis | Key Equation | Verdict |
|---|---|---|---|
| B1: Vanilla | Baseline | ReLU(Wx + b) | OOD-robust but limited expressivity |
| B2: SinGLU | GLU-style periodic | sin(ω·Wg·x) ⊙ Wv·x | Best overall — wins 5/6 tasks |
| v1 | Multiplicative + periodic + residual | (W₁x) ⊙ sin(ω·W₂x) + W₁x | Killed by width penalty |
| v4 | Width-fix strategies | Low-rank, shared-weight, GLU-style | SinGLU emerges as best width fix |
| v6 | Adaptive routing gate | α(x)·periodic + (1-α)·linear | α stuck at 0.5 — never learns |
| v7 | Learnable frequency | sin(ω(x)·Wx) | ω frozen at initialization |
| v8 | Phase + amplitude gate | sin(ω·Wx + φ(x)) with sigmoid | Gate weak, phase underused |
| v9 | Controlled freq + phase + gate | Bounded ω(x) + φ(x) + α(x) | 5 matrices → too narrow |
| v10 | Free phase | sin(ω·Wg·x + π·tanh(Wφ·x)) | Best adaptive — wins Spiral + Complex |
| v11 | Disciplined phase (scaled) | sin(ω·(g + 0.1·tanh(φ))) | Phase ~0 — basically SinGLU |
| v12 | Signal-proportional (FM) | sin(ω·g·(1 + 0.2·tanh(φ))) | Frequency modulation, not phase |
| v13 | Signal-aligned phase | sin(ω·g + 0.1·g·tanh(φ)) | Wins Checker, kills Spiral |
| v15 | Dual-phase decomposition | sin(ωg+βφ) ⊙ (1+α·sin(2ωg+γφ)) | First to beat SinGLU on HiFreq |
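The v10 (free phase) row can be sketched the same way. A hedged NumPy version, assuming the phase branch simply adds a bounded π·tanh(·) term inside the SinGLU gate (the weight names `Wg`, `Wv`, `Wphi` are illustrative, not taken from the repo):

```python
import numpy as np

def v10_free_phase(x, Wg, Wv, Wphi, omega=1.0):
    """v10 sketch: SinGLU with an input-dependent phase shift,
    bounded to (-pi, pi) by pi * tanh(Wphi x)."""
    phase = np.pi * np.tanh(Wphi @ x)
    return np.sin(omega * (Wg @ x) + phase) * (Wv @ x)

rng = np.random.default_rng(1)
Wg, Wv, Wphi = (rng.standard_normal((3, 4)) for _ in range(3))
x = rng.standard_normal(4)
y = v10_free_phase(x, Wg, Wv, Wphi)
print(y.shape)  # (3,)
```

Because the phase term is bounded, it can rotate decision boundaries (useful on Spiral) without letting the effective frequency run away during training.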
Regression benchmarks (test loss, lower is better):

| Model | Complex (4D) | Nested (2D) | HiFreq | Memorize |
|---|---|---|---|---|
| Vanilla | 0.0575 | 0.0487 | 1.10 | 0.1568 |
| SinGLU | 0.0143 | 0.0002 | 1.02 | 9.3e-7 |
| v10 | 0.0080 | 0.0004 | 1.22 | 1.7e-5 |
| v15 | 0.0316 | 0.0051 | 0.85 | 1.0e-5 |
Classification benchmarks (test accuracy, higher is better):

| Model | Spiral | Checkerboard |
|---|---|---|
| Vanilla | 85.1% | 57.9% |
| SinGLU | 44.2% | 93.8% |
| v10 | 99.2% | 93.8% |
| v15 | 98.9% | 90.0% |
Generalization benchmarks (test error, lower is better):

| Model | OOD [-1,1]→[1,2] | Freq Gen 2π→10π | Mixed Freq 4π→20π |
|---|---|---|---|
| Vanilla | 1.53 | 1.172 | 1.329 |
| SinGLU | 5.90 | 0.736 | 1.491 |
| v10 | 4.96 | 0.958 | 1.178 |
| v15 | 4.38 | 0.910 | 1.317 |
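The OOD column makes the trade-off concrete: outside the training range a ReLU feature extrapolates monotonically, while a periodic feature keeps oscillating. A small illustrative sketch (not code from the benchmarks):

```python
import numpy as np

x_ood = np.linspace(1.0, 2.0, 5)           # shifted inputs; training used [-1, 1]
relu_feat = np.maximum(0.0, 2.0 * x_ood)   # extrapolates linearly past the data
sin_feat = np.sin(10.0 * x_ood)            # keeps oscillating: spurious structure
print(relu_feat)   # strictly increasing
print(sin_feat)    # changes sign inside [1, 2]
```

This is why vanilla ReLU wins the OOD column even while losing almost everything in-distribution: it has no oscillatory basis to hallucinate with.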
Freq Gen trains on sin(2πx) but is tested on sin(10πx), an unseen frequency.

| Task Type | Best Architecture | Why |
|---|---|---|
| Compositional / multiplicative | SinGLU | Cross-terms match function structure |
| Geometric / rotational (spirals) | v10 (free phase) | Phase shifts rotate boundaries |
| Multi-scale frequencies (audio) | v15 (dual-phase) | Explicit ω + 2ω channels |
| Pure memorization | SinGLU | Maximum information density |
| OOD / distribution shift | Vanilla ReLU | No hallucinated oscillations |
| Unseen frequency generalization | SinGLU | Fixed basis transfers across scales |
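The "explicit ω + 2ω channels" idea behind v15 can be sketched as a base-frequency term modulated by its first harmonic. A NumPy sketch with illustrative scalar values for α, β, γ (the trained values will differ):

```python
import numpy as np

def v15_dual_phase(x, Wg, Wphi, omega=1.0, alpha=0.5, beta=0.1, gamma=0.1):
    """v15 sketch: sin(omega*g + beta*phi) modulated by a 2*omega harmonic
    (1 + alpha*sin(2*omega*g + gamma*phi)), where g and phi are linear
    projections of the input."""
    g, phi = Wg @ x, Wphi @ x
    base = np.sin(omega * g + beta * phi)
    harmonic = 1.0 + alpha * np.sin(2.0 * omega * g + gamma * phi)
    return base * harmonic

rng = np.random.default_rng(2)
Wg, Wphi = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y = v15_dual_phase(x, Wg, Wphi)
print(y.shape)  # (3,)
```

With |sin| ≤ 1 and the modulation factor in [1-α, 1+α], the output stays bounded while covering two frequency scales at once, which is the property the HiFreq task rewards.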
| File | Contains |
|---|---|
| `benchmark.py` | v1 — original RichNeuron vs Vanilla |
| `benchmark_v4.py` | v4 — width-fix strategies (LowRank, Shared, SinGLU) |
| `benchmark_v5.py` | v5 — honest re-eval (3 seeds, gradient norms, OOD) |
| `benchmark_v6.py` | v6 — adaptive routing neuron |
| `benchmark_v7.py` | v7 — learnable frequency neuron |
| `benchmark_v8.py` | v8 — adaptive phase + amplitude gate |
| `benchmark_v9.py` | v9 — controlled freq + phase + gate |
| `benchmark_v10.py` | v10 — SinGLU + free phase |
| `benchmark_v11.py` | v11 — SinGLU + disciplined phase |
| `benchmark_v12.py` | v12 — SinGLU + signal-proportional phase (FM) |
| `benchmark_v13.py` | v13 — SinGLU + aligned phase + correlation analysis |
| `benchmark_v15.py` | v15 — dual-phase decomposition + killer experiments |
| File | Contains |
|---|---|
| `results.json` | v1 raw results |
| `results_v4.json` — `results_v15.json` | Per-version raw JSON with per-seed scores |
| `PAPER.md` | Full technical report with analysis |
| `FINDINGS_SUMMARY.md` | Complete architecture catalog and all results tables |
| `CORRECTIONS.md` | Data verification and audit trail |
```bash
pip install torch numpy
python benchmark_v10.py   # run the best adaptive variant
python benchmark_v15.py   # run dual-phase + killer experiments
```
All benchmarks use CPU-only PyTorch and complete in ~15 minutes each.
```bibtex
@misc{richneuron2025,
  title={Beyond Linear Neurons: An Empirical Study of Multiplicative Periodic
         Architectures at Small Scale},
  author={anshdadhich},
  year={2025},
  url={https://huggingface.co/anshdadhich/richneuron-vs-vanilla-benchmark}
}
```