
Beyond Linear Neurons: RichNeuron Benchmark

Can a network store more information per parameter when y = ReLU(Wx + b) is replaced with richer per-neuron computation?

Answer: Yes — up to 168,327× more at matched parameter budgets. But every adaptive mechanism we tried failed at small scale.


The Core Finding

We replaced the standard neural network neuron with 15 progressively more complex architectures. At matched parameter budgets (3K-8K params):

  • SinGLU (sin(ω·Wg·x) ⊙ Wv·x) — a multiplicative periodic neuron inspired by the GLU literature — wins 5/6 standard tasks against vanilla ReLU MLPs
  • 168,327× lower MSE on a 200-point memorization task
  • 222× lower MSE on nested multiplicative periodic functions
  • +35.9 percentage points on checkerboard classification

But every adaptive mechanism failed — routing gates, learnable frequencies, phase predictors, and multi-scale decompositions all either matched or underperformed SinGLU at this scale.
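
To make the winning architecture concrete, here is a minimal PyTorch sketch of a SinGLU layer matching the equation above. The module name, the default fixed ω, and the bias-free projections are illustrative assumptions, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

class SinGLU(nn.Module):
    """Multiplicative periodic neuron: sin(omega * W_g x) * (W_v x).

    Illustrative sketch; omega is fixed rather than learned
    (see finding 5 below: fixed frequencies generalized better).
    """
    def __init__(self, in_dim: int, out_dim: int, omega: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(in_dim, out_dim, bias=False)   # W_g
        self.value = nn.Linear(in_dim, out_dim, bias=False)  # W_v
        self.omega = omega

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Periodic gate modulates a linear value branch (GLU-style).
        return torch.sin(self.omega * self.gate(x)) * self.value(x)
```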


Architecture Search (15 Versions)

| Version | Hypothesis | Key Equation | Verdict |
|---|---|---|---|
| B1: Vanilla | Baseline | ReLU(Wx + b) | OOD-robust but limited expressivity |
| B2: SinGLU | GLU-style periodic | sin(ω·Wg·x) ⊙ Wv·x | Best overall — wins 5/6 tasks |
| v1 | Multiplicative + periodic + residual | (W₁x) ⊙ sin(ω·W₂x) + W₁x | Killed by width penalty |
| v4 | Width-fix strategies | Low-rank, shared-weight, GLU-style | SinGLU emerges as best width fix |
| v6 | Adaptive routing gate | α(x)·periodic + (1-α)·linear | α stuck at 0.5 — never learns |
| v7 | Learnable frequency | sin(ω(x)·Wx) | ω frozen at initialization |
| v8 | Phase + amplitude gate | sin(ω·Wx + φ(x)) with sigmoid gate | Gate weak, phase underused |
| v9 | Controlled freq + phase + gate | Bounded ω(x) + φ(x) + α(x) | 5 matrices → too narrow |
| v10 | Free phase | sin(ω·Wg·x + π·tanh(Wφ·x)) | Best adaptive — wins Spiral + Complex |
| v11 | Disciplined phase (scaled) | sin(ω·(g + 0.1·tanh(φ))) | Phase ≈ 0 — basically SinGLU |
| v12 | Signal-proportional (FM) | sin(ω·g·(1 + 0.2·tanh(φ))) | Frequency modulation, not phase |
| v13 | Signal-aligned phase | sin(ω·g + 0.1·g·tanh(φ)) | Wins Checkerboard, kills Spiral |
| v15 | Dual-phase decomposition | sin(ωg + βφ) ⊙ (1 + α·sin(2ωg + γφ)) | First to beat SinGLU on HiFreq |
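
Since v10 is the best adaptive variant, a hedged sketch may help. Because the file list below describes v10 as "SinGLU + free phase", we assume the multiplicative value branch is retained, i.e. sin(ω·Wg·x + π·tanh(Wφ·x)) ⊙ Wv·x; the class name and bias-free layers are also our assumptions:

```python
import torch
import torch.nn as nn

class FreePhaseSinGLU(nn.Module):
    """v10-style neuron (sketch): sin(omega * W_g x + pi * tanh(W_phi x)) * (W_v x).

    The value branch W_v is an assumption based on v10 being described
    as "SinGLU + free phase". tanh bounds the phase shift to (-pi, pi),
    which is what lets phase rotate decision boundaries on Spiral.
    """
    def __init__(self, in_dim: int, out_dim: int, omega: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(in_dim, out_dim, bias=False)   # W_g
        self.phase = nn.Linear(in_dim, out_dim, bias=False)  # W_phi
        self.value = nn.Linear(in_dim, out_dim, bias=False)  # W_v
        self.omega = omega

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        phi = torch.pi * torch.tanh(self.phase(x))  # input-dependent phase
        return torch.sin(self.omega * self.gate(x) + phi) * self.value(x)
```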

Results (Key Versions × 9 Tasks)

Regression (MSE ↓)

| Model | Complex (4D) | Nested (2D) | HiFreq | Memorize |
|---|---|---|---|---|
| Vanilla | 0.0575 | 0.0487 | 1.10 | 0.1568 |
| SinGLU | 0.0143 | 0.0002 | 1.02 | 9.3e-7 |
| v10 | 0.0080 | 0.0004 | 1.22 | 1.7e-5 |
| v15 | 0.0316 | 0.0051 | 0.85 | 1.0e-5 |

Classification (Accuracy ↑)

| Model | Spiral | Checkerboard |
|---|---|---|
| Vanilla | 85.1% | 57.9% |
| SinGLU | 44.2% | 93.8% |
| v10 | 99.2% | 93.8% |
| v15 | 98.9% | 90.0% |

Generalization (MSE ↓)

| Model | OOD [-1,1]→[1,2] | Freq Gen 2π→10π | Mixed Freq 4π→20π |
|---|---|---|---|
| Vanilla | 1.53 | 1.172 | 1.329 |
| SinGLU | 5.90 | 0.736 | 1.491 |
| v10 | 4.96 | 0.958 | 1.178 |
| v15 | 4.38 | 0.910 | 1.317 |

The Six Biggest Findings

  1. SinGLU stores 168,327× more information per parameter on memorization tasks
  2. SinGLU wins 5/6 standard tasks — but no universal best neuron exists (Spiral needs v10's free phase)
  3. Every adaptive mechanism fails at small scale due to gradient competition between meta-parameters and direct computation
  4. The width-richness tradeoff is severe — every extra matrix steals ~35% of hidden width (see the arithmetic sketch after this list)
  5. Fixed-frequency generalizes better than adaptive — SinGLU trains worse on sin(2πx) but tests better on sin(10πx) (unseen frequency)
  6. Periodic neurons hallucinate on OOD — all sinusoidal architectures show 24-1273× degradation on distribution shift vs Vanilla's 7×
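
Finding 4 follows from simple parameter arithmetic: at a fixed budget, each additional per-neuron weight matrix shrinks the hidden width that can be afforded. A rough illustration (the 5K budget and task dimensions below are illustrative choices of ours, and biases are ignored for simplicity):

```python
def matched_width(budget: int, in_dim: int, out_dim: int, n_matrices: int) -> int:
    """Largest hidden width h for a one-hidden-layer net whose neuron uses
    `n_matrices` input projections of shape (h, in_dim), plus an
    (out_dim, h) output head. Biases ignored for simplicity."""
    return budget // (n_matrices * in_dim + out_dim)

# Hypothetical 5K budget on a 4D regression task:
for k, name in [(1, "ReLU"), (2, "SinGLU"), (3, "v10"), (5, "v9")]:
    print(f"{name}: {k} matrices -> width {matched_width(5000, 4, 1, k)}")
# Width falls roughly as 1/k: each extra matrix trades hidden units for
# per-neuron richness, which is what hurt v1 and v9 in the table above.
```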

Task-Specific Regime Map

| Task Type | Best Architecture | Why |
|---|---|---|
| Compositional / multiplicative | SinGLU | Cross-terms match function structure |
| Geometric / rotational (spirals) | v10 (free phase) | Phase shifts rotate boundaries |
| Multi-scale frequencies (audio) | v15 (dual-phase) | Explicit ω + 2ω channels |
| Pure memorization | SinGLU | Maximum information density |
| OOD / distribution shift | Vanilla ReLU | No hallucinated oscillations |
| Unseen frequency generalization | SinGLU | Fixed basis transfers across scales |

Repository Files

Benchmarks (one per version)

| File | Contains |
|---|---|
| benchmark.py | v1 — original RichNeuron vs Vanilla |
| benchmark_v4.py | v4 — width-fix strategies (LowRank, Shared, SinGLU) |
| benchmark_v5.py | v5 — honest re-eval (3 seeds, gradient norms, OOD) |
| benchmark_v6.py | v6 — adaptive routing neuron |
| benchmark_v7.py | v7 — learnable frequency neuron |
| benchmark_v8.py | v8 — adaptive phase + amplitude gate |
| benchmark_v9.py | v9 — controlled freq + phase + gate |
| benchmark_v10.py | v10 — SinGLU + free phase |
| benchmark_v11.py | v11 — SinGLU + disciplined phase |
| benchmark_v12.py | v12 — SinGLU + signal-proportional phase (FM) |
| benchmark_v13.py | v13 — SinGLU + aligned phase + correlation analysis |
| benchmark_v15.py | v15 — dual-phase decomposition + killer experiments |

Results & Reports

| File | Contains |
|---|---|
| results.json | v1 raw results |
| results_v4.json … results_v15.json | Per-version raw JSON with per-seed scores |
| PAPER.md | Full technical report with analysis |
| FINDINGS_SUMMARY.md | Complete architecture catalog and all results tables |
| CORRECTIONS.md | Data verification and audit trail |

Quick Start

```bash
pip install torch numpy
python benchmark_v10.py   # Run the best adaptive variant
python benchmark_v15.py   # Run dual-phase + killer experiments
```

All benchmarks use CPU-only PyTorch and complete in ~15 minutes each.


Reproducibility

  • Hardware: CPU-only (2 vCPU, 8GB RAM)
  • Total runtime: ~4 hours for all 12 benchmarks
  • Framework: PyTorch
  • Seeds: 3 random seeds per experiment
  • Statistical significance: Mean ± std reported; all claims based on consistent seed-wise ordering
  • Parameter matching: Binary search over hidden dimensions to match budgets within ~5%
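
As a rough illustration of that matching step, here is a minimal sketch of a binary search over hidden width; the helper name and exact stopping rule are our assumptions, and the repository's procedure may differ:

```python
import torch.nn as nn

def match_width(make_model, budget: int, lo: int = 1, hi: int = 4096) -> int:
    """Binary search for the largest hidden width whose parameter count
    stays within `budget`. `make_model(width)` must build the network.
    Sketch only; the benchmark's exact procedure may differ."""
    def n_params(w: int) -> int:
        return sum(p.numel() for p in make_model(w).parameters())
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if n_params(mid) <= budget:
            lo = mid      # mid fits the budget; search higher
        else:
            hi = mid - 1  # mid too large; search lower
    return lo

# e.g. a vanilla MLP matched to a ~5K budget (dimensions are illustrative):
width = match_width(
    lambda w: nn.Sequential(nn.Linear(4, w), nn.ReLU(), nn.Linear(w, 1)),
    budget=5000,
)
```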

Citation

```bibtex
@misc{richneuron2025,
  title={Beyond Linear Neurons: An Empirical Study of Multiplicative Periodic
         Architectures at Small Scale},
  author={anshdadhich},
  year={2025},
  url={https://huggingface.co/anshdadhich/richneuron-vs-vanilla-benchmark}
}
```
