arxiv:2605.06384

MinMax Recurrent Neural Cascades

Published on May 8

Abstract

AI-generated summary

MinMax recurrent neural cascades offer theoretically advantageous properties, including bounded gradients, parallel evaluation, and strong expressive power for sequence modeling.

We show that the MinMax algebra provides a form of recurrence that is expressively powerful, efficiently implementable, and, most importantly, unaffected by vanishing or exploding gradients. We call the models obtained by cascading several layers of neurons employing this recurrence MinMax Recurrent Neural Cascades (RNCs). We show that MinMax RNCs enjoy many favourable theoretical properties. First, their formal expressivity includes all regular languages, arguably the maximal expressivity for a finite-memory system. Second, they can be evaluated in parallel, with a runtime logarithmic in the input length given enough processors, as well as sequentially. Third, their states and activations are uniformly bounded for all input lengths. Fourth, their loss gradient exists and is bounded at almost all points. Fifth, they do not exhibit a vanishing state gradient: the gradient of a state w.r.t. a past state can have constant value one regardless of the time distance between the two states. Finally, we find empirical evidence that these favourable theoretical properties are matched by practical capabilities: MinMax RNCs perfectly solve a number of synthetic tasks, outperforming the state-of-the-art recurrent neural networks we consider; and a MinMax RNC with 127M parameters trained on next-token prediction achieves competitive performance for its size, providing evidence of the potential of MinMax RNCs on real-world tasks.
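The abstract does not spell out the recurrence itself, so the sketch below is only an illustration under an assumed clamp-style MinMax step, h_t = min(max(h_{t-1}, a_t), b_t), not the paper's actual parameterization; all names (clamp, compose, sequential_scan, parallel_scan) are hypothetical. It illustrates the parallel-evaluation claim: clamp maps are closed under composition, so prefixes of the sequence can be combined with an associative operator and evaluated in O(log T) parallel rounds.

```python
import numpy as np

def clamp(x, lo, hi):
    # One assumed MinMax step: min(max(x, lo), hi).
    return np.minimum(np.maximum(x, lo), hi)

def sequential_scan(h0, lo, hi):
    # h_t = min(max(h_{t-1}, lo_t), hi_t), evaluated one step at a time.
    h, states = h0, []
    for l, u in zip(lo, hi):
        h = clamp(h, l, u)
        states.append(h)
    return np.stack(states)

def compose(f, g):
    # Clamp maps are closed under composition:
    # clamp(clamp(x, flo, fhi), glo, ghi)
    #   = clamp(x, clamp(flo, glo, ghi), clamp(fhi, glo, ghi)),
    # so "apply f, then g" is again a clamp map. This makes the
    # scan operator associative, which is what enables parallelism.
    (flo, fhi), (glo, ghi) = f, g
    return clamp(flo, glo, ghi), clamp(fhi, glo, ghi)

def parallel_scan(h0, lo, hi):
    # Hillis-Steele inclusive prefix scan over clamp composition:
    # O(log T) rounds when each round's compositions run in parallel
    # (done serially here for clarity).
    ops = list(zip(lo, hi))
    T, step = len(ops), 1
    while step < T:
        ops = [ops[t] if t < step else compose(ops[t - step], ops[t])
               for t in range(T)]
        step *= 2
    return np.stack([clamp(h0, l, u) for l, u in ops])

# Sanity check: sequential and parallel evaluation agree.
rng = np.random.default_rng(0)
lo, hi = rng.normal(size=(16, 4)), rng.normal(size=(16, 4))
h0 = np.zeros(4)
assert np.allclose(sequential_scan(h0, lo, hi), parallel_scan(h0, lo, hi))
```

Since each step only routes values through min and max, the Jacobian of h_t w.r.t. an earlier state h_s is exactly one whenever no bound is active between s and t, which is consistent with the non-vanishing state gradient described in the abstract.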
