Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Paper: arXiv:2410.20672
An implementation of Relaxed Recursive Transformers, uptrained with knowledge distillation on OpenWebText2. Paper: arxiv.org/abs/2410.20672
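The core idea from the paper can be sketched as follows: a recursive transformer reuses one shared ("tied") block at every recursion depth, and the "relaxed" variant adds a distinct low-rank LoRA correction per depth so the loops are no longer strictly identical. This is a minimal illustrative sketch, not the repo's actual code; the shapes, initialization, and the single-matrix "block" are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, n_loops = 64, 4, 3

# One shared weight matrix standing in for the tied transformer block,
# reused at every recursion depth.
W_shared = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

# Layer-wise LoRA: a separate low-rank (B @ A) delta per depth relaxes
# the strict tying at a small parameter cost.
loras = [
    (rng.standard_normal((d_model, rank)) * 0.01,   # B_i: d_model x rank
     rng.standard_normal((rank, d_model)) * 0.01)   # A_i: rank x d_model
    for _ in range(n_loops)
]

def recursive_forward(x):
    # Loop the shared block n_loops times, applying each depth's LoRA delta.
    for B, A in loras:
        x = np.tanh(x @ (W_shared + B @ A))
    return x

x = rng.standard_normal((2, d_model))
y = recursive_forward(x)

# Parameter count: shared block + all LoRA deltas vs. untied layers.
tied_params = W_shared.size + sum(B.size + A.size for B, A in loras)
untied_params = n_loops * d_model * d_model
print(y.shape, tied_params, untied_params)
```

With these sizes the tied model plus LoRA uses 4096 + 3·512 = 5632 parameters versus 12288 for three untied layers, which is the memory-saving trade-off the paper exploits.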