# LuminaRS — Lightweight Recursive Art Image Generator

A novel ~90M-parameter image generation model for art and illustration that runs on mobile-class devices (2-4 GB VRAM).

## Why LuminaRS?

| Problem | Current Solutions | LuminaRS |
|---|---|---|
| Heavy models (6-12 GB) | SDXL, Flux | ~90M params, <500 MB |
| Can't run on mobile | Quantized SD (quality loss) | Designed small from scratch |
| Poor prompt adherence | SD 1.5 | TRM-style recursive reasoning |
| No art specialization | General photo models | Art-focused training stages |
| Unstable training | Diffusion (score matching) | Flow matching (stable ODE) |

## Architecture (Novel Contributions)

### 1. Recursive Shared-Weight Refinement (from TRM)

Inspired by Tiny Recursive Models (TRM), which beat LLMs 200x their size using only 7M parameters.

```python
for _ in range(T):
    z = z + unet(z, text, t)  # shared-weight refinement
```

Effective depth = T x L layers while storing only L layers' worth of parameters (no T x parameter cost). A minimal sketch of the wrapper follows below.
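For concreteness, here is a minimal sketch of the refinement loop as a module, assuming a generic `denoiser` and recursion count `T` (hypothetical names, not the repo's actual classes):

```python
import torch.nn as nn

class RecursiveRefiner(nn.Module):
    """Shared-weight recursive refinement (TRM-style) -- illustrative sketch."""

    def __init__(self, denoiser: nn.Module, T: int = 4):
        super().__init__()
        self.denoiser = denoiser  # ONE set of weights, reused every iteration
        self.T = T                # number of recursion steps

    def forward(self, z, text_emb, t):
        for _ in range(self.T):
            # Residual update: each pass refines the latent with the same
            # weights, giving effective depth T * L for an L-layer denoiser.
            z = z + self.denoiser(z, text_emb, t)
        return z
```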

### 2. Flow Matching (instead of Diffusion)

- `v(x_t, t) = x_clean - x_noise` (straight-line velocity target)
- 10-12 inference steps vs. 50+ for diffusion
- No score-matching instability
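A minimal sketch of the corresponding training objective under the linear interpolant `x_t = (1 - t) * x_noise + t * x_clean`; function and argument names are illustrative, and the repo's actual loss lives in the `luminars` package:

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x_clean, text_emb):
    """Regress the straight-line velocity between noise and data."""
    x_noise = torch.randn_like(x_clean)                     # x_0 ~ N(0, I)
    t = torch.rand(x_clean.size(0), device=x_clean.device)  # t ~ U[0, 1]
    t_ = t.view(-1, 1, 1, 1)                                # broadcast over C, H, W
    x_t = (1 - t_) * x_noise + t_ * x_clean                 # linear interpolant
    v_target = x_clean - x_noise                            # constant velocity along the path
    v_pred = model(x_t, text_emb, t)                        # model predicts velocity
    return F.mse_loss(v_pred, v_target)
```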

### 3. ConvNeXt + MQA Cross-Attention

Each block combines a depthwise 7x7 convolution, Adaptive LayerNorm for timestep conditioning, multi-query (MQA) cross-attention over the text embeddings, and a GELU MLP. A sketch of the cross-attention follows below.
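In MQA, every query head shares a single key/value head projected from the text embeddings, which shrinks the K/V projection and cache. A sketch, where module and dimension names are assumptions rather than the repo's exact code:

```python
import torch.nn as nn
import torch.nn.functional as F

class MQACrossAttention(nn.Module):
    """Multi-query cross-attention: many query heads, ONE shared K/V head."""

    def __init__(self, dim: int, text_dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim)                      # per-head queries
        self.kv = nn.Linear(text_dim, 2 * self.head_dim)  # single shared K/V head
        self.out = nn.Linear(dim, dim)

    def forward(self, x, text_emb):
        # x: (B, N, dim) flattened image tokens; text_emb: (B, L, text_dim)
        B, N, _ = x.shape
        q = self.q(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(text_emb).chunk(2, dim=-1)         # (B, L, head_dim) each
        k = k.unsqueeze(1).expand(-1, self.num_heads, -1, -1)  # share across heads
        v = v.unsqueeze(1).expand(-1, self.num_heads, -1, -1)
        attn = F.scaled_dot_product_attention(q, k, v)    # (B, H, N, head_dim)
        return self.out(attn.transpose(1, 2).reshape(B, N, -1))
```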

### 4. Staged Freeze/Thaw Training

| Stage | What's Trained | LR |
|---|---|---|
| 1 | All denoiser params | 1e-4 |
| 2 | Cross-attention only | 1e-5 |
| 3 | All params, joint | 1e-6 |

The VAE and CLIP text encoder stay frozen in every stage.
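A minimal freeze/thaw helper matching the table above; matching parameter names on `cross_attn` and the choice of AdamW are assumptions about the codebase, not its actual API:

```python
import torch

STAGE_LR = {1: 1e-4, 2: 1e-5, 3: 1e-6}

def configure_stage(model, stage: int):
    """Set requires_grad per the stage table and build a matching optimizer.

    `model` is the denoiser only; the frozen VAE and CLIP sit outside it.
    """
    for name, p in model.named_parameters():
        if stage == 2:
            p.requires_grad = "cross_attn" in name  # stage 2: cross-attention only
        else:
            p.requires_grad = True                  # stages 1 and 3: full denoiser
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=STAGE_LR[stage])
```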

## Parameter Budget

| Component | Params |
|---|---|
| Encoder | ~35M |
| Bottleneck | ~15M |
| Decoder | ~35M |
| Embeddings | ~5M |
| **Total trainable** | **~90M** |
| VAE (frozen) | ~83M |
| CLIP (frozen) | ~303M |

Inference VRAM (batch size 1): ~1.5-2 GB.

## Quick Start

```python
from luminars.model import LuminaRS
from luminars.config import LuminaRSConfig
from luminars.sampler import sample_flow

cfg = LuminaRSConfig()
model = LuminaRS(cfg)

# text_emb: CLIP text embeddings for your prompt (the CLIP encoder is frozen)
latents = sample_flow(model, text_emb, (1, 16, 32, 32), 12)  # 12 flow steps, (1, 16, 32, 32) latent
```
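For intuition, here is what a flow sampler like `sample_flow` typically does: Euler integration of the learned velocity field from noise (t=0) to data (t=1). This is an illustrative sketch, not the shipped implementation:

```python
import torch

@torch.no_grad()
def euler_sample(model, text_emb, shape, steps: int = 12):
    """Integrate dx/dt = v(x, t) from noise (t=0) to data (t=1) with Euler steps."""
    x = torch.randn(shape)                   # start from pure Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)  # current time for the whole batch
        x = x + dt * model(x, text_emb, t)   # step along the predicted velocity
    return x                                 # final latents, ready for VAE decoding
```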

## Files

- `luminars/` -- model, config, loss, sampler, training helpers
- `train.py` -- main training script
- `LuminaRS_Colab.ipynb` -- Colab notebook

## Research Foundations

- TRM (Jolicoeur-Martineau, 2025): recursive reasoning
- SnapGen (2024): mobile UNet design
- ZigMa (2024): Mamba-based diffusion
- Flow Matching (Lipman et al., 2023): stable ODE generation
- MQA (Shazeer, 2019): multi-query attention
- ConvNeXt (Liu et al., 2022): modernized CNN

MIT License