---
language: en
license: apache-2.0
base_model: black-forest-labs/FLUX.2-klein-base-4B
library_name: diffusers
tags:
- video-prediction
- dynamics
- lora
- flux2
- vision-banana
- arxiv:2604.20329
pipeline_tag: image-to-image
---
# moving-plantain
A LoRA adapter for FLUX.2 Klein (4B) for single-step future-frame prediction. It tests whether the latent physics priors of an image generator can be surfaced through the instruction-tuning recipe of *Image Generators are Generalist Vision Learners* (Gabeur et al., 2026; [arXiv:2604.20329](https://arxiv.org/abs/2604.20329)).
## Thesis
Vision Banana argues that image-generation pretraining produces a generalist vision learner. moving-plantain extends that argument to dynamics: a model that can render a physically coherent t=1 frame conditioned on a t=0 frame and a free-form intervention prompt ("the ball rolls left", "the cup tips over", "the cloth falls") implicitly carries a forward physics simulator in its weights. Recovering that simulator under parameter-efficient adaptation is an empirical test of whether generative vision pretraining encodes object permanence, gravity, contact dynamics, and other physical structure beyond static appearance.
## Method
Input: a single RGB frame at t=0 and an intervention prompt describing the intended change. Output: the predicted RGB frame at t=1. Training pairs are drawn from natural video datasets, with intervention prompts derived from optical-flow-based motion descriptions of the change between consecutive frames. The loss is the standard diffusion objective computed on the t=1 target frame.
## Status
Placeholder model card; weights and training data are forthcoming.
## License
Apache 2.0, matching the base FLUX.2 Klein 4B license.
## References
- Gabeur, Long, Peng, et al. *Image Generators are Generalist Vision Learners.* [arXiv:2604.20329](https://arxiv.org/abs/2604.20329), 2026.
- Black Forest Labs. *FLUX.2 Klein.* https://bfl.ai/models/flux-2-klein, 2025.