---
language: en
license: apache-2.0
base_model: black-forest-labs/FLUX.2-klein-base-4B
library_name: diffusers
tags:
- interpretability
- per-head-attention
- paired-prompt-probe
- inverse-rendering
- flux2
- vision-banana
- arxiv:2604.20329
pipeline_tag: image-to-image
---
# ray-plantain
A per-head attention probe of FLUX.2 Klein 4B testing whether the base model has implicitly separated illumination from reflectance, i.e., whether intrinsic-image decomposition emerges as a representational axis without inverse-rendering supervision.
## Thesis
Intrinsic-image decomposition (separating the rendered photograph into its physically meaningful albedo, normal, and lighting components) is a classical inverse-rendering problem that has historically required either explicit supervision or carefully designed self-supervision. ray-plantain tests whether that machinery is necessary at all: whether a 4B image-generation model trained on natural photographs has discovered the rendered-vs-albedo distinction on its own, surfacing it as a per-head axis in the base model.
## Method
Twenty-five paired prompts hold the depicted scene constant. The A condition describes the final rendered photograph (full lighting, shadows, surface reflections). The B condition requests the lighting-free albedo / surface-color component only. The per-head capture protocol is identical to that of the rest of the plantain probe family.
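The capture protocol itself is not reproduced here. As a minimal sketch of the general approach, per-head summaries can be recorded with PyTorch forward hooks; the toy attention module, head summary (mean activation norm over tokens), and random inputs below are illustrative assumptions, not the probe's actual protocol:

```python
import torch
import torch.nn as nn

# Toy stand-in for one attention block; the real probe would hook FLUX.2
# Klein's joint/single transformer blocks (module choice is illustrative).
n_heads, head_dim = 8, 16
attn = nn.MultiheadAttention(embed_dim=n_heads * head_dim,
                             num_heads=n_heads, batch_first=True)

captured = []  # one per-head summary per forward pass

def hook(module, inputs, output):
    out, _ = output                               # (batch, seq, n_heads * head_dim)
    b, s, _ = out.shape
    per_head = out.view(b, s, n_heads, head_dim)
    # Summarize each head as its mean activation norm over tokens.
    captured.append(per_head.norm(dim=-1).mean(dim=1))  # (batch, n_heads)

handle = attn.register_forward_hook(hook)

x_a = torch.randn(1, 10, n_heads * head_dim)  # stand-in for condition A ("rendered")
x_b = torch.randn(1, 10, n_heads * head_dim)  # stand-in for condition B ("albedo")
with torch.no_grad():
    attn(x_a, x_a, x_a)
    attn(x_b, x_b, x_b)
handle.remove()

a_stats, b_stats = captured                   # one (1, n_heads) tensor per condition
```

Running the real model over the 25 prompt pairs would yield one such summary per head per condition, which the statistics below then compare.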
Rigor add-ons: per-head Cohen's d effect size; split-half consistency via 100 random 50/50 stimulus splits.
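The two rigor add-ons can be sketched with NumPy on simulated data; the per-head responses below are hypothetical stand-ins for the captured summaries, and the effect-size distribution is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_heads = 25, 16320                   # 25 prompt pairs, every head

# Hypothetical per-head responses; real values come from the capture step.
mu = rng.normal(0.8, 0.5, size=n_heads)        # heterogeneous true effects
a = rng.normal(0.0, 1.0, size=(n_pairs, n_heads))
b = a + mu + rng.normal(0.0, 1.0, size=(n_pairs, n_heads))

diff = b - a                                   # paired design: B minus A per pair
d = diff.mean(axis=0) / diff.std(axis=0, ddof=1)  # per-head Cohen's d (paired)

# Split-half consistency: correlate per-head d across 100 random 50/50 splits.
rs = []
for _ in range(100):
    idx = rng.permutation(n_pairs)
    h1, h2 = idx[: n_pairs // 2], idx[n_pairs // 2:]
    d1 = diff[h1].mean(axis=0) / diff[h1].std(axis=0, ddof=1)
    d2 = diff[h2].mean(axis=0) / diff[h2].std(axis=0, ddof=1)
    rs.append(np.corrcoef(d1, d2)[0, 1])
split_half_r = float(np.median(rs))
```

High `split_half_r` means the per-head effect-size profile is stable under resampling of the stimuli, which is the sense in which the reported r=0.912 certifies reproducibility.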
## Results
| Metric | Value | Significance |
|--------------------------------|------------------|----------------------------|
| Heads with \|t\| > 3 | 10,846 (66.5%) | 7.6× empirical null p99 |
| Heads with \|t\| > 5 | 7,445 (45.6%) | 1,489× empirical null p99 |
| Heads with \|d\| > 0.8 (large) | 9,136 (56.0%) | — |
| Split-half r (median) | 0.912 | [0.91, 0.92] IQR |
| Max \|t\| | 34.48 | — |
**Top blocks by max \|t\|:**
- single[0]: max\|t\|=34.48, 510/768 heads at \|t\|>3, median \|d\|=1.00
- joint[0]: max\|t\|=28.91, 169/192 heads at \|t\|>3, median \|d\|=2.54
- joint[4]: max\|t\|=27.84, 149/192 heads at \|t\|>3, median \|d\|=1.32
- joint[3]: max\|t\|=26.80, 151/192 heads at \|t\|>3, median \|d\|=1.22
- joint[1]: max\|t\|=26.76, 135/192 heads at \|t\|>3, median \|d\|=1.10
**Interpretation.** The strongest plantain probe finding to date by every rigor metric. **Over half of all 16,320 attention heads** in the model show large-effect-size selectivity (Cohen's d > 0.8) for the rendered/albedo distinction. Split-half consistency r=0.91 places the axis among the most reproducible representational features located so far. The 1,489× ratio over the empirical null at |t|>5 is the highest reported across the plantain probe family.
The result implies that Klein has implicitly performed an inverse-rendering decomposition during pretraining and made the components separately addressable at the per-head level. The maximum-effect block (joint[0]) has 169 of 192 heads at |t|>3 with median Cohen's d above 2.5: the rendered/albedo distinction is the dominant feature partition for that block. This places a measurable upper bound on how much of the inverse-rendering literature's machinery is actually necessary; the base model already separates illumination from reflectance internally, and the open question shifts to whether the separation is *correct* (matches physically computed albedo) rather than whether it exists.
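The "× empirical null p99" comparisons can be reproduced in miniature. A common construction, sketched here on simulated paired differences (the head count, effect size, and sign-flip scheme are illustrative assumptions, not the probe's exact null), flips the sign of each pair's A/B difference to simulate exchangeable condition labels:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, n_heads = 25, 2000                    # smaller head count for the sketch
diff = rng.normal(0.6, 1.0, (n_pairs, n_heads))  # simulated B - A differences

def paired_t(x):
    """Per-head paired t-statistic over the pair axis."""
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(len(x)))

observed = int((np.abs(paired_t(diff)) > 3).sum())

# Empirical null: randomly flip each pair's sign (exchangeable A/B labels)
# and record how many heads exceed |t| > 3 by chance.
null_counts = []
for _ in range(500):
    signs = rng.choice([-1.0, 1.0], size=(n_pairs, 1))
    null_counts.append(int((np.abs(paired_t(diff * signs)) > 3).sum()))
p99 = float(np.percentile(null_counts, 99))
ratio = observed / max(p99, 1.0)               # guard against a zero-count null
```

The reported 7.6× and 1,489× figures are this `ratio` at thresholds 3 and 5 respectively, computed on the real captures.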
## Status
Probe complete. No LoRA training; this is a base-model interpretability finding.
## Limitations
The probe establishes that Klein has a separable representation of "rendered photograph" vs. "albedo." It does not establish that the predicted albedo is *correct*, i.e., physically faithful to a held-out lighting decomposition. A natural follow-up generates the predicted albedo at full inference depth and compares to physically computed albedo on a synthetic-image test set with known intrinsics.
The pair count is 25; per-head reproducibility (r=0.91) is high, but a larger sweep would tighten estimates and enable per-block confidence intervals.
The probe is correlational: head selectivity for the prompt contrast does not by itself show that these heads causally implement the decomposition.
## License
Apache 2.0, matching the base FLUX.2 Klein 4B license.
## References
- Gabeur, V., Long, S., Peng, S., et al. *Image Generators are Generalist Vision Learners.* [arXiv:2604.20329](https://arxiv.org/abs/2604.20329) (2026).
- Black Forest Labs. *FLUX.2 Klein.* https://bfl.ai/models/flux-2-klein (2025).