---
language: en
license: apache-2.0
base_model: black-forest-labs/FLUX.2-klein-base-4B
library_name: diffusers
tags:
- interpretability
- mechanistic-interpretability
- probing
- arrow-of-time
- flux2
- arxiv:2604.20329
---
A per-head attention probe of FLUX.2 Klein 4B testing whether the base model represents the arrow of time as a generalized axis distinct from object-class recognition. Companion to Image Generators are Generalist Vision Learners (Gabeur et al., 2026; arXiv:2604.20329).
## Hypothesis
A generative image model trained on natural visual data should encounter pre-event and post-event states with roughly equal frequency. If the model learns to represent post-event states as a category (independent of which event occurred), per-head attention should respond systematically more to post-event images than to matched pre-event images, even when the paired states span diverse and otherwise unrelated phenomena. A head that responds only to "broken glass" or only to "spilled coffee" is a class detector. A head that responds to all post-event states across unrelated event types is representing the arrow of time.
## Method
25 image pairs are constructed, each pair depicting a single irreversible cause-effect event in two states: pre-event and post-event. The events are mutually unrelated by design: shattered glass, spilled coffee, cracked egg, fallen dominoes, snuffed candle, melted ice cube, collapsed sandcastle, eaten apple, broken vase, flat tire, burnt match, opened envelope, melted pond, smashed watermelon, scattered books, used pencil, popped balloon, wilted flower, melted snowman, bread crumbs, footprints in sand, collapsed block tower, post-cooking kitchen counter, worn sneaker, scattered jigsaw pieces.
Each image is generated once at 14 inference steps from a controlled prompt and saved to disk. All 50 fixed images are then used as probe stimuli. For each pass, the pipeline is run image-conditioned at one inference step with guidance_scale=1.0 and the identical prompt "Describe what is depicted in this image." A forward pre-hook on every transformer attention output projection (5 joint MMDiT blocks + 20 single blocks, 16 320 heads total) captures the per-head RMS magnitude of the input activation. The per-head paired t-statistic is computed across the 25 pairs as t = mean(after − before) / (std(after − before) / sqrt(N)), with N = 25 and the standard deviation taken over the per-pair differences. The empirical null is constructed from 200 independent random sign-flips of pair-member labels (relabelling before ↔ after within each pair), which under the null hypothesis of no systematic before/after difference leaves the distribution of t unchanged.
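The capture mechanism can be sketched with a toy stand-in for one block's attention output projection. This is a minimal sketch, not the probe's actual code: the `HEAD_DIM = 128` width, the module name, and the use of a bare `nn.Linear` in place of the real FLUX.2 transformer are all illustrative assumptions; the real probe registers the same style of pre-hook on every attention output projection in the pipeline's transformer.

```python
import torch
import torch.nn as nn

HEAD_DIM = 128  # assumption: per-head channel width; illustrative only
rms_store = {}

def make_pre_hook(name):
    # A forward *pre*-hook on the attention output projection sees the
    # projection's input, i.e. the concatenated per-head attention outputs
    # before they are mixed back together.
    def hook(module, args):
        x = args[0]                                   # (batch, tokens, heads * head_dim)
        b, t, d = x.shape
        per_head = x.view(b, t, d // HEAD_DIM, HEAD_DIM)
        # RMS magnitude per head, averaged over tokens and channels
        rms = per_head.pow(2).mean(dim=(1, 3)).sqrt()  # (batch, heads)
        rms_store[name] = rms.detach().cpu()
    return hook

# Toy stand-in: a 4-head projection instead of a real MMDiT block
proj = nn.Linear(4 * HEAD_DIM, 4 * HEAD_DIM)
proj.register_forward_pre_hook(make_pre_hook("block0.attn.to_out"))

x = torch.randn(1, 7, 4 * HEAD_DIM)
_ = proj(x)
print(rms_store["block0.attn.to_out"].shape)
```

One pass over a probe image fills `rms_store` with one RMS scalar per head; stacking these across the 50 stimuli yields the per-head before/after measurements the t-test operates on.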
By design, neither the prompt nor the probe protocol varies between conditions. Differences in per-head response are attributable only to the input image being a pre-event vs post-event state. A head's signal must generalize across the 25 unrelated event types to register a high paired t-statistic; class-detector heads contribute to only one or two pairs and average out.
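The paired t-statistic and sign-flip null described above can be sketched as follows. The function names and the toy data (100 hypothetical heads, 3 of which are given an artificial post-event shift) are illustrative assumptions, not the probe's actual code; the statistic itself matches the formula in the Method section.

```python
import numpy as np

rng = np.random.default_rng(0)

def paired_t(after, before):
    # after, before: (n_pairs, n_heads) per-head RMS, one row per image pair
    d = after - before
    n = d.shape[0]
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))

def sign_flip_null(after, before, n_perm=200, thresh=5.0, rng=rng):
    # Empirical null: relabel before <-> after within random pairs,
    # which is equivalent to flipping the sign of each pair's difference.
    d = after - before
    n = d.shape[0]
    counts = np.empty(n_perm, dtype=int)
    for i in range(n_perm):
        flipped = rng.choice([-1.0, 1.0], size=(n, 1)) * d
        t = flipped.mean(axis=0) / (flipped.std(axis=0, ddof=1) / np.sqrt(n))
        counts[i] = int(np.sum(np.abs(t) > thresh))
    return counts

# Toy data: 25 pairs x 100 heads; 3 heads carry a genuine "after" shift
after = rng.normal(size=(25, 100))
before = rng.normal(size=(25, 100))
after[:, :3] += 2.0

obs = int(np.sum(np.abs(paired_t(after, before)) > 5.0))
null = sign_flip_null(after, before)
print(obs, np.percentile(null, 99))
```

Heads whose response generalizes across pairs survive the sign-flip relabelling test; class-detector heads, which contribute a large difference on only one or two pairs, produce t-statistics that the null distribution readily reproduces.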
## Results
| metric | observed | null mean | null p99 | observed / null p99 |
|---|---|---|---|---|
| heads with \|t\| > 3 | 2 426 (14.9%) | 82 | 501 | 4.8× |
| heads with \|t\| > 5 | 159 (1.0%) | 0 | 3 | 53× |
| max \|t\| | 7.56 | — | — | — |
A small but statistically robust population of heads responds systematically to post-event states across diverse phenomena. Approximately 1% of all heads exhibit a paired t-statistic above 5, against an empirical 99th-percentile null of 3 heads, a ratio of 53×. The strongest individual head reaches t = 7.56. Signal magnitude is substantially smaller than measured in companion probes for physical scale, perspective-taking, kinematic state, or surface roughness, suggesting that the arrow of time is represented in the Klein base model but is not a major axis of variation.
## License
Apache 2.0.
## References
- Gabeur, V., Long, S., Peng, S., et al. Image Generators are Generalist Vision Learners. arXiv:2604.20329 (2026).
- Black Forest Labs. FLUX.2 Klein. https://bfl.ai/models/flux-2-klein (2025).