phanerozoic committed on
Commit 72a333d · verified · 1 Parent(s): 058f9da

Add full NYU Eigen test table to pending/ README

Files changed (1)
  1. pending/README.md +16 -6
pending/README.md CHANGED
@@ -8,16 +8,26 @@ Staged weights from a follow-up training run that's substantially better than th
  - `text_encoder/` — full Qwen3-4B text encoder with the rank-32 text-encoder LoRA fused in (`merge_and_unload`). Loads as a drop-in `text_encoder` for `Flux2KleinPipeline`.
  - `tokenizer/` — Qwen3 tokenizer (unchanged from base; included so the pending/ folder is self-contained).

- ## Early metrics (10 hardest NYU val frames)
+ ## Full NYU Eigen test (490 frames, 28 inference steps at 768 × 768)

- | metric | rank-32 baseline | this checkpoint | Vision Banana paper (full set) |
+ | metric | this checkpoint | rank-32 baseline (full set) | Vision Banana paper (full set) |
  |---|---|---|---|
- | RMSE (m) | 3.0–4.7 | **0.436** | n/a |
- | δ1 | 0.00–0.38 | **0.819** | 0.948 |
+ | δ₁ | **0.745** | 0.370 | 0.948 |
+ | δ₂ | 0.958 | — | — |
+ | δ₃ | 0.988 | — | — |
+ | AbsRel | **0.163** | 0.461 | 0.074 |
+ | RMSE | **0.596 m** | 1.566 m | — |

- The 10 frames are the worst-performing scenes from a per-frame eval of the rank-32 baseline. Going from 3–4 m catastrophic failure to 0.44 m on those exact scenes, with δ1 jumping from near-zero to 0.82, is the headline.
-
- Full 654-frame NYU Eigen eval is currently running; partial numbers (~30% of the set) sit at 0.59 m RMSE / 0.74 δ1 / 0.17 AbsRel. Final number will be in this file once it lands.
+ Doubled δ₁ and more than halved RMSE versus the rank-32 LoRA baseline at the same architecture level. δ₁ ~0.20 below paper-level — the paper full-instruction-tunes a hundreds-of-billions-parameter Nano Banana Pro on its original training mixture; this is rank-256 LoRA on a 4B open base at 23% of a 15 000-step schedule.
+
+ ## Hardest 10 NYU frames (worst per-frame RMSE on the rank-32 baseline)
+
+ | metric | rank-32 baseline | this checkpoint |
+ |---|---|---|
+ | RMSE (m) | 3.0–4.7 | **0.436** |
+ | δ₁ | 0.00–0.38 | **0.819** |
+
+ Going from 3–4 m catastrophic failure to 0.44 m on those exact scenes, with δ₁ jumping from near-zero to 0.82, is what motivated the full eval.
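
The δ, AbsRel, and RMSE columns in the tables above are the standard monocular-depth metrics from the Eigen NYU evaluation protocol. A minimal NumPy sketch of those definitions — assuming predicted and ground-truth depth maps are positive, in metres, and already masked to valid pixels; the function name is illustrative, not from this repo's eval code:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Standard Eigen-protocol depth metrics on aligned depth arrays (metres)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Threshold accuracy: fraction of pixels whose depth ratio is within 1.25^k.
    ratio = np.maximum(pred / (gt + eps), gt / (pred + eps))
    return {
        "delta1": float(np.mean(ratio < 1.25)),        # δ₁
        "delta2": float(np.mean(ratio < 1.25 ** 2)),   # δ₂
        "delta3": float(np.mean(ratio < 1.25 ** 3)),   # δ₃
        # Mean absolute relative error.
        "abs_rel": float(np.mean(np.abs(pred - gt) / (gt + eps))),
        # Root-mean-square error in metres.
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
    }
```

Higher is better for the δ thresholds, lower is better for AbsRel and RMSE, which is why δ₁ rising from 0.370 to 0.745 and RMSE falling from 1.566 m to 0.596 m move in opposite directions for the same improvement.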

  ## Training