phanerozoic committed on
Commit 72a333d · verified · 1 Parent(s): 058f9da

Add full NYU Eigen test table to pending/ README

Files changed (1)
  1. pending/README.md +16 -6
pending/README.md CHANGED
@@ -8,16 +8,26 @@ Staged weights from a follow-up training run that's substantially better than th
  - `text_encoder/` — full Qwen3-4B text encoder with the rank-32 text-encoder LoRA fused in (`merge_and_unload`). Loads as a drop-in `text_encoder` for `Flux2KleinPipeline`.
  - `tokenizer/` — Qwen3 tokenizer (unchanged from base; included so the pending/ folder is self-contained).

- ## Early metrics (10 hardest NYU val frames)
+ ## Full NYU Eigen test (490 frames, 28 inference steps at 768 × 768)

- | metric | rank-32 baseline | this checkpoint | Vision Banana paper (full set) |
+ | metric | this checkpoint | rank-32 baseline (full set) | Vision Banana paper (full set) |
  |---|---|---|---|
- | RMSE (m) | 3.0–4.7 | **0.436** | n/a |
- | δ1 | 0.00–0.38 | **0.819** | 0.948 |
+ | δ₁ | **0.745** | 0.370 | 0.948 |
+ | δ₂ | 0.958 | — | — |
+ | δ₃ | 0.988 | — | — |
+ | AbsRel | **0.163** | 0.461 | 0.074 |
+ | RMSE | **0.596 m** | 1.566 m | — |

- The 10 frames are the worst-performing scenes from a per-frame eval of the rank-32 baseline. Going from 3–4 m catastrophic failure to 0.44 m on those exact scenes, with δ1 jumping from near-zero to 0.82, is the headline.
-
- Full 654-frame NYU Eigen eval is currently running; partial numbers (~30% of the set) sit at 0.59 m RMSE / 0.74 δ1 / 0.17 AbsRel. Final number will be in this file once it lands.
+ Doubled δ₁ and more than halved RMSE versus the rank-32 LoRA baseline at the same architecture level. δ₁ ~0.20 below paper-level — the paper full-instruction-tunes a hundreds-of-billions-parameter Nano Banana Pro on its original training mixture; this is rank-256 LoRA on a 4B open base at 23% of a 15 000-step schedule.
+
+ ## Hardest 10 NYU frames (worst per-frame RMSE on the rank-32 baseline)
+
+ | metric | rank-32 baseline | this checkpoint |
+ |---|---|---|
+ | RMSE (m) | 3.0–4.7 | **0.436** |
+ | δ₁ | 0.00–0.38 | **0.819** |
+
+ Going from 3–4 m catastrophic failure to 0.44 m on those exact scenes, with δ₁ jumping from near-zero to 0.82, is what motivated the full eval.
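
The δ, AbsRel, and RMSE columns in the tables above are the standard monocular-depth metrics from the Eigen NYU evaluation protocol. A minimal NumPy sketch of those definitions — assuming predicted and ground-truth depth maps are positive, in metres, and already masked to valid pixels; the function name is illustrative, not from this repo's eval code:

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-8):
    """Standard Eigen-protocol depth metrics on aligned depth arrays (metres)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Threshold accuracy: fraction of pixels whose depth ratio is within 1.25^k.
    ratio = np.maximum(pred / (gt + eps), gt / (pred + eps))
    return {
        "delta1": float(np.mean(ratio < 1.25)),        # δ₁
        "delta2": float(np.mean(ratio < 1.25 ** 2)),   # δ₂
        "delta3": float(np.mean(ratio < 1.25 ** 3)),   # δ₃
        # Mean absolute relative error.
        "abs_rel": float(np.mean(np.abs(pred - gt) / (gt + eps))),
        # Root-mean-square error in metres.
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
    }
```

Higher is better for the δ thresholds, lower is better for AbsRel and RMSE, which is why δ₁ rising from 0.370 to 0.745 and RMSE falling from 1.566 m to 0.596 m move in opposite directions for the same improvement.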

  ## Training