# Submitted 2048 Model (earlier public leaderboard entry)
This is the earlier of two checkpoints submitted to the S23DR 2026 public leaderboard. It was trained on the 2048-point dataset only (no 4096 transfer); within the 2048 regime, training is two-phase (phase 1 from scratch, phase 2 with endpoint loss and cooldown, resumed from step125000.pt). The current top-level checkpoint.pt (dev val HSS=0.382, public test HSS=0.4470) is its descendant via the three-step 2048 -> 4096 -> endpoint-cooldown recipe, and it is the better submission on both dev val and public test.
| Split | Metric | Score |
|---|---|---|
| Public test (leaderboard) | HSS | 0.4273 |
| Dev val (last 1024 train scenes), 2048-pt input | HSS_conf | 0.369 |
| Dev val (last 1024 train scenes), 4096-pt input | HSS_conf | 0.367 |
We did not eval on the official validation split (hf://usm3d/s23dr-2026-sampled_*_v2:validation)
during development, so no number here refers to it. See "Evaluation sets" in ../REPRODUCE.md.
## Training details
Two-phase 2048-only training on hf://usm3d/s23dr-2026-sampled_2048_v2:train. Phase 1 runs from scratch to step 125k; phase 2 resumes from step125000.pt and trains with endpoint loss to step 160k, finishing with a 20k-step cooldown over steps 140k-160k:
- Architecture: same Perceiver as the current release (hidden=256, latent_tokens=256, latent_layers=7, segments=64)
- Input: 2048 points
- Steps: 160,000
- Final LR: 3e-5 (after cooldown)
- Batch size: 32
- Cooldown: starts at step 140,000, lasts 20,000 steps
- Endpoint weight: 0.1 (used throughout, not only in cooldown)
- Confidence weight: 0.1
- Seed: 353
Full training args are in args.json.
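For concreteness, the cooldown schedule described above can be sketched as a simple step-dependent learning-rate function. This is an illustrative assumption, not the training code: the decay shape (here linear) and the pre-cooldown base LR (here a hypothetical 3e-4; the real value is in args.json) are not stated in this README, only the cooldown window (steps 140k-160k) and the final LR of 3e-5.

```python
def cooldown_lr(step: int,
                base_lr: float = 3e-4,      # hypothetical phase-2 base LR; see args.json
                final_lr: float = 3e-5,     # final LR after cooldown (from this README)
                cooldown_start: int = 140_000,
                cooldown_steps: int = 20_000) -> float:
    """Sketch of a linear LR cooldown: constant until cooldown_start,
    then linear decay to final_lr over cooldown_steps."""
    if step < cooldown_start:
        return base_lr
    frac = min((step - cooldown_start) / cooldown_steps, 1.0)
    return base_lr + frac * (final_lr - base_lr)
```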
## How to run inference
This checkpoint expects 2048-point input. To run it with the submission harness you would need to modify script.py to use SEQ_LEN = 2048. Alternatively, load the weights manually via EdgeDepthSegmentsModel in s23dr_2026_example/model.py and feed a 2048-point cloud.
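If you feed the model manually, the input cloud must have exactly 2048 points. A minimal sketch of one way to do that is below; `resample_points` is a hypothetical helper (the submission harness's actual sampling logic may differ), and loading the checkpoint into EdgeDepthSegmentsModel is assumed to follow the usual torch.load pattern.

```python
import numpy as np

def resample_points(points: np.ndarray, seq_len: int = 2048, seed: int = 0) -> np.ndarray:
    """Subsample (without replacement) or pad by resampling (with replacement)
    an (N, D) point cloud to exactly seq_len points."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    idx = rng.choice(n, size=seq_len, replace=(n < seq_len))
    return points[idx]
```

With a cloud prepared this way, the weights would be loaded roughly as `model.load_state_dict(torch.load("step160000.pt"))` on an EdgeDepthSegmentsModel instance from s23dr_2026_example/model.py (constructor arguments as in the architecture list above); the exact filename and constructor signature are assumptions, so check that file before relying on them.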
## Why it is included
The current release (../checkpoint.pt, dev val HSS=0.382) is the better submission on both dev val and public test. This older checkpoint is preserved as the empirical anchor for the dev-val-to-public-test gap.
Dev-val-to-public-test gap observed across both submissions:
| Submission | Dev val HSS | Public test HSS | Gap |
|---|---|---|---|
| 2048 (this checkpoint) | 0.369 | 0.4273 | +0.058 |
| 4096 (../checkpoint.pt) | 0.382 | 0.4470 | +0.065 |
Both submissions show roughly the same +0.06 dev-val-to-public-test gap, so dev val HSS appears to be a reasonable proxy for public test HSS at this scale (subject to whatever distributional differences exist between the dev val split, the official validation split we did not eval on, and the public test split). Note that the inference pipeline also changed between the two submissions (SEQ_LEN 2048 -> 4096, CONF_THRESH 0.7 -> 0.5, single-pass merge -> iterative merge), so the +0.020 public test gain is not attributable to the model alone.
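The gap column in the table above is just the difference of the two scores; a few lines make the rounding explicit (values copied from the table):

```python
# (dev val HSS, public test HSS) per submission, from the table above
results = {
    "2048 (this checkpoint)": (0.369, 0.4273),
    "4096 (../checkpoint.pt)": (0.382, 0.4470),
}
# gap = public test - dev val, rounded to 3 decimals
gaps = {name: round(test - dev, 3) for name, (dev, test) in results.items()}
```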