# Submitted 2048 Model (earlier public leaderboard entry)
This is the earlier of two checkpoints submitted to the S23DR 2026 public leaderboard. It was trained on the 2048-point dataset only (no 4096 transfer); within the 2048 regime, training is two-phase (phase 1 from scratch, phase 2 with endpoint loss and cooldown, resumed from step125000.pt). The current top-level checkpoint.pt (dev val HSS=0.382, public test HSS=0.4470) is its descendant via the three-step 2048 -> 4096 -> endpoint-cooldown recipe, and it is the better submission on both dev val and public test.
| Split | Metric | Score |
|---|---|---|
| Public test (leaderboard) | HSS | 0.4273 |
| Dev val (last 1024 train scenes), 2048-pt input | HSS_conf | 0.369 |
| Dev val (last 1024 train scenes), 4096-pt input | HSS_conf | 0.367 |
We did not eval on the official validation split (hf://usm3d/s23dr-2026-sampled_*_v2:validation)
during development, so no number here refers to it. See "Evaluation sets" in ../REPRODUCE.md.
## Training details
Two-phase 2048-only training on hf://usm3d/s23dr-2026-sampled_2048_v2:train. Phase 1 runs from scratch to step 125k; phase 2 resumes from step125000.pt and trains with endpoint loss to step 160k, finishing with a 20k-step cooldown over steps 140k-160k:
- Architecture: same Perceiver as the current release (hidden=256, latent_tokens=256, latent_layers=7, segments=64)
- Input: 2048 points
- Steps: 160,000
- Final LR: 3e-5 (after cooldown)
- Batch size: 32
- Cooldown: starts at step 140,000, lasts 20,000 steps
- Endpoint weight: 0.1 (used throughout, not only in cooldown)
- Confidence weight: 0.1
- Seed: 353
Full training args are in args.json.
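For concreteness, the cooldown schedule described above can be sketched as a simple step-dependent learning-rate function. This is an illustrative assumption, not the training code: the decay shape (here linear) and the pre-cooldown base LR (here a hypothetical 3e-4; the real value is in args.json) are not stated in this README, only the cooldown window (steps 140k-160k) and the final LR of 3e-5.

```python
def cooldown_lr(step: int,
                base_lr: float = 3e-4,      # hypothetical phase-2 base LR; see args.json
                final_lr: float = 3e-5,     # final LR after cooldown (from this README)
                cooldown_start: int = 140_000,
                cooldown_steps: int = 20_000) -> float:
    """Sketch of a linear LR cooldown: constant until cooldown_start,
    then linear decay to final_lr over cooldown_steps."""
    if step < cooldown_start:
        return base_lr
    frac = min((step - cooldown_start) / cooldown_steps, 1.0)
    return base_lr + frac * (final_lr - base_lr)
```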
## How to run inference
This checkpoint expects 2048-point input. To run it with the submission harness you would need to modify script.py to use SEQ_LEN = 2048. Alternatively, load the weights manually via EdgeDepthSegmentsModel in s23dr_2026_example/model.py and feed a 2048-point cloud.
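If you feed the model manually, the input cloud must have exactly 2048 points. A minimal sketch of one way to do that is below; `resample_points` is a hypothetical helper (the submission harness's actual sampling logic may differ), and loading the checkpoint into EdgeDepthSegmentsModel is assumed to follow the usual torch.load pattern.

```python
import numpy as np

def resample_points(points: np.ndarray, seq_len: int = 2048, seed: int = 0) -> np.ndarray:
    """Subsample (without replacement) or pad by resampling (with replacement)
    an (N, D) point cloud to exactly seq_len points."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    idx = rng.choice(n, size=seq_len, replace=(n < seq_len))
    return points[idx]
```

With a cloud prepared this way, the weights would be loaded roughly as `model.load_state_dict(torch.load("step160000.pt"))` on an EdgeDepthSegmentsModel instance from s23dr_2026_example/model.py (constructor arguments as in the architecture list above); the exact filename and constructor signature are assumptions, so check that file before relying on them.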
## Why it is included
The current release (../checkpoint.pt, dev val HSS=0.382) is the better submission on both dev val and public test. This older checkpoint is preserved as the empirical anchor for the dev-val-to-public-test gap.
Dev-val-to-public-test gap observed across both submissions:
| Submission | Dev val HSS | Public test HSS | Gap |
|---|---|---|---|
| 2048 (this checkpoint) | 0.369 | 0.4273 | +0.058 |
| 4096 (../checkpoint.pt) | 0.382 | 0.4470 | +0.065 |
Both submissions show roughly the same +0.06 dev-val-to-public-test gap, so dev val HSS appears to be a reasonable proxy for public test HSS at this scale (subject to whatever distributional differences exist between the dev val split, the official validation split we did not eval on, and the public test split). Note that the inference pipeline also changed between the two submissions (SEQ_LEN 2048 -> 4096, CONF_THRESH 0.7 -> 0.5, single-pass merge -> iterative merge), so the +0.020 public test gain is not attributable to the model alone.
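The gap column in the table above is just the difference of the two scores; a few lines make the rounding explicit (values copied from the table):

```python
# (dev val HSS, public test HSS) per submission, from the table above
results = {
    "2048 (this checkpoint)": (0.369, 0.4273),
    "4096 (../checkpoint.pt)": (0.382, 0.4470),
}
# gap = public test - dev val, rounded to 3 decimals
gaps = {name: round(test - dev, 3) for name, (dev, test) in results.items()}
```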