Atlas: 3D-Tokenized LLM for Autonomous Driving

A multimodal large language model for autonomous driving, implemented from the Atlas paper. 3D visual tokens extracted by StreamPETR (3D object detection) and TopoMLP (lane detection) are injected into the Vicuna-7B LLM, so that detection, lane, and planning tasks are all handled through unified text generation.

Project Structure

3dtokenizer-atlas/
├── train_atlas.py                  # Atlas LLM training entry point
├── eval_atlas.py                   # Atlas evaluation entry point
├── extract_streampetr_tokens.py    # Pre-extract StreamPETR detection tokens
├── extract_topomlp_tokens.py       # Pre-extract TopoMLP lane tokens
├── train_streampetr.sh             # StreamPETR pretraining launch script
├── train_topomlp.sh                # TopoMLP pretraining launch script
│
├── configs/
│   ├── streampetr_atlas_aligned.py # StreamPETR config (EVA-02 ViT-L, 800x1600)
│   ├── topomlp_atlas_aligned.py    # TopoMLP config (EVA-02 ViT-L, 800x1600)
│   ├── ds_zero2.json               # DeepSpeed ZeRO-2 config
│   └── REPRODUCTION.md             # Reproduction notes
│
├── src/
│   ├── model/
│   │   ├── modeling_atlas.py       # AtlasForCausalLM main model
│   │   ├── streampetr_adapter.py   # StreamPETR → detection token adapter
│   │   ├── topomlp_adapter.py      # TopoMLP → map token adapter (Perceiver resampler)
│   │   └── token_resampler.py      # CrossAttentionTokenResampler
│   ├── dataset/
│   │   ├── atlas_dataset.py        # AtlasDataset + Collate
│   │   └── scene_sampler.py        # SceneSequentialSampler (temporal sampling)
│   ├── eval/
│   │   └── metrics.py              # Evaluation metrics (F1/Chamfer/L2/Collision)
│   └── prompting.py                # Multi-task prompt templates
│
├── scripts/
│   ├── gen_atlas_full_data.py                # nuScenes → detection QA JSON
│   ├── gen_atlas_openlane_subsetB_lane_qa.py # OpenLane-V2 → lane QA JSON
│   └── gen_atlas_planning_qa.py              # nuScenes → planning QA JSON
│
├── data/                                    # Training/validation data (JSON)
│   ├── atlas_nuscenes_train.json            # Detection (28,130 samples)
│   ├── atlas_nuscenes_val.json              # Detection validation (6,019 samples)
│   ├── openlane_subsetB_lane_train_4pt.json # Lanes (27,968 samples, 4 points per lane)
│   ├── openlane_subsetB_lane_val_4pt.json   # Lane validation (6,019 samples)
│   ├── atlas_planning_train.json            # Planning (23,541 samples)
│   └── atlas_planning_val.json              # Planning validation (5,037 samples)
│
├── pretrained/                     # Pretrained weights
│   ├── vicuna-7b-v1.5/             # Vicuna-7B-v1.5 LLM
│   ├── eva02_L_coco_det_sys_o365_remapped_fixed.pth
│   └── streampetr/
│       └── streampetr_eva02_ep24.pth
│
├── work_dirs/
│   ├── atlas_full_repro/           # Current training output
│   ├── precomputed_det_tokens/     # Pre-extracted StreamPETR tokens
│   │   └── train/                  # 56,098 .pt files (det+planning+lane)
│   ├── precomputed_map_tokens/     # Pre-extracted TopoMLP tokens
│   │   └── train/                  # 27,968 .pt files (lane only)
│   └── topomlp_atlas_aligned/      # TopoMLP pretrained weights
│       └── epoch_24.pth
│
└── external/                       # External dependencies
    ├── StreamPETR/
    ├── TopoMLP_Repo/
    └── nuscenes-devkit/

Model Architecture

                          ┌────────────────────────────────────┐
6x surround-view images → │ StreamPETR (frozen, EVA-02 ViT-L)  │ → det tokens [B, 256, 256]
                          │ TopoMLP    (frozen, EVA-02 ViT-L)  │ → lane queries → Resampler → map tokens [B, 256, 256]
                          └────────────────────────────────────┘
                                            ↓
                                  AtlasUnifiedProjector
                           ┌─────────────────────────────────┐
                           │ projector_det: Linear(256→4096) │  ← single-layer linear projection
                           │ projector_map: Linear(256→4096) │
                           │ projector_rp:  Linear(3→256)    │  ← reference point, zero-init
                           │ features += projector_rp(ref)   │
                           └─────────────────────────────────┘
                                            ↓
                 Injected at <query> token positions (256 det + 256 map)
                                            ↓
                  ┌───────────────────────────────────────────────────┐
                  │ Vicuna-7B (full-parameter fine-tuning, DeepSpeed) │
                  │ Causal Language Modeling Loss                     │
                  └───────────────────────────────────────────────────┘
                                            ↓
                                  Multi-task text output
                       (3D detection / lanes / planned trajectory)
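
The projection and injection steps in the diagram can be sketched in plain Python with toy dimensions. This is an illustrative sketch, not the code in modeling_atlas.py: the helper names (`linear`, `project_token`, `inject_at_query`), the placeholder id `QUERY_ID`, and the zero bias are assumptions. The order of operations is inferred from the shapes: `projector_rp` outputs the token width (256), so its result can only be added before the projection to the LLM width (4096).

```python
from typing import List, Sequence

Vector = List[float]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Apply y = W x + b, where `weight` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def project_token(feature, ref_point, rp_weight, rp_bias, out_weight, out_bias) -> Vector:
    """Add the reference-point embedding to a token, then map it to the LLM width."""
    # Assumed order: the reference-point embedding lives in token space,
    # so it is added before the projection to the LLM hidden size.
    rp_embedding = linear(ref_point, rp_weight, rp_bias)
    fused = [f + r for f, r in zip(feature, rp_embedding)]
    return linear(fused, out_weight, out_bias)


def inject_at_query(token_ids, text_embeddings, visual_embeddings, query_id):
    """Replace the embedding at each <query> placeholder with the next visual token."""
    positions = [i for i, tok in enumerate(token_ids) if tok == query_id]
    if len(positions) != len(visual_embeddings):
        raise ValueError("number of <query> placeholders must match number of visual tokens")
    merged = [list(e) for e in text_embeddings]
    for pos, emb in zip(positions, visual_embeddings):
        merged[pos] = list(emb)
    return merged


# Toy sizes stand in for the real ones (token width 256, LLM width 4096).
TOKEN_DIM, LLM_DIM = 2, 3
QUERY_ID = -1  # hypothetical id of the <query> placeholder token

out_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out_b = [0.0] * LLM_DIM
zero_rp_w = [[0.0] * 3 for _ in range(TOKEN_DIM)]  # zero-init reference-point projector
zero_rp_b = [0.0] * TOKEN_DIM

feature, ref_point = [0.5, -1.0], [10.0, -3.0, 1.5]
projected = project_token(feature, ref_point, zero_rp_w, zero_rp_b, out_w, out_b)

token_ids = [5, QUERY_ID, 7]
text_embeddings = [[0.0] * LLM_DIM for _ in token_ids]
merged = inject_at_query(token_ids, text_embeddings, [projected], QUERY_ID)
```

With a zero-initialized `projector_rp`, the reference point contributes nothing at the start of training, so `projected` equals the plain linear projection of the token feature.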

Training Configuration

Aligned with Appendix B.2 of the paper (arXiv:2405.18361).

Atlas LLM (current training run)

| Parameter | Value |
| --- | --- |
| LLM | Vicuna-7B-v1.5 |
| Fine-tuning | Full-parameter fine-tuning (no LoRA) |
| Trainable parameters | 6,740,530,176 |
| Learning rate | 2e-5 |
| Optimizer | AdamW (weight_decay=1e-4, torch_adam, adam_w_mode) |
| LR schedule | Cosine with warmup (3% of steps) |
| Epochs | 8 |
| Batch size | 1 per GPU |
| Gradient accumulation | 2 |
| Effective batch size | 8 (4 GPUs x 1 x 2 accumulation steps) |
| Total steps | 79,632 |
| Warmup steps | 2,388 |
| Max sequence length | 4096 tokens |
| Distributed training | DeepSpeed ZeRO-2 (optimizer sharding) |
| GPU | 4x NVIDIA H100 80GB |
| Precision | BF16 (model + gradients), optimizer states sharded via ZeRO-2 |
| Memory queue | StreamPETR temporal modeling (3 frames, top-256, FIFO) |
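
The learning-rate settings above (2e-5 peak, cosine schedule, 2,388 warmup steps out of 79,632) can be visualized with a small function. This is a minimal sketch that assumes a linear warmup from zero followed by a cosine decay to zero; the exact scheduler used by train_atlas.py is not documented here, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step: int, base_lr: float = 2e-5,
               total_steps: int = 79_632, warmup_steps: int = 2_388) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Assumed warmup shape: ramp linearly from 0 to base_lr.
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sample a few points along the schedule.
for step in (0, 1_194, 2_388, 41_010, 79_632):
    print(step, lr_at_step(step))
```

Under these assumptions the rate peaks at exactly 2e-5 when warmup ends and reaches zero at the final step.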

Training Data

| Task | Data file | Samples |
| --- | --- | --- |
| 3D object detection | atlas_nuscenes_train.json | 28,130 |
| 3D lane detection | openlane_subsetB_lane_train_4pt.json | 27,968 |
| Trajectory planning | atlas_planning_train.json | 23,541 |
| Total | | 79,639 |
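
The step counts in the training configuration follow from this sample total. The sketch below reproduces the arithmetic, assuming that the incomplete last batch of each epoch is dropped and that warmup steps are rounded down; `training_steps` is a hypothetical helper, not a function from the repository.

```python
def training_steps(num_samples: int, num_gpus: int, per_gpu_batch: int,
                   grad_accum: int, epochs: int, warmup_percent: int = 3):
    """Derive effective batch size, steps per epoch, total steps, and warmup steps."""
    effective_batch = num_gpus * per_gpu_batch * grad_accum
    # Assumption: samples that do not fill a whole effective batch are dropped.
    steps_per_epoch = num_samples // effective_batch
    total_steps = steps_per_epoch * epochs
    # Integer arithmetic avoids floating-point rounding; the result is rounded down.
    warmup_steps = total_steps * warmup_percent // 100
    return effective_batch, steps_per_epoch, total_steps, warmup_steps


# 28,130 + 27,968 + 23,541 samples on 4 GPUs, batch size 1, 2 accumulation steps, 8 epochs.
result = training_steps(28_130 + 27_968 + 23_541, num_gpus=4, per_gpu_batch=1,
                        grad_accum=2, epochs=8)
print(result)  # (8, 9954, 79632, 2388)
```

The output matches the configured effective batch size of 8, 79,632 total steps, and 2,388 warmup steps.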

Lane data uses 4 uniformly sampled points per lane (consistent with Appendix A.2 of the paper). All coordinates are discretized into 1000 bins over a BEV range of [-50 m, +50 m].
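
A 1000-bin discretization over [-50 m, +50 m] gives a bin width of 0.1 m. The following sketch shows one way to convert between metric coordinates and bin indices. The function names, the clamping of out-of-range values, and decoding to the bin center are assumptions; the data generation scripts may use a different rounding convention.

```python
BEV_MIN, BEV_MAX, NUM_BINS = -50.0, 50.0, 1000


def coord_to_bin(x: float) -> int:
    """Map a metric coordinate to a bin index in [0, NUM_BINS - 1]."""
    clamped = min(max(x, BEV_MIN), BEV_MAX)  # assumed: out-of-range values are clamped
    ratio = (clamped - BEV_MIN) / (BEV_MAX - BEV_MIN)
    return min(int(ratio * NUM_BINS), NUM_BINS - 1)


def bin_to_coord(index: int) -> float:
    """Map a bin index back to the metric coordinate at the bin center."""
    return BEV_MIN + (index + 0.5) * (BEV_MAX - BEV_MIN) / NUM_BINS


# Round-trip a few coordinates; the error stays within half a bin (0.05 m).
for x in (-50.0, -12.34, 0.0, 12.34, 50.0):
    b = coord_to_bin(x)
    print(x, b, round(bin_to_coord(b), 3))
```

With this convention the quantization error for in-range coordinates is at most half a bin width, about 5 cm.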

3D Tokenizer Pretraining (completed)

| Parameter | StreamPETR | TopoMLP |
| --- | --- | --- |
| Backbone | EVA-02 ViT-L (embed_dim=1024) | EVA-02 ViT-L (embed_dim=1024) |
| Resolution | 800x1600 | 800x1600 |
| Queries | 256 (detection) | 256 (map, resampled from 1800) |
| Control points | - | 4 per lane |
| Epochs | 24 | 24 |
| Dataset | nuScenes trainval | OpenLane-V2 subset-B |

Quick Start

1. Environment

conda activate streampetr
# Main dependencies: PyTorch 2.0+, transformers, peft, flash-attn, mmcv 1.7, mmdet3d 1.0
# DeepSpeed (ZeRO-2): pip install deepspeed

2. Data Preparation

# nuScenes data root (contains v1.0-trainval/ and samples/)
export DATA_ROOT=/path/to/nuscenes

# OpenLane-V2 subset-B
export OPENLANE_ROOT=/path/to/OpenLane-V2/subset_B

# Generate lane QA data (4 points per lane, consistent with the paper)
python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root $OPENLANE_ROOT \
  --split train --out_json data/openlane_subsetB_lane_train_4pt.json

python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root $OPENLANE_ROOT \
  --split val --out_json data/openlane_subsetB_lane_val_4pt.json

3. Training

# Step 1: Pre-extract TopoMLP lane tokens (4x H100 in parallel, ~4.5 hours)
for i in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$i python extract_topomlp_tokens.py \
      --topomlp_config configs/topomlp_atlas_aligned.py \
      --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
      --data_json data/openlane_subsetB_lane_train_4pt.json \
      --output_dir work_dirs/precomputed_map_tokens/train \
      --shard_id $i --num_shards 4 &
done; wait

# Step 2: Full-parameter fine-tuning on 4 GPUs with DeepSpeed ZeRO-2
torchrun --nproc_per_node=4 train_atlas.py \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --data_json data/atlas_nuscenes_train.json,data/atlas_planning_train.json,data/openlane_subsetB_lane_train_4pt.json \
  --data_root $DATA_ROOT \
  --precomputed_det_tokens work_dirs/precomputed_det_tokens/train \
  --precomputed_map_tokens work_dirs/precomputed_map_tokens/train \
  --output_dir work_dirs/atlas_full_repro \
  --lr 2e-5 --weight_decay 1e-4 \
  --batch_size 1 --epochs 8 --gradient_accumulation_steps 2 \
  --warmup_ratio 0.03 --max_grad_norm 1.0 \
  --save_epochs 2 --log_steps 100 \
  --seed 42 --num_workers 2 \
  --deepspeed configs/ds_zero2.json
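
Step 1 above splits token extraction across four processes with --shard_id and --num_shards. How extract_topomlp_tokens.py assigns samples to shards is not documented here; the sketch below assumes a simple strided split, and `shard_indices` is a hypothetical helper used only to show that such a split covers every sample exactly once.

```python
from typing import List


def shard_indices(num_items: int, shard_id: int, num_shards: int) -> List[int]:
    """Return the sample indices handled by one shard (strided split, an assumption)."""
    if num_shards <= 0 or not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    return list(range(shard_id, num_items, num_shards))


# 27,968 lane samples split across 4 processes, as in Step 1.
shards = [shard_indices(27_968, i, 4) for i in range(4)]
shard_sizes = [len(s) for s in shards]
print(shard_sizes)  # [6992, 6992, 6992, 6992]
```

A strided split keeps the shards balanced even when the sample count is not divisible by the number of shards; a contiguous split would work equally well for this purpose.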

4. Evaluation

python eval_atlas.py \
  --checkpoint work_dirs/atlas_full_repro/final/checkpoint.pt \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --data_json data/openlane_subsetB_lane_val_4pt.json \
  --data_root $DATA_ROOT \
  --batch_size 1 --max_new_tokens 512 --no_flash_attn

References
