Atlas — 3D-Tokenized LLM for Autonomous Driving

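A multimodal large language model for autonomous driving, implemented from the Atlas paper. 3D visual tokens extracted by StreamPETR (3D object detection) and TopoMLP (lane detection) are injected into the Vicuna-7B LLM, so that detection, lane detection, and planning are all generated as text by a single model.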

Project Structure

3dtokenizer-atlas/
├── train_atlas.py                  # Atlas LLM training entry point
├── eval_atlas.py                   # Atlas evaluation entry point
├── extract_streampetr_tokens.py    # Pre-extract StreamPETR detection tokens
├── extract_topomlp_tokens.py       # Pre-extract TopoMLP lane tokens
├── train_streampetr.sh             # StreamPETR pretraining launch script
├── train_topomlp.sh                # TopoMLP pretraining launch script
│
├── configs/
│   ├── streampetr_atlas_aligned.py # StreamPETR config (EVA-02 ViT-L, 800x1600)
│   ├── topomlp_atlas_aligned.py    # TopoMLP config (EVA-02 ViT-L, 800x1600)
│   ├── ds_zero2.json               # DeepSpeed ZeRO-2 config
│   └── REPRODUCTION.md             # Reproduction notes
│
├── src/
│   ├── model/
│   │   ├── modeling_atlas.py       # AtlasForCausalLM main model
│   │   ├── streampetr_adapter.py   # StreamPETR → detection token adapter
│   │   ├── topomlp_adapter.py      # TopoMLP → map token adapter (Perceiver resampler)
│   │   └── token_resampler.py      # CrossAttentionTokenResampler
│   ├── dataset/
│   │   ├── atlas_dataset.py        # AtlasDataset + Collate
│   │   └── scene_sampler.py        # SceneSequentialSampler (temporal sampling)
│   ├── eval/
│   │   └── metrics.py              # Evaluation metrics (F1/Chamfer/L2/Collision)
│   └── prompting.py                # Multi-task prompt templates
│
├── scripts/
│   ├── gen_atlas_full_data.py               # nuScenes → detection QA JSON
│   ├── gen_atlas_openlane_subsetB_lane_qa.py # OpenLane-V2 → lane QA JSON
│   └── gen_atlas_planning_qa.py             # nuScenes → planning QA JSON
│
├── data/                                    # Training/validation data (JSON)
│   ├── atlas_nuscenes_train.json            # Detection (28,130 samples)
│   ├── atlas_nuscenes_val.json              # Detection validation (6,019 samples)
│   ├── openlane_subsetB_lane_train_4pt.json # Lane (27,968 samples, 4 points/lane)
│   ├── openlane_subsetB_lane_val_4pt.json   # Lane validation (6,019 samples)
│   ├── atlas_planning_train.json            # Planning (23,541 samples)
│   └── atlas_planning_val.json              # Planning validation (5,037 samples)
│
├── pretrained/                     # Pretrained weights
│   ├── vicuna-7b-v1.5/             # Vicuna-7B-v1.5 LLM
│   ├── eva02_L_coco_det_sys_o365_remapped_fixed.pth
│   └── streampetr/
│       └── streampetr_eva02_ep24.pth
│
├── work_dirs/
│   ├── atlas_full_repro/           # Current training output
│   ├── precomputed_det_tokens/     # Pre-extracted StreamPETR tokens
│   │   └── train/                  # 56,098 .pt files (det+planning+lane)
│   ├── precomputed_map_tokens/     # Pre-extracted TopoMLP tokens
│   │   └── train/                  # 27,968 .pt files (lane only)
│   └── topomlp_atlas_aligned/      # TopoMLP pretrained weights
│       └── epoch_24.pth
│
└── external/                       # External dependencies
    ├── StreamPETR/
    ├── TopoMLP_Repo/
    └── nuscenes-devkit/

Model Architecture

                                 ┌─────────────────────────────────────┐
6x surround-view camera images → │ StreamPETR (frozen, EVA-02 ViT-L)   │→ det tokens [B, 256, 256]
                                 │ TopoMLP    (frozen, EVA-02 ViT-L)   │→ lane queries → Resampler → map tokens [B, 256, 256]
                                 └─────────────────────────────────────┘
                                                    ↓
                                          AtlasUnifiedProjector
                                   ┌─────────────────────────────────┐
                                   │ projector_det: Linear(256→4096) │  ← single linear projection layer
                                   │ projector_map: Linear(256→4096) │
                                   │ projector_rp:  Linear(3→256)    │  ← reference point, zero-init
                                   │ features += projector_rp(ref)   │
                                   └─────────────────────────────────┘
                                                    ↓
                        Injected at <query> token positions (256 det + 256 map)
                                                    ↓
                          ┌───────────────────────────────────────────────────┐
                          │ Vicuna-7B (full-parameter fine-tuning, DeepSpeed) │
                          │ Causal language modeling loss                     │
                          └───────────────────────────────────────────────────┘
                                                    ↓
                                         Multi-task text output
                          (3D detection / lane detection / trajectory planning)
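
The injection step in the diagram can be sketched as follows. This is a minimal illustration, not the code in `modeling_atlas.py`: the function name, the placeholder id `QUERY_ID`, and the fill order (all detection tokens first, then all map tokens) are assumptions, and plain Python lists stand in for tensors so the sketch runs without PyTorch. In the real model there would be 256 + 256 placeholders and 4096-dimensional embeddings.

```python
QUERY_ID = -1  # hypothetical id of the <query> placeholder token


def inject_visual_tokens(input_ids, text_embeds, det_embeds, map_embeds,
                         query_id=QUERY_ID):
    """Return a copy of text_embeds with each <query> slot overwritten.

    Assumption: placeholders are filled in sequence order, detection
    tokens first and map tokens second.
    """
    visual = list(det_embeds) + list(map_embeds)
    positions = [i for i, tok in enumerate(input_ids) if tok == query_id]
    if len(positions) != len(visual):
        raise ValueError(
            f"{len(positions)} <query> slots but {len(visual)} visual tokens"
        )
    out = list(text_embeds)  # shallow copy; ordinary text rows are kept
    for pos, emb in zip(positions, visual):
        out[pos] = emb
    return out


# Toy example: 5 tokens, 1-dimensional embeddings, 2 det + 1 map token.
ids = [7, QUERY_ID, QUERY_ID, 8, QUERY_ID]
text = [[0.0], [0.0], [0.0], [0.0], [0.0]]
mixed = inject_visual_tokens(ids, text, det_embeds=[[1.0], [2.0]],
                             map_embeds=[[3.0]])
print(mixed)
```

Raising on a count mismatch is a deliberate choice in this sketch: a prompt with the wrong number of placeholders would otherwise shift every visual token silently.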

Training Configuration

Aligned with Appendix B.2 of the paper (arXiv:2405.18361).

Atlas LLM (current training run)

Parameter	Value
LLM	Vicuna-7B-v1.5
Fine-tuning	Full-parameter fine-tuning (no LoRA)
Trainable parameters	6,740,530,176
Learning rate	2e-5
Optimizer	AdamW (weight_decay=1e-4, torch_adam, adam_w_mode)
LR schedule	Cosine with warmup (3% of steps)
Epochs	8
Batch size	1 per GPU
Gradient accumulation	2
Effective batch size	8 (4 GPUs x 1 x 2 accumulation steps)
Total steps	79,632
Warmup steps	2,388
Max sequence length	4096 tokens
Distributed training	DeepSpeed ZeRO-2 (optimizer state and gradient partitioning)
GPU	4x NVIDIA H100 80GB
Precision	BF16 (model + gradients), optimizer states sharded via ZeRO-2
Memory queue	StreamPETR temporal modeling (3 frames, top-256, FIFO)
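
The step counts in the table can be reproduced from the batch settings and the 79,639 training samples listed in the training data section. The sketch below is a hedged reconstruction, not code from `train_atlas.py`: the function names are hypothetical, dropping the incomplete last batch of each epoch is an assumption (it happens to reproduce the listed 79,632 steps), and the schedule shape (linear warmup, then cosine decay to zero) is the common form of "cosine with warmup" rather than something the table confirms.

```python
import math


def training_steps(num_samples, num_gpus, per_gpu_batch, grad_accum, epochs):
    """Effective batch size and total optimizer steps."""
    effective_batch = num_gpus * per_gpu_batch * grad_accum
    # Assumption: the incomplete final batch of each epoch is dropped.
    steps_per_epoch = num_samples // effective_batch
    return effective_batch, steps_per_epoch * epochs


def lr_at(step, total_steps, warmup_steps, base_lr):
    """Linear warmup followed by cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


effective_batch, total_steps = training_steps(
    num_samples=79_639, num_gpus=4, per_gpu_batch=1, grad_accum=2, epochs=8
)
warmup_steps = int(0.03 * total_steps)
print(effective_batch, total_steps, warmup_steps)  # 8 79632 2388

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step, total_steps, warmup_steps, base_lr=2e-5))
```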

Training Data

Task	Data file	Samples
3D object detection	`atlas_nuscenes_train.json`	28,130
3D lane detection	`openlane_subsetB_lane_train_4pt.json`	27,968
Trajectory planning	`atlas_planning_train.json`	23,541
Total		79,639

Lane data uses 4 uniformly sampled points per lane (consistent with Appendix A.2 of the paper). All coordinates are discretized into 1000 bins over a BEV range of [-50 m, +50 m].
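
As a rough illustration of these two preprocessing steps, the sketch below resamples a lane polyline to 4 points and maps coordinates to bin indices. It is not taken from the data-generation scripts: the function names are hypothetical, "uniformly sampled" is interpreted as uniform in arc length, the example works in 2D BEV coordinates only, and the bin convention (floor into 1000 equal-width bins, clamped to 0..999, decoded at the bin center) is an assumption.

```python
import math

BEV_MIN, BEV_MAX, NUM_BINS = -50.0, 50.0, 1000


def resample_polyline(points, n=4):
    """Resample a 2D polyline to n points spaced uniformly by arc length."""
    cum = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:  # single point or degenerate lane
        return [points[0]] * n
    out, seg = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def quantize(value, lo=BEV_MIN, hi=BEV_MAX, bins=NUM_BINS):
    """Map a coordinate in metres to an integer bin index in [0, bins - 1]."""
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def dequantize(idx, lo=BEV_MIN, hi=BEV_MAX, bins=NUM_BINS):
    """Map a bin index back to the centre of its bin, in metres."""
    return lo + (idx + 0.5) * (hi - lo) / bins


lane = [(0.0, 0.0), (3.0, 0.0), (9.0, 0.0)]
four_points = resample_polyline(lane, n=4)
print(four_points)
print([(quantize(x), quantize(y)) for x, y in four_points])
```

Under these assumptions each bin is 0.1 m wide, so decoding at the bin center bounds the round-trip error at 0.05 m for in-range coordinates.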

3D Tokenizer Pretraining (completed)

Parameter	StreamPETR	TopoMLP
Backbone	EVA-02 ViT-L (embed_dim=1024)	EVA-02 ViT-L (embed_dim=1024)
Resolution	800x1600	800x1600
Queries	256 (detection)	256 (map, resampled from 1800)
Control points	-	4 per lane
Epochs	24	24
Dataset	nuScenes trainval	OpenLane-V2 subset-B

Quick Start

1. Environment

conda activate streampetr
# Main dependencies: PyTorch 2.0+, transformers, peft, flash-attn, mmcv 1.7, mmdet3d 1.0
# DeepSpeed (ZeRO-2): pip install deepspeed

2. Data Preparation

# nuScenes data root (contains v1.0-trainval/ and samples/)
export DATA_ROOT=/path/to/nuscenes

# OpenLane-V2 subset-B
export OPENLANE_ROOT=/path/to/OpenLane-V2/subset_B

# Generate lane QA data (4 points/lane, consistent with the paper)
python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root "$OPENLANE_ROOT" \
  --split train --out_json data/openlane_subsetB_lane_train_4pt.json

python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root "$OPENLANE_ROOT" \
  --split val --out_json data/openlane_subsetB_lane_val_4pt.json

3. Training

# Step 1: pre-extract TopoMLP lane tokens (4x H100 in parallel, ~4.5 hours)
for i in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$i python extract_topomlp_tokens.py \
      --topomlp_config configs/topomlp_atlas_aligned.py \
      --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
      --data_json data/openlane_subsetB_lane_train_4pt.json \
      --output_dir work_dirs/precomputed_map_tokens/train \
      --shard_id $i --num_shards 4 &
done; wait

# Step 2: full-parameter fine-tuning on 4 GPUs with DeepSpeed ZeRO-2
torchrun --nproc_per_node=4 train_atlas.py \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --data_json data/atlas_nuscenes_train.json,data/atlas_planning_train.json,data/openlane_subsetB_lane_train_4pt.json \
  --data_root $DATA_ROOT \
  --precomputed_det_tokens work_dirs/precomputed_det_tokens/train \
  --precomputed_map_tokens work_dirs/precomputed_map_tokens/train \
  --output_dir work_dirs/atlas_full_repro \
  --lr 2e-5 --weight_decay 1e-4 \
  --batch_size 1 --epochs 8 --gradient_accumulation_steps 2 \
  --warmup_ratio 0.03 --max_grad_norm 1.0 \
  --save_epochs 2 --log_steps 100 \
  --seed 42 --num_workers 2 \
  --deepspeed configs/ds_zero2.json
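
Step 1 splits token extraction across four processes through `--shard_id` and `--num_shards`. How `extract_topomlp_tokens.py` assigns samples to shards is not shown here, so the sketch below only illustrates one common scheme, a strided split, under that assumption; the function name is hypothetical and the real script may use contiguous blocks instead.

```python
def shard_indices(num_items, shard_id, num_shards):
    """Indices handled by one shard under a strided split (assumed scheme)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    return list(range(shard_id, num_items, num_shards))


# 27,968 lane samples split over 4 shards, as in the Step 1 loop.
num_samples, num_shards = 27_968, 4
shards = [shard_indices(num_samples, i, num_shards) for i in range(num_shards)]
print([len(s) for s in shards])  # [6992, 6992, 6992, 6992]

# Every sample is covered exactly once across the shards.
covered = sorted(idx for shard in shards for idx in shard)
print(covered == list(range(num_samples)))  # True
```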

4. Evaluation

python eval_atlas.py \
  --checkpoint work_dirs/atlas_full_repro/final/checkpoint.pt \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --data_json data/openlane_subsetB_lane_val_4pt.json \
  --data_root $DATA_ROOT \
  --batch_size 1 --max_new_tokens 512 --no_flash_attn

References

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? arXiv:2405.18361, published May 28, 2024.