Lin-Zhuo committed on
Commit bad1832 · verified · 1 parent: faaa414

Upload folder using huggingface_hub

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +143 -3
  3. assets/teaser.png +3 -0
  4. lingbot-map.pt +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/teaser.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,143 @@
- ---
- license: apache-2.0
- ---
+ <h1 align="center">LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction</h1>
+
+ <p align="center">
+ <a href="lingbot-map_paper.pdf"><img src="https://img.shields.io/static/v1?label=Paper&message=PDF&color=red&logo=arxiv"></a>
+ <a href="https://technology.robbyant.com/lingbot-map"><img src="https://img.shields.io/badge/Project-Website-blue"></a>
+ <a href=""><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=orange"></a>
+ <a href="LICENSE.txt"><img src="https://img.shields.io/badge/License-Apache--2.0-green"></a>
+ </p>
+
+ <p align="center">
+ <img src="assets/teaser.png" width="100%">
+ </p>
+
+ <p align="center">
+ <video src="https://gw.alipayobjects.com/v/huamei_vaouhm/afts/video/q0sdTr9Mm6IAAAAAmyAAAAgADglFAQJr" width="100%" autoplay loop muted playsinline></video>
+ </p>
+
+ ---
+
+ # Quick Start
+
+ ## Installation
+
+ **1. Create conda environment**
+
+ ```bash
+ conda create -n lingbot-map python=3.10 -y
+ conda activate lingbot-map
+ ```
+
+ **2. Install PyTorch (CUDA 12.8)**
+
+ ```bash
+ pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
+ ```
+
+ > For other CUDA versions, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).
+
+ **3. Install lingbot-map**
+
+ ```bash
+ pip install -e .
+ ```
+
+ **4. Install FlashInfer (recommended)**
+
+ FlashInfer provides paged KV cache attention for efficient streaming inference:
+
+ ```bash
+ # CUDA 12.8 + PyTorch 2.9
+ pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
+ ```
+
+ > For other CUDA/PyTorch combinations, see [FlashInfer installation](https://docs.flashinfer.ai/installation.html).
+ > If FlashInfer is not installed, pass `--use_sdpa` to fall back to SDPA (PyTorch's native scaled dot-product attention).
+
+ **5. Visualization dependencies (optional)**
+
+ ```bash
+ pip install -e ".[vis]"
+ ```
+
+ # Demo
+
+ ## Streaming Inference from Images
+
+ ```bash
+ python demo.py --model_path /path/to/checkpoint.pt \
+ --image_folder /path/to/images/
+ ```
+
+ ## Streaming Inference from Video
+
+ ```bash
+ python demo.py --model_path /path/to/checkpoint.pt \
+ --video_path video.mp4 --fps 10
+ ```
+
+ ## Streaming with Keyframe Interval
+
+ Use `--keyframe_interval` to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframes still produce predictions but are not stored in the cache. This is useful for long sequences that exceed 320 frames.
+
+ ```bash
+ python demo.py --model_path /path/to/checkpoint.pt \
+ --image_folder /path/to/images/ --keyframe_interval 6
+ ```
+
+ ## Windowed Inference (for long sequences, >3000 frames)
+
+ ```bash
+ python demo.py --model_path /path/to/checkpoint.pt \
+ --video_path video.mp4 --fps 10 \
+ --mode windowed --window_size 64
+ ```
+
+ ## With Sky Masking
+
+ ```bash
+ python demo.py --model_path /path/to/checkpoint.pt \
+ --image_folder /path/to/images/ --mask_sky
+ ```
+
+ ## Without FlashInfer (SDPA fallback)
+
+ ```bash
+ python demo.py --model_path /path/to/checkpoint.pt \
+ --image_folder /path/to/images/ --use_sdpa
+ ```
+
+ # Model Download
+
+ <!-- TODO: fill in model checkpoints -->
+
+ | Model Name | Huggingface Repository | Description |
+ | :--- | :--- | :--- |
+ | lingbot-map | | Base model checkpoint |
+
+ # License
+
+ This project is released under the Apache License 2.0. See the [LICENSE](LICENSE.txt) file for details.
+
+ # Citation
+
+ ```bibtex
+ @article{lingbot-map2026,
+ title={},
+ author={},
+ journal={arXiv preprint arXiv:},
+ year={2026}
+ }
+ ```
+
+ # Acknowledgments
+
+ This work builds upon several excellent open-source projects:
+
+ - [VGGT](https://github.com/facebookresearch/vggt)
+ - [DINOv2](https://github.com/facebookresearch/dinov2)
+ - [FlashInfer](https://github.com/flashinfer-ai/flashinfer)
+
+ ---
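The README in this commit documents `--keyframe_interval` and `--mode windowed --window_size` only as command-line flags. The following sketch illustrates the frame bookkeeping those flags imply, under two assumptions that are ours and not confirmed by the source: frame 0 and every N-th frame after it are the keyframes kept in the KV cache, and windowed mode splits the stream into consecutive, non-overlapping windows of `window_size` frames. `demo.py` may select keyframes or overlap windows differently, and the helper names `keyframe_indices`, `cached_frames`, and `window_ranges` are hypothetical.

```python
import math
from typing import List, Tuple


def keyframe_indices(num_frames: int, keyframe_interval: int) -> List[int]:
    """Hypothetical helper: indices of frames stored in the KV cache.

    Assumption: frame 0 and every keyframe_interval-th frame after it are
    keyframes; all other frames are predicted but not cached.
    """
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    return list(range(0, num_frames, keyframe_interval))


def cached_frames(num_frames: int, keyframe_interval: int) -> int:
    """Number of cached frames under the same assumption: ceil(T / N)."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be >= 1")
    return math.ceil(num_frames / keyframe_interval)


def window_ranges(num_frames: int, window_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open [start, end) ranges for windowed mode.

    Assumption: windows are consecutive and non-overlapping; the last
    window may be shorter than window_size.
    """
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    return [
        (start, min(start + window_size, num_frames))
        for start in range(0, num_frames, window_size)
    ]


if __name__ == "__main__":
    # 1000 frames with --keyframe_interval 6: only every 6th frame is cached.
    print(cached_frames(1000, 6))     # 167
    print(keyframe_indices(20, 6))    # [0, 6, 12, 18]
    # 150 frames with --window_size 64: two full windows and one partial one.
    print(window_ranges(150, 64))     # [(0, 64), (64, 128), (128, 150)]
```

Under these assumptions the cache grows with ceil(T / N) rather than T, which is why a larger interval helps on long sequences, while windowing bounds the number of frames processed together to `window_size`.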
assets/teaser.png ADDED

Git LFS Details

  • SHA256: d34377bdb2f0747442f3113692914e669e97cb1d474578711cc30d08c5618bcc
  • Pointer size: 132 Bytes
  • Size of remote file: 5.11 MB
lingbot-map.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:986579f63db7bde3cb0f0ecc0a8fd49f5e4b6141a178ac33598d7fbe3e901cd0
+ size 4632326476
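The `lingbot-map.pt` entry in this commit is a Git LFS pointer, not the checkpoint itself: the repository tracks the three `key value` lines above, and the 4,632,326,476-byte file lives in LFS storage. After downloading, a check like the following sketch can confirm that a local file matches the pointer's `size` and `oid`. It relies only on the Git LFS v1 pointer format and Python's `hashlib`; the function names `parse_lfs_pointer` and `matches_pointer` are illustrative, not part of any Hugging Face or Git LFS API.

```python
import hashlib
import os
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS v1 pointer file ("key value" per line) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/v1"):
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(path: str, pointer: Dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and SHA-256 oid."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: compare byte sizes before hashing a multi-GB file.
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in fixed-size chunks so the checkpoint never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# The pointer committed for lingbot-map.pt, copied from the diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:986579f63db7bde3cb0f0ecc0a8fd49f5e4b6141a178ac33598d7fbe3e901cd0
size 4632326476
"""

if __name__ == "__main__":
    ptr = parse_lfs_pointer(POINTER)
    print(ptr["size"])  # 4632326476
    # After downloading the checkpoint: matches_pointer("lingbot-map.pt", ptr)
```

Comparing the size first avoids hashing roughly 4.6 GB when a download is obviously truncated; the SHA-256 comparison then catches corrupted files of the right length.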