jankin123 nielsr (HF Staff) committed
Commit 8cbfb6a · 1 Parent(s): 1c9ae24

Update model card: add metadata, library tags, and repository links (#1)


- Update model card: add metadata, library tags, and repository links (33bead3343dff0bfb4cbcbebedc7879504d5043b)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +31 -13
README.md CHANGED

@@ -1,15 +1,21 @@
---
license: apache-2.0
+ library_name: transformers
+ pipeline_tag: video-text-to-text
tags:
- - 4DThinker
- - dynamic-spatial-reasoning
- - vision-language-model
- - latent-reasoning
+ - 4DThinker
+ - dynamic-spatial-reasoning
+ - vision-language-model
+ - latent-reasoning
---

- # 4DThinker Model Checkpoints
+ # 4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

- This repository contains the trained model checkpoints from Qwen2.5-VL-3B for **4DThinker**, a framework that enables VLMs to "think with 4D" through dynamic latent mental imagery.
+ [**Paper**](https://huggingface.co/papers/2605.05997) | [**Code**](https://github.com/zhangquanchen/4DThinker)
+
+ 4DThinker is a framework that enables Vision-Language Models (VLMs) to "think with 4D" through dynamic latent mental imagery—internally simulating how scenes evolve within the continuous hidden space. It addresses dynamic spatial reasoning from monocular video by grounding the model in dynamic visual semantics.
+
+ This repository contains the trained model checkpoints from Qwen2.5-VL-3B for **4DThinker**.

## Model Structure

@@ -41,13 +47,13 @@ model/

## Special Tokens

- Three special tokens are added to the Qwen2.5-VL vocabulary:
+ Three special tokens are added to the Qwen2.5-VL vocabulary to support latent imagery:

| Token | Description |
|-------|-------------|
- | `<\|latent_pad\|>` | Padding within latent sequences |
- | `<\|latent_start\|>` | Marks start of latent visual token block |
- | `<\|latent_end\|>` | Marks end of latent visual token block |
+ | `<|latent_pad|>` | Padding within latent sequences |
+ | `<|latent_start|>` | Marks start of latent visual token block |
+ | `<|latent_end|>` | Marks end of latent visual token block |

## Usage
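A quick way to verify the token table above: a minimal sketch that loads the processor and checks that each token encodes to a single vocabulary id, assuming the repo id and `subfolder` from the usage snippet later in this diff.

```python
from transformers import AutoProcessor

# Assumed repo id and subfolder, taken from the usage snippet in this diff.
processor = AutoProcessor.from_pretrained("jankin123/4DThinker-3B", subfolder="4drl")
tokenizer = processor.tokenizer

for token in ("<|latent_pad|>", "<|latent_start|>", "<|latent_end|>"):
    # A registered special token encodes to exactly one id of its own.
    ids = tokenizer.encode(token, add_special_tokens=False)
    print(f"{token} -> {ids}")
```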
 
@@ -55,11 +61,23 @@ Three special tokens are added to the Qwen2.5-VL vocabulary:
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-     "./model/4drl",
+     "jankin123/4DThinker-3B",
+     subfolder="4drl",
      torch_dtype="auto",
      device_map="auto"
)
- processor = AutoProcessor.from_pretrained("./model/4drl")
+ processor = AutoProcessor.from_pretrained("jankin123/4DThinker-3B", subfolder="4drl")
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{4dthinker,
+   title={4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding},
+   author={Zhang, Quanchen and others},
+   journal={arXiv preprint arXiv:2605.05997},
+   year={2026}
+ }
```

## Bibtex
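Continuing the updated snippet, a minimal text-only generation sketch; the prompt is illustrative, and video inputs would additionally be passed through the processor's vision arguments.

```python
# Continues from `model` and `processor` as loaded in the snippet above.
messages = [
    {"role": "user",
     "content": [{"type": "text", "text": "Describe how the objects in the scene move."}]},
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# batch_decode returns prompt + completion; slice by input length to keep only the reply.
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```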
 
@@ -76,4 +94,4 @@ If you find 4DThinker helpful for your work, please cite

## License

- Apache License 2.0
+ Apache License 2.0