Robotics
Transformers
Safetensors
English
alpamayo_r1
ToshMarley17, BorisIvanovic committed
Commit 8148d2b · 0 Parent(s)

Duplicate from nvidia/Alpamayo-R1-10B

Co-authored-by: Boris Ivanovic <BorisIvanovic@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
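As a rough illustration of what these rules do, the sketch below checks a few file names from this commit against a subset of the patterns. It uses Python's `fnmatch.fnmatchcase`, which only approximates Git's attribute pattern matching, and the helper name `tracked_by_lfs` is hypothetical, so treat the result as an approximation rather than Git's actual behavior.

```python
from fnmatch import fnmatchcase

# A subset of the patterns added in .gitattributes; each one routes
# matching files through Git LFS instead of storing them in Git directly.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.tar.*", "*tfevents*"]


def tracked_by_lfs(filename: str) -> bool:
    """Return True if the file name matches any pattern (fnmatch approximation)."""
    return any(fnmatchcase(filename, pattern) for pattern in LFS_PATTERNS)


for name in ["model-00001-of-00005.safetensors", "config.json", "README.md"]:
    print(name, tracked_by_lfs(name))
```

Under this approximation only the weight shards match, which agrees with the rest of the commit: the `.safetensors` files appear as LFS pointers while `config.json` and `README.md` are stored as plain text.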
LICENSE ADDED
@@ -0,0 +1,38 @@
+ NVIDIA License
+
+ 1. Definitions
+
+ “Licensor” means any person or entity that distributes its Work.
+ “Work” means (a) the original work of authorship made available under this license, which may include software, documentation, or other files, and (b) any additions to or derivative works thereof that are made available under this license.
+ The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this license, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.
+ Works are “made available” under this license by including in or with the Work either (a) a copyright notice referencing the applicability of this license to the Work, or (b) a copy of this license.
+
+ 2. License Grant
+
+ 2.1 Copyright Grant. Subject to the terms and conditions of this license, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to use, reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.
+
+ 3. Limitations
+
+ 3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b) you include a complete copy of this license with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.
+
+ 3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms. Notwithstanding Your Terms, this license (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself.
+
+ 3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation and its affiliates may use the Work and any derivative works commercially. As used herein, “non-commercially” means for research or evaluation purposes only.
+
+ 3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this license from such Licensor (including the grant in Section 2.1) will terminate immediately.
+
+ 3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this license.
+
+ 3.6 Termination. If you violate any term of this license, then your rights under this license (including the grant in Section 2.1) will terminate immediately.
+
+ 4. AI Ethics.
+ Use of the Work under this Agreement must be consistent with NVIDIA’s Trustworthy AI terms at https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/
+
+ 5. Disclaimer of Warranty.
+
+ THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.
+
+ 6. Limitation of Liability.
+
+ EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
README.md ADDED
@@ -0,0 +1,204 @@
+ ---
+ datasets:
+ - nvidia/PhysicalAI-Autonomous-Vehicles
+ - nvidia/PhysicalAI-Autonomous-Vehicles-NuRec
+ pipeline_tag: robotics
+ library_name: transformers
+ license: other
+ language:
+ - en
+ new_version: nvidia/Alpamayo-1.5-10B
+ ---
+
+ # Alpamayo 1
+
+ [**Code**](https://github.com/NVlabs/alpamayo) | [**Paper**](https://arxiv.org/abs/2511.00088)
+
+ _Note: Following the release of [NVIDIA Alpamayo](https://nvidianews.nvidia.com/news/alpamayo-autonomous-vehicle-development) at CES 2026, Alpamayo-R1 has been renamed to Alpamayo 1._
+
+ ## Model Overview
+
+ ### Description:
+
+ Alpamayo 1 integrates Chain-of-Causation reasoning with trajectory planning to enhance decision-making in complex autonomous-driving scenarios. Alpamayo 1 (v1.0) was developed by NVIDIA as a vision-language-action (VLA) model that bridges interpretable reasoning with precise vehicle control for autonomous-driving applications.
+
+ This model is ready for non-commercial use. Commercial licensing is available upon request.
+
+ ### License:
+
+ The model weights are released under a [non-commercial license](./LICENSE).
+
+ The inference code is released under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.
+
+ ### Deployment Geography:
+
+ Global
+
+ ### Use Case:
+
+ Intended for researchers and autonomous-driving practitioners who are developing and evaluating VLA models for autonomous-driving scenarios, particularly for handling rare, long-tail events.
+
+ ### Release Date:
+
+ Hugging Face: 12/03/2025, via this repository.
+
+ ### Inference Code:
+
+ GitHub: https://github.com/NVlabs/alpamayo
+
+ ## Reference:
+
+ [Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail](https://arxiv.org/abs/2511.00088)
+
+ ## Model Architecture:
+
+ **Architecture Type:** Transformer
+
+ **Network Architecture:** A VLA model based on Cosmos-Reason and featuring a diffusion-based trajectory decoder.
+
+ **This model was developed based on:** Cosmos-Reason (VLM backbone) with a diffusion-based action decoder
+
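The `config.json` added in this commit names a `FlowMatching` class with `"int_method": "euler"` for the diffusion-based decoder. As a minimal sketch of what Euler-integrated flow-matching sampling looks like in general, the code below integrates a velocity field from noise at t = 0 to a sample at t = 1. The function names, the step count, and the toy velocity field are all assumptions for illustration; in the real model the velocity would come from the action expert network, whose interface is not shown here.

```python
import random


def euler_sample(velocity, x0, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x


# Toy velocity field (hypothetical stand-in for a learned network).
# On the straight-line path x_t = (1 - t) * x0 + t * target, the velocity
# target - x0 can be rewritten as (target - x_t) / (1 - t).
TARGET = [0.5, -0.2]


def toy_velocity(x, t):
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in TARGET]
sample = euler_sample(toy_velocity, noise, num_steps=10)
print(sample)
```

With this particular velocity field the Euler rollout lands on `TARGET` regardless of the starting noise, which makes the integration loop easy to check in isolation.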
+ **Number of model parameters:**
+
+ - Backbone: 8.2B parameters
+ - Action Expert: 2.3B parameters
+
+ ## Input(s):
+
+ **Input Type(s):** Image/Video, Text, Egomotion History
+
+ **Input Format(s):**
+
+ - Image: Red, Green, Blue (RGB)
+ - Text: String
+ - Egomotion History: Floating-point values `(x, y, z), R_rot`
+
+ **Input Parameters:**
+
+ - Image: Two-dimensional (2D), multi-camera, multi-timestep
+ - Text: One-dimensional (1D)
+ - Egomotion History: Three-dimensional (3D) translation and nine-dimensional (9D, 3x3) rotation, multi-timestep
+
+ **Other Properties Related to Input:**
+ Multi-camera images from 4 cameras (front-wide, front-tele, cross-left, cross-right) with a 0.4-second history window at 10 Hz (4 frames per camera). The input image resolution is 1080x1920 pixels; the processor downsamples images to 320x576 pixels. Text inputs include user commands. Images and egomotion history (16 waypoints at 10 Hz) also require associated timestamps.
+ Note that the model was primarily trained, and has only been tested, under this setting.
+
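The input figures above can be cross-checked with a little arithmetic. The sketch below recomputes the frame count, the egomotion history duration, and the per-image pixel count, and compares the latter against `min_pixels` and `max_pixels` from the `config.json` in this commit. That those two config values bound the downsampled image size is an assumption; the variable names are illustrative only.

```python
# Values quoted in the model card and in config.json of this commit.
NUM_CAMERAS = 4              # front-wide, front-tele, cross-left, cross-right
HISTORY_SECONDS = 0.4
RATE_HZ = 10
PROCESSED_HW = (320, 576)    # resolution after the processor downsamples
MIN_PIXELS, MAX_PIXELS = 163840, 196608  # config.json (assumed to bound the resize)
EGO_HISTORY_WAYPOINTS = 16

frames_per_camera = round(HISTORY_SECONDS * RATE_HZ)
total_images = NUM_CAMERAS * frames_per_camera
pixels_per_image = PROCESSED_HW[0] * PROCESSED_HW[1]
ego_history_seconds = EGO_HISTORY_WAYPOINTS / RATE_HZ

print("frames per camera:", frames_per_camera)
print("images per sample:", total_images)
print("pixels per image:", pixels_per_image)
print("within pixel budget:", MIN_PIXELS <= pixels_per_image <= MAX_PIXELS)
print("egomotion history (s):", ego_history_seconds)
```

The 320x576 target falls between the two pixel limits, which is consistent with the stated downsampling, although the card does not say how the processor chooses that exact size.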
+ ## Output(s):
+
+ **Output Type(s):** Text, Trajectory
+
+ **Output Format(s):**
+
+ - Text: String (Chain-of-Causation reasoning traces)
+ - Trajectory: Floating-point values `(x, y, z), R_rot`
+
+ **Output Parameters:**
+
+ - Text: One-dimensional (1D)
+ - Trajectory: Three-dimensional (3D) translation and nine-dimensional (9D, 3x3) rotation, multi-timestep
+
+ **Other Properties Related to Output:**
+ Outputs a 6.4-second future trajectory (64 waypoints at 10 Hz) with position `(x, y, z)` and rotation matrix `R_rot` in the ego-vehicle coordinate frame.
+ Internally, the trajectory is represented as a sequence of dynamic actions (acceleration and curvature) following a unicycle model in bird's-eye-view (BEV) space.
+ Text reasoning traces are variable in length and describe driving decisions and their causal factors.
+
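To make the acceleration-and-curvature representation concrete, here is a minimal sketch of rolling a unicycle model forward into BEV waypoints. It uses plain forward Euler with `dt = 0.1` and 64 steps, matching the `dt` and `n_waypoints` values in `config.json`. The function names and the integration scheme are assumptions; the repository's `UnicycleAccelCurvatureActionSpace` may integrate differently and also carries regularization parameters that are not modeled here.

```python
import math


def rollout_unicycle(accels, curvatures, v0=0.0, dt=0.1):
    """Integrate (acceleration, curvature) actions into BEV waypoints.

    Returns a list of (x, y, yaw) tuples in the ego frame, one per action.
    """
    x, y, yaw, v = 0.0, 0.0, 0.0, v0
    waypoints = []
    for a, kappa in zip(accels, curvatures):
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v * kappa * dt  # yaw rate = speed * curvature
        v += a * dt
        waypoints.append((x, y, yaw))
    return waypoints


def yaw_to_rotation(yaw):
    """3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# 64 steps at dt = 0.1 s cover the 6.4 s horizon:
# constant 10 m/s, zero acceleration, zero curvature (straight ahead).
traj = rollout_unicycle([0.0] * 64, [0.0] * 64, v0=10.0)
print(len(traj), traj[-1])
print(yaw_to_rotation(traj[-1][2]))
```

Converting each yaw to a rotation matrix yields the `(x, y, z), R_rot` layout described above, assuming z stays at zero and rotation is about the vertical axis only.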
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
+
+ ## Software Integration:
+
+ **Runtime Engine(s):**
+
+ - PyTorch (minimum version: 2.8)
+ - Hugging Face Transformers (minimum version: 4.57.1)
+ - DeepSpeed (minimum version: 0.17.4)
+
+ **Supported Hardware Microarchitecture Compatibility:**
+
+ - NVIDIA GPUs with sufficient memory to load a 10B parameter model (minimum 1 GPU with at least 24GB of VRAM)
+
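A back-of-the-envelope estimate shows why 24GB is the stated floor. The sketch below multiplies the parameter counts listed under Model Architecture by 2 bytes per parameter, assuming all weights are held in bfloat16 as the `"dtype": "bfloat16"` entry in `config.json` suggests. It covers weights only; activations, caches, and framework overhead are not included, and the constant names are illustrative.

```python
BACKBONE_PARAMS = 8.2e9
ACTION_EXPERT_PARAMS = 2.3e9
BYTES_PER_PARAM = 2  # bfloat16 (assumed for all weights)

total_params = BACKBONE_PARAMS + ACTION_EXPERT_PARAMS
weight_bytes = total_params * BYTES_PER_PARAM
weight_gib = weight_bytes / 2**30

print(f"{total_params / 1e9:.1f}B params -> {weight_gib:.1f} GiB of weights")
```

Under these assumptions the weights alone occupy most of a 24GB card, so the remaining headroom for activations is small.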
+ **Preferred/Supported Operating System(s):**
+
+ - Linux (we have not tested on other operating systems)
+
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
+
+ ## Model Version(s):
+
+ Alpamayo 1 10B v1.0 trained
+
+ The model can be integrated into autonomous driving software in the cloud for advanced end-to-end perception, reasoning, and motion planning.
+
+ ## Training, Testing, and Evaluation Datasets:
+
+ ### Training Dataset:
+
+ Alpamayo 1's training data comprises a mix of Chain of Causation (CoC) reasoning traces, Cosmos-Reason Physical AI datasets, and NVIDIA's internal proprietary autonomous driving data.
+
+ **Data Modality:**
+
+ - Image (multi-camera)
+ - Text (reasoning traces)
+ - Other: Trajectory data (egomotion, future waypoints)
+
+ **Image Training Data Size:** More than 1 Billion Images (from 80,000 hours of multi-camera driving data)
+
+ **Text Training Data Size:** Less than a Billion Tokens (700K CoC reasoning traces plus Cosmos-Reason training data)
+
+ **Video Training Data Size:** 10,000 to 1 Million Hours (80,000 hours)
+
+ **Non-Audio, Image, Text Training Data Size:** Trajectory data: 80,000 hours at 10Hz sampling rate
+
+ **Data Collection Method by dataset:** Hybrid: Automatic/Sensors (camera and vehicle sensors), Synthetic (VLM-generated reasoning)
+
+ **Labeling Method by dataset:** Hybrid: Human (structured CoC annotations), Automated (VLM-based auto-labeling), Automatic/Sensors (trajectory and egomotion)
+
+ **Properties:**
+ The dataset comprises 80,000 hours of multi-camera driving videos with corresponding egomotion and trajectory annotations.
+ It includes 700,000 Chain-of-Causation (CoC) reasoning traces that provide decision-grounded, causally linked explanations of driving behaviors.
+ Content includes machine-generated data from vehicle sensors (cameras, IMUs, and GPS) and synthetic reasoning traces.
+ CoC annotations are in English and use a structured format that links driving decisions to causal factors.
+ Sensors include RGB cameras (2-6 per vehicle), inertial measurement units, and GPS.
+
+ ### Testing Dataset:
+
+ **Link:** Proprietary autonomous driving test datasets, closed-loop simulation, on-vehicle road tests.
+
+ **Data Collection Method by dataset:** Hybrid: Automatic/Sensors (real-world driving data), Synthetic (simulation scenarios)
+
+ **Labeling Method by dataset:** Hybrid: Automatic/Sensors, Human (ground truth verification)
+
+ **Properties:**
+ This dataset covers multi-camera driving scenarios with a particular focus on rare, long-tail events. It includes challenging cases such as complex intersections, cut-ins, pedestrian interactions, and adverse weather conditions. Data are collected from RGB cameras and vehicle sensors.
+
+ ### Evaluation Dataset:
+
+ **Link:** Same as Testing Dataset.
+
+ **Data Collection Method by dataset:** Hybrid: Automatic/Sensors (real-world driving data), Synthetic (simulation scenarios)
+
+ **Labeling Method by dataset:** Hybrid: Automatic/Sensors, Human (ground truth verification)
+
+ **Properties:**
+ Evaluation focuses on rare, long-tail scenarios, including complex intersections, pedestrian crossings, vehicle cut-ins, and challenging weather and lighting conditions. Multi-camera sensor data are collected from RGB cameras.
+
+ **Quantitative Evaluation Benchmarks:**
+
+ - Closed-Loop Evaluation using [AlpaSim](https://github.com/NVlabs/alpasim) on 910 scenarios from the [PhysicalAI-AV-NuRec Dataset](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NuRec): AlpaSim Score of 0.73 ± 0.01.
+ - Open-Loop Evaluation on 937 challenging samples from the [PhysicalAI-AV Dataset](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles): minADE_6 at 6.4s of 1.22m.
+
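For readers unfamiliar with the open-loop metric, the sketch below shows one common reading of minADE_6: sample 6 candidate trajectories, compute the average L2 displacement of each against the ground truth over the full horizon, and keep the smallest. Whether the benchmark measures displacement in the xy plane or in 3D, and how the 6 samples are drawn, is not stated in the card, so those choices and the function names are assumptions.

```python
import math


def ade(pred, gt):
    """Average L2 displacement between two equally long lists of (x, y) waypoints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def min_ade(samples, gt):
    """Minimum ADE over a set of sampled trajectories (K = len(samples))."""
    return min(ade(pred, gt) for pred in samples)


# Synthetic example: a straight 64-waypoint ground truth and 6 candidates
# that are laterally shifted copies of it.
gt = [(float(i), 0.0) for i in range(1, 65)]
offsets = (2.0, -1.5, 0.5, 3.0, -0.25, 1.0)
samples = [[(x, y + offset) for x, y in gt] for offset in offsets]

print(min_ade(samples, gt))
```

Because only the best of the 6 samples counts, the metric rewards a model whose sample set covers the true trajectory, not one whose every sample is accurate.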
+ ## Inference:
+
+ **Acceleration Engine:** PyTorch, Hugging Face Transformers
+
+ **Test Hardware:**
+
+ - Minimum: 1 GPU with 24GB+ VRAM (e.g., NVIDIA RTX 3090, RTX 3090 Ti, RTX 4090, A5000, or equivalent)
+ - Tested on: NVIDIA H100
+
+ For scripts related to model inference, please check out our [code repository](https://github.com/NVlabs/alpamayo).
+
+ ## Ethical Considerations:
+
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
config.json ADDED
@@ -0,0 +1,117 @@
+ {
+   "action_in_proj_cfg": {
+     "_target_": "alpamayo_r1.models.action_in_proj.PerWaypointActionInProjV2",
+     "hidden_size": 512,
+     "max_freq": 100.0,
+     "num_enc_layers": 2,
+     "num_fourier_feats": 20
+   },
+   "action_out_proj_cfg": {
+     "_target_": "torch.nn.Linear"
+   },
+   "action_space_cfg": {
+     "_target_": "alpamayo_r1.action_space.UnicycleAccelCurvatureActionSpace",
+     "a_lambda": 0.0001,
+     "a_ridge": 0.0001,
+     "accel_bounds": [
+       -9.8,
+       9.8
+     ],
+     "accel_mean": 0.02902694707164455,
+     "accel_std": 0.6810426736454882,
+     "curvature_bounds": [
+       -0.33,
+       0.33
+     ],
+     "curvature_mean": 0.0002692167976330542,
+     "curvature_std": 0.026148280660833106,
+     "dt": 0.1,
+     "kappa_lambda": 0.0001,
+     "kappa_ridge": 0.0001,
+     "n_waypoints": 64,
+     "theta_lambda": 1e-06,
+     "theta_ridge": 1e-08,
+     "v_lambda": 1e-06,
+     "v_ridge": 0.0001
+   },
+   "add_special_tokens": true,
+   "architectures": [
+     "AlpamayoR1"
+   ],
+   "attn_implementation": "flash_attention_2",
+   "diffusion_cfg": {
+     "_target_": "alpamayo_r1.diffusion.flow_matching.FlowMatching",
+     "int_method": "euler",
+     "x_dims": "???"
+   },
+   "dtype": "bfloat16",
+   "expert_cfg": {
+     "dtype": "bfloat16",
+     "head_dim": 128,
+     "hidden_size": 2048,
+     "intermediate_size": 8256,
+     "num_attention_heads": 16
+   },
+   "expert_non_causal_attention": true,
+   "hist_traj_tokenizer_cfg": {
+     "_target_": "alpamayo_r1.models.delta_tokenizer.DeltaTrajectoryTokenizer"
+   },
+   "keep_same_dtype": true,
+   "max_pixels": 196608,
+   "min_pixels": 163840,
+   "model_dtype": "bfloat16",
+   "model_type": "alpamayo_r1",
+   "tokens_per_future_traj": 128,
+   "tokens_per_history_traj": 48,
+   "traj_token_ids": {
+     "future": 155685,
+     "future_end": 155683,
+     "future_start": 155681,
+     "history": 155684,
+     "history_end": 155676,
+     "history_start": 155674
+   },
+   "traj_token_start_idx": 151669,
+   "traj_tokenizer_cfg": {
+     "_recursive_": false,
+     "_target_": "alpamayo_r1.action_space.discrete_action_space.DiscreteTrajectoryTokenizer",
+     "action_space_cfg": {
+       "_target_": "alpamayo_r1.action_space.UnicycleAccelCurvatureActionSpace",
+       "a_lambda": 0.0001,
+       "a_ridge": 0.0001,
+       "accel_bounds": [
+         -9.8,
+         9.8
+       ],
+       "accel_mean": 0.02902694707164455,
+       "accel_std": 0.6810426736454882,
+       "curvature_bounds": [
+         -0.33,
+         0.33
+       ],
+       "curvature_mean": 0.0002692167976330542,
+       "curvature_std": 0.026148280660833106,
+       "dt": 0.1,
+       "kappa_lambda": 0.0001,
+       "kappa_ridge": 0.0001,
+       "n_waypoints": 64,
+       "theta_lambda": 1e-06,
+       "theta_ridge": 1e-08,
+       "v_lambda": 1e-06,
+       "v_ridge": 0.0001
+     },
+     "dims_max": [
+       10,
+       10
+     ],
+     "dims_min": [
+       -10,
+       -10
+     ],
+     "num_bins": 3000
+   },
+   "traj_vocab_size": 4000,
+   "transformers_version": "4.57.1",
+   "vlm_backend": "qwenvl3",
+   "vocab_size": 155697
+ }
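The `traj_tokenizer_cfg` block suggests that continuous action values are discretized into `num_bins` bins over `[dims_min, dims_max]` and mapped into a reserved slice of the vocabulary. The sketch below shows uniform binning under that reading. The function names, the uniform bin layout, and the mapping of bin `i` to token id `traj_token_start_idx + i` are assumptions; the actual `DiscreteTrajectoryTokenizer` is not part of this commit and may work differently.

```python
# Values copied from config.json in this commit.
TRAJ_TOKEN_START_IDX = 151669
TRAJ_VOCAB_SIZE = 4000
VOCAB_SIZE = 155697
NUM_BINS = 3000
DIMS_MIN, DIMS_MAX = -10.0, 10.0
BIN_WIDTH = (DIMS_MAX - DIMS_MIN) / NUM_BINS


def value_to_bin(value):
    """Uniformly quantize a value in [DIMS_MIN, DIMS_MAX] to a bin index in [0, NUM_BINS)."""
    clipped = min(max(value, DIMS_MIN), DIMS_MAX)
    frac = (clipped - DIMS_MIN) / (DIMS_MAX - DIMS_MIN)
    return min(int(frac * NUM_BINS), NUM_BINS - 1)


def bin_to_value(index):
    """Map a bin index back to the center of its bin."""
    return DIMS_MIN + (index + 0.5) * BIN_WIDTH


def bin_to_token_id(index):
    """Assumed layout: bin i maps to token id TRAJ_TOKEN_START_IDX + i."""
    return TRAJ_TOKEN_START_IDX + index


traj_range_end = TRAJ_TOKEN_START_IDX + TRAJ_VOCAB_SIZE  # exclusive upper bound
b = value_to_bin(0.68)
print(b, bin_to_token_id(b), bin_to_value(b), traj_range_end)
```

Two consistency checks fall out of the numbers: the trajectory token range ends before the special `traj_token_ids` begin at 155674, and `tokens_per_future_traj` (128) equals 64 waypoints times 2 action dimensions, which is consistent with, though not confirmed by, an acceleration-and-curvature encoding.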
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6aabd8d143cff0295a60b515dfcb5ba6a5b1b5acf7cea1d6c6254ed653d35965
+ size 4928204944
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e346fcf2bf4ebd75853bd17b5744c4158cf621aba195aad7c7e2f1e484c1ae20
+ size 4915963032
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7acad9950402f002825aa41048155ea1a8a2fb5f7a00501154d52828291171c9
+ size 4983071160
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59f3ec7a6983ae2654b7ffe5d79f59d3d523b8a71a0ffc83362816886636ac0a
+ size 4980341192
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5753b1fd57638b70db4882cc8061eedfa28c2748a37414eb4032e20b1bb2e4c1
+ size 2349614880
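Each of the five shard entries above is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses one pointer and totals the shard sizes listed in this commit. The helper name `parse_lfs_pointer` is hypothetical; the pointer text and sizes are copied from the diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5753b1fd57638b70db4882cc8061eedfa28c2748a37414eb4032e20b1bb2e4c1
size 2349614880
"""

# Sizes of the five shards, in bytes, as listed in this commit.
SHARD_SIZES = [4928204944, 4915963032, 4983071160, 4980341192, 2349614880]

info = parse_lfs_pointer(POINTER)
total_bytes = sum(SHARD_SIZES)
print(info["algo"], info["size"], f"{total_bytes / 1e9:.2f} GB total")
```

After downloading a shard, its integrity can be checked by hashing the file with `hashlib.sha256` and comparing the hex digest and byte length against the pointer's `oid` and `size`.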
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff