---
tags:
- robot manipulation
- multi-modal perception
- vision-language-action
---

# UniLACT

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models.

## Abstract

Latent action representations learned from unlabeled videos have recently emerged as a promising paradigm for pretraining vision-language-action (VLA) models without explicit robot action supervision. However, latent actions derived solely from RGB observations primarily encode appearance-driven dynamics and lack explicit 3D geometric structure, which is essential for precise and contact-rich manipulation. To address this limitation, we introduce UniLACT, a transformer-based VLA model that incorporates geometric structure through depth-aware latent pretraining, enabling downstream policies to inherit stronger spatial priors. To facilitate this process, we propose UniLARN, a unified latent action learning framework based on inverse and forward dynamics objectives that learns a shared embedding space for RGB and depth while explicitly modeling their cross-modal interactions. This formulation produces modality-specific and unified latent action representations that serve as pseudo-labels for the depth-aware pretraining of UniLACT. Extensive experiments in both simulation and real-world settings demonstrate the effectiveness of depth-aware unified latent action representations. UniLACT consistently outperforms RGB-based latent action baselines under in-domain and out-of-domain pretraining regimes, as well as on both seen and unseen manipulation tasks.
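The abstract describes UniLARN's inverse/forward-dynamics objectives only at a high level. As a rough, numpy-only illustration of how such latent-action pretraining is commonly wired: an inverse dynamics model infers a latent action from two consecutive fused RGB-depth embeddings, and a forward dynamics model must reconstruct the next embedding from that latent action, which grounds the latent space without any action labels. All dimensions, the linear maps, and the mean-fusion below are placeholder assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): flattened observation, embedding, latent action.
OBS_DIM, EMB_DIM, ACT_DIM = 64, 16, 8

# Modality-specific encoders into a shared embedding space; random linear maps
# stand in for the transformer encoders described in the abstract.
W_rgb = rng.normal(scale=0.1, size=(OBS_DIM, EMB_DIM))
W_depth = rng.normal(scale=0.1, size=(OBS_DIM, EMB_DIM))

# Inverse dynamics model (IDM): infer a latent action from consecutive embeddings.
W_idm = rng.normal(scale=0.1, size=(2 * EMB_DIM, ACT_DIM))
# Forward dynamics model (FDM): predict the next embedding from (embedding, latent action).
W_fdm = rng.normal(scale=0.1, size=(EMB_DIM + ACT_DIM, EMB_DIM))

def encode(obs_rgb, obs_depth):
    """Fuse RGB and depth into one unified embedding (placeholder: mean of projections)."""
    return 0.5 * (obs_rgb @ W_rgb + obs_depth @ W_depth)

def latent_action(z_t, z_next):
    """IDM: the latent action is inferred from the transition, never supervised."""
    return np.concatenate([z_t, z_next]) @ W_idm

def fdm_loss(z_t, z_next):
    """Forward-dynamics reconstruction loss that grounds the inferred latent action."""
    a = latent_action(z_t, z_next)
    z_pred = np.concatenate([z_t, a]) @ W_fdm
    return float(np.mean((z_pred - z_next) ** 2))

# Two consecutive (RGB, depth) observations from an unlabeled video clip.
rgb_t, rgb_next = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
depth_t, depth_next = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)

z_t = encode(rgb_t, depth_t)
z_next = encode(rgb_next, depth_next)
print(fdm_loss(z_t, z_next))
```

In a real training loop these maps would be learned networks optimized jointly on the IDM/FDM objectives; the inferred latent actions would then serve as the pseudo-labels for the depth-aware pretraining of the downstream VLA policy.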
## Citation

```bibtex
@misc{govind2026unilactdepthawarergblatent,
      title={UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models},
      author={Manish Kumar Govind and Dominick Reilly and Pu Wang and Srijan Das},
      year={2026},
      eprint={2602.20231},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.20231}
}
```