---
language:
- en
license: cc-by-nc-sa-4.0
pipeline_tag: image-to-video
tags:
- motion-generation
- trajectory-prediction
- robotics
- computer-vision
- pytorch
- torch-hub
---

# ZipMo (Learning Long-term Motion Embeddings for Efficient Kinematics Generation)

[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://compvis.github.io/long-term-motion) [![Paper](https://img.shields.io/badge/arXiv-paper-b31b1b)](https://arxiv.org/abs/2604.11737) [![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/CompVis/long-term-motion) [![Venue](https://img.shields.io/badge/CVPR-2026-green)](https://compvis.github.io/long-term-motion)

ZipMo is a motion-space model for efficient long-horizon kinematics generation. It learns compact long-term motion embeddings from large-scale tracker-derived trajectories and generates plausible future motion directly in this learned motion space. The model supports spatial-poke conditioning for open-domain videos and task/text-embedding conditioning for LIBERO robotics evaluation.

## Paper and Abstract

ZipMo was introduced in the CVPR 2026 paper **Learning Long-term Motion Embeddings for Efficient Kinematics Generation**.

Understanding and predicting motion is a fundamental component of visual intelligence. Although video models can synthesize scene dynamics, exploring many possible futures through full video generation is expensive. ZipMo instead operates directly on long-term motion embeddings learned from tracker trajectories, enabling efficient generation of long, realistic motions while preserving dense reconstruction at arbitrary spatial query points.
![ZipMo teaser figure](https://compvis.github.io/long-term-motion/static/images/social_preview.png)

*ZipMo generates long-horizon motion in a compact learned motion space, supporting spatial-poke conditioning for open-domain videos and task-conditioned action prediction on LIBERO.*

## Usage

For programmatic use, the simplest way to use ZipMo is via `torch.hub`:

```python
import torch

repo = "CompVis/long-term-motion"

# Open-domain motion prediction
planner_sparse = torch.hub.load(repo, "zipmo_planner_sparse")
planner_dense = torch.hub.load(repo, "zipmo_planner_dense")

# Motion autoencoder
vae = torch.hub.load(repo, "zipmo_vae")
```

LIBERO planning and policy components can be loaded in the same way:

```python
import torch

repo = "CompVis/long-term-motion"

# LIBERO planners
libero_atm_planner = torch.hub.load(repo, "zipmo_planner_libero", "atm")
libero_tramoe_planner = torch.hub.load(repo, "zipmo_planner_libero", "tramoe")

# LIBERO policy heads
policy_head_atm = torch.hub.load(repo, "zipmo_policy_head", "atm")
policy_head_tramoe_goal = torch.hub.load(repo, "zipmo_policy_head", "tramoe", "goal")
```

Available Torch Hub entries:

- `zipmo_planner_sparse`: sparse-poke planner for open-domain motion prediction.
- `zipmo_planner_dense`: dense-conditioning planner for open-domain motion prediction.
- `zipmo_vae`: long-term motion autoencoder.
- `zipmo_planner_libero`: LIBERO planner with mode `atm` or `tramoe`.
- `zipmo_policy_head`: LIBERO policy head with mode `atm` or `tramoe`. For `tramoe`, pass one of `10`, `goal`, `object`, or `spatial`.

For the interactive demo, standard track prediction evaluation, LIBERO rollout evaluation, and training instructions, see the [GitHub repository](https://github.com/CompVis/long-term-motion).
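Whichever entry point you load, `torch.hub.load` returns a standard `torch.nn.Module`, so the usual PyTorch inference preparation applies: move the model to your device, switch it to eval mode, and run under `torch.inference_mode()`. The sketch below uses a stand-in `nn.Linear` in place of a ZipMo model, since the exact forward signatures of the ZipMo modules are not documented here (see the GitHub repository for the real inputs):

```python
import torch
import torch.nn as nn

# Stand-in for a model returned by torch.hub.load; the real ZipMo
# modules have their own forward signatures (see the repository).
model = nn.Linear(16, 8)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()  # disable dropout / batch-norm updates

with torch.inference_mode():  # skip autograd bookkeeping for faster inference
    x = torch.randn(1, 16, device=device)
    out = model(x)

print(tuple(out.shape))  # (1, 8)
```

The same pattern applies to the planners, the VAE, and the LIBERO policy heads above.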
## Citation

If you find our model or code useful, please cite our paper:

```bibtex
@inproceedings{stracke2026motionembeddings,
  title     = {Learning Long-term Motion Embeddings for Efficient Kinematics Generation},
  author    = {Stracke, Nick and Bauer, Kolja and Baumann, Stefan Andreas and Bautista, Miguel Angel and Susskind, Josh and Ommer, Bj{\"o}rn},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
```