Mimic Intent, Not Just Trajectories (MINT)
MINT (Mimic Intent, Not Just Trajectories) is a framework for end-to-end imitation learning in dexterous manipulation. It explicitly disentangles behavior intent from execution details by learning a hierarchical, multi-scale token representation of actions.
- Paper: Mimic Intent, Not Just Trajectories
- Project Page: https://renming-huang.github.io/MINT/
- Repository: https://github.com/RenMing-Huang/MINT
Overview
MINT addresses the limitations of standard Vision-Language-Action (VLA) models by disentangling behavior intent from execution details via multi-scale frequency-space tokenization. This yields an abstract Intent token for planning and Execution tokens for precise environmental adaptation. The policy generates trajectories through next-scale autoregression, performing progressive intent-to-execution reasoning.
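MINT's actual tokenizer is learned (a VQ-VAE over action sequences), so the following is only a rough illustration of the frequency-space, multi-scale idea, not the repository's implementation: a trajectory is split into a coarse low-frequency band (an intent-like summary) plus progressively finer bands (execution-like detail), and summing bands coarse-to-fine refines the motion. All function names here are hypothetical.

```python
import numpy as np

def split_multiscale(traj, cutoffs=(2, 8)):
    """Partition a 1-D trajectory's rfft coefficients into frequency bands.

    The lowest band is a coarse, intent-like summary of the motion; each
    higher band adds finer execution detail. (Illustrative only: MINT
    learns its tokens with a VQ-VAE rather than using a fixed FFT.)
    """
    coeffs = np.fft.rfft(traj)
    bounds = (0, *cutoffs, len(coeffs))
    bands = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        band = np.zeros_like(coeffs)
        band[lo:hi] = coeffs[lo:hi]
        bands.append(band)
    return bands

def reconstruct(bands, n):
    """Coarse-to-fine reconstruction: summing more bands refines the motion."""
    return np.fft.irfft(sum(bands), n=n)
```

Reconstructing from only the first band yields a smoothed, intent-level version of the trajectory; adding the remaining bands recovers it exactly, mirroring the coarse-to-fine, intent-to-execution decoding order.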
Usage
This model is integrated with the LeRobot library. You can evaluate the policy on LIBERO tasks using the following command:
```shell
lerobot-eval \
    --policy.path=huangrm/MINT-libero \
    --policy.vqvae_name_or_path=<path/to/tokenizer> \
    --env.type=libero \
    --env.task=libero_10,libero_object,libero_spatial,libero_goal \
    --eval.batch_size=1 \
    --eval.n_episodes=2 \
    --seed=42 \
    --policy.n_action_steps=4
```
Note: Replace `<path/to/tokenizer>` with the local path to the MINT-tokenizer-libero weights.
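The `--policy.n_action_steps=4` flag controls receding-horizon execution: the policy predicts a chunk of future actions, executes the first 4 in the environment, then re-predicts from the new observation. A minimal sketch of that loop, using hypothetical stand-ins (`predict_chunk`, `step_env`) for the policy forward pass and environment step:

```python
def rollout(predict_chunk, step_env, obs, max_steps, n_action_steps=4):
    """Receding-horizon control: re-plan every n_action_steps.

    predict_chunk(obs) -> list of future actions (an action chunk);
    step_env(action)   -> next observation.
    Both are hypothetical placeholders, not the LeRobot API.
    """
    steps = 0   # environment steps taken
    calls = 0   # policy forward passes
    while steps < max_steps:
        chunk = predict_chunk(obs)
        calls += 1
        # Execute only the first n_action_steps actions, then re-plan.
        for action in chunk[:n_action_steps]:
            obs = step_env(action)
            steps += 1
            if steps >= max_steps:
                break
    return obs, steps, calls
```

Smaller `n_action_steps` values re-plan more often (more reactive, more policy calls); larger values execute longer open-loop stretches from each prediction.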
Citation
If you find this project useful, please cite:
```bibtex
@article{huang2026mimic,
  title={Mimic Intent, Not Just Trajectories},
  author={Huang, Renming and Zeng, Chendong and Tang, Wenjing and Cai, Jintian and Lu, Cewu and Cai, Panpan},
  journal={arXiv preprint arXiv:2602.08602},
  year={2026}
}
```