Mimic Intent, Not Just Trajectories (MINT)

MINT (Mimic Intent, Not just Trajectories) is a framework for end-to-end imitation learning in dexterous manipulation. It explicitly disentangles behavior intent from execution details by learning a hierarchical, multi-scale token representation of actions.

Overview ✨

MINT addresses the limitations of standard Vision-Language-Action (VLA) models through multi-scale frequency-space tokenization of actions. This yields an abstract Intent token for high-level planning and finer Execution tokens for precise environmental adaptation. The policy then generates trajectories through next-scale autoregression, reasoning progressively from intent to execution.
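To make the idea of frequency-space, coarse-to-fine action tokens concrete, here is a minimal NumPy sketch. It is an illustration only, not MINT's actual tokenizer: it decomposes a 1-D action trajectory into DCT frequency bands, where the lowest band plays the role of the abstract "Intent" token and the higher bands play the role of "Execution" tokens (the band boundaries in `scales` are arbitrary choices for the example).

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis B, so coeffs = B @ x and x = B.T @ coeffs."""
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # time index
    b = np.cos(np.pi * (2 * t + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    b[0] /= np.sqrt(2.0)        # DC row gets the 1/sqrt(n) scaling
    return b

def multiscale_bands(traj, scales=(2, 8, 32)):
    """Split a trajectory into coarse-to-fine frequency bands.

    bands[0] (lowest frequencies) stands in for the 'Intent' token;
    the remaining bands stand in for 'Execution' tokens. Illustrative
    stand-in for MINT's learned multi-scale tokenization, not the real one.
    """
    n = len(traj)
    b = dct_basis(n)
    coeffs = b @ traj
    bands, lo = [], 0
    for hi in scales:
        band = np.zeros(n)
        band[lo:hi] = coeffs[lo:hi]  # keep only this band's coefficients
        bands.append(band)
        lo = hi
    return bands, b

def reconstruct(bands, b):
    """Sum the per-scale bands and invert the transform."""
    return b.T @ np.sum(bands, axis=0)
```

Because the DCT basis is orthonormal, summing all bands and inverting recovers the trajectory exactly, while the lowest band alone already captures its coarse shape, which is what lets an autoregressive policy refine from intent to execution scale by scale.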

Usage πŸ› οΈ

This model is integrated with the LeRobot library. You can evaluate the policy on LIBERO tasks using the following command:

lerobot-eval \
    --policy.path=huangrm/MINT-libero \
    --policy.vqvae_name_or_path=<path/to/tokenizer> \
    --env.type=libero \
    --env.task=libero_10,libero_object,libero_spatial,libero_goal \
    --eval.batch_size=1 \
    --eval.n_episodes=2 \
    --seed=42 \
    --policy.n_action_steps=4

Note: Replace <path/to/tokenizer> with the local path to the MINT-tokenizer-libero weights.

Citation πŸ“š

If you find this project useful, please cite:

@article{huang2026mimic,
  title={Mimic Intent, Not Just Trajectories},
  author={Huang, Renming and Zeng, Chendong and Tang, Wenjing and Cai, Jintian and Lu, Cewu and Cai, Panpan},
  journal={arXiv preprint arXiv:2602.08602},
  year={2026}
}
