MINT: Mimic Intent, Not Just Trajectories

MINT (Mimic Intent, Not Just Trajectories) is a framework for end-to-end imitation learning in dexterous manipulation. It explicitly separates behavior intent from execution details by learning a hierarchical, multi-scale token representation of actions.

Model Description

Imitation learning has achieved success in robotic manipulation, but policies that mimic raw trajectories often struggle with adaptation and transfer. MINT addresses this by tokenizing actions at multiple scales in frequency space, so that coarse scales carry behavior intent and fine scales carry execution details:

  • Intent Tokens: Capture low-frequency global structure to facilitate planning and transfer.
  • Execution Tokens: Encode high-frequency details to enable precise adaptation to environmental dynamics.
  • Autoregressive Reasoning: The policy generates trajectories through next-scale autoregression, performing progressive intent-to-execution reasoning.
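
The coarse-to-fine idea behind the three points above can be illustrated with a small, self-contained sketch. It is not MINT's tokenizer (the evaluation command on this card points to a separate learned tokenizer checkpoint); it only shows, under stated assumptions, how a fixed frequency transform splits a trajectory into a low-frequency part and progressively finer residuals. The DCT, the band edges, and the toy trajectory are all illustrative choices and do not come from the paper.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def split_scales(coeffs, edges):
    """Split coefficients into consecutive frequency bands [0, e1), [e1, e2), ..."""
    bands, start = [], 0
    for end in edges:
        bands.append([c if start <= k < end else 0.0 for k, c in enumerate(coeffs)])
        start = end
    return bands

# Toy 1-D action trajectory: a slow reach plus a small fast wiggle (illustrative).
n = 16
traj = [t / (n - 1) + 0.05 * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]

# Hypothetical band edges: the first band plays the role of "intent",
# the later bands play the role of "execution" detail.
bands = split_scales(dct(traj), edges=[2, 4, 8, 16])

# Coarse-to-fine reconstruction: add one scale at a time and track the RMSE.
partial = [0.0] * n
errors = []
for band in bands:
    partial = [p + d for p, d in zip(partial, idct(band))]
    rmse = math.sqrt(sum((p - x) ** 2 for p, x in zip(partial, traj)) / n)
    errors.append(rmse)

print(errors)
```

Because the transform is orthonormal, each added band can only reduce the reconstruction error, and the error reaches numerical zero once all bands are included. In this analogy, the first entry of bands alone gives the smooth overall motion, and the remaining entries restore the fine detail.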

Usage

This model is designed to be used with the LeRobot library.

Evaluation

To evaluate the policy in the LIBERO environment, run the command below. It also requires the MINT-tokenizer-libero tokenizer, which is passed through --policy.vqvae_name_or_path:

lerobot-eval \
    --policy.path=huangrm/MINT-libero \
    --policy.vqvae_name_or_path=huangrm/MINT-tokenizer-libero \
    --env.type=libero \
    --env.task=libero_10,libero_object,libero_spatial,libero_goal \
    --eval.batch_size=1 \
    --eval.n_episodes=2 \
    --seed=42 \
    --policy.n_action_steps=4
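
The following sketch shows one way to read the n_action_steps setting. It assumes the flag has the meaning it usually has for chunk-predicting policies in LeRobot: the number of predicted actions executed before the policy is queried again. This card does not confirm that for MINT. The predict_chunk function, the chunk_size parameter, and the toy dynamics are hypothetical stand-ins, not part of the LeRobot API.

```python
def predict_chunk(obs, chunk_size):
    """Hypothetical stand-in for a policy: returns chunk_size future actions."""
    return [obs + i + 1 for i in range(chunk_size)]

def rollout(episode_len, chunk_size, n_action_steps):
    """Receding-horizon execution: run n_action_steps actions, then re-plan."""
    obs, executed, policy_calls = 0, [], 0
    while len(executed) < episode_len:
        chunk = predict_chunk(obs, chunk_size)
        policy_calls += 1
        # Execute only the first n_action_steps actions of the predicted chunk.
        for action in chunk[:n_action_steps]:
            if len(executed) == episode_len:
                break
            executed.append(action)
            obs = action  # toy dynamics: the state becomes the last action
    return executed, policy_calls

executed, calls = rollout(episode_len=10, chunk_size=8, n_action_steps=4)
print(executed, calls)
```

Under this reading, a smaller n_action_steps means the policy re-plans more often, which costs more inference calls per episode but lets it react sooner to changes in the environment.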

Citation

@article{huang2026mimic,
  title={Mimic Intent, Not Just Trajectories},
  author={Huang, Renming and Zeng, Chendong and Tang, Wenjing and Cai, Jintian and Lu, Cewu and Cai, Panpan},
  journal={arXiv preprint arXiv:2602.08602},
  year={2026}
}