Aligning Flow Map Policies with Optimal Q-Guidance
Pretrained checkpoints for "Aligning Flow Map Policies with Optimal Q-Guidance".
Paper: arXiv:2605.12416
Code: github.com/christoszi/flow-map-policies
These are Flow Map Q-Guidance (FMQ) agents trained with offline-to-online RL. Each checkpoint contains a flow map policy fine-tuned online for 1M steps using critic-guided trust-region optimization.
The release covers 12 environments × 5 random seeds, for 60 checkpoints in total.

| Folder | Environment | Benchmark |
|---|---|---|
| `checkpoints/ctrp4/` | `cube-triple-play-singletask-task4-v0` | OGBench |
| `checkpoints/ctrp3/` | `cube-triple-play-singletask-task3-v0` | OGBench |
| `checkpoints/cdp4/` | `cube-double-play-singletask-task4-v0` | OGBench |
| `checkpoints/cdp3/` | `cube-double-play-singletask-task3-v0` | OGBench |
| `checkpoints/sc4/` | `scene-play-singletask-task4-v0` | OGBench |
| `checkpoints/sc5/` | `scene-play-singletask-task5-v0` | OGBench |
| `checkpoints/ag4/` | `antmaze-giant-navigate-singletask-task4-v0` | OGBench |
| `checkpoints/ag5/` | `antmaze-giant-navigate-singletask-task5-v0` | OGBench |
| `checkpoints/hm3/` | `humanoidmaze-medium-navigate-singletask-task3-v0` | OGBench |
| `checkpoints/hm4/` | `humanoidmaze-medium-navigate-singletask-task4-v0` | OGBench |
| `checkpoints/can/` | `can-mh-low_dim` | RoboMimic |
| `checkpoints/square/` | `square-mh-low_dim` | RoboMimic |
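
For scripting, the folder-to-environment mapping can be mirrored as a small lookup. The sketch below is illustrative only: the names `FOLDER_TO_ENV` and `all_runs` are hypothetical, and the seed values 0 to 4 are an assumption, since this card states that there are five seeds but does not list them.

```python
# Folder -> environment name, copied from the checkpoint table.
FOLDER_TO_ENV = {
    "ctrp4": "cube-triple-play-singletask-task4-v0",
    "ctrp3": "cube-triple-play-singletask-task3-v0",
    "cdp4": "cube-double-play-singletask-task4-v0",
    "cdp3": "cube-double-play-singletask-task3-v0",
    "sc4": "scene-play-singletask-task4-v0",
    "sc5": "scene-play-singletask-task5-v0",
    "ag4": "antmaze-giant-navigate-singletask-task4-v0",
    "ag5": "antmaze-giant-navigate-singletask-task5-v0",
    "hm3": "humanoidmaze-medium-navigate-singletask-task3-v0",
    "hm4": "humanoidmaze-medium-navigate-singletask-task4-v0",
    "can": "can-mh-low_dim",
    "square": "square-mh-low_dim",
}


def all_runs(seeds=range(5)):
    """Return (folder, env_name, seed) for every checkpoint.

    The default seed values 0..4 are an assumption, not taken from the card.
    """
    return [
        (folder, env_name, seed)
        for folder, env_name in FOLDER_TO_ENV.items()
        for seed in seeds
    ]


if __name__ == "__main__":
    runs = all_runs()
    print(len(FOLDER_TO_ENV), "environments,", len(runs), "checkpoints")
```

Keeping the mapping in one place avoids pairing a checkpoint folder with the wrong `--env_name` when evaluating several tasks.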

Download the checkpoints from the Hugging Face Hub:

    pip install huggingface_hub
    python -c "from huggingface_hub import snapshot_download; snapshot_download('christoszi/flow-map-policies', local_dir='.')"

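
The download command above fetches the whole repository. If only some environments are needed, `snapshot_download` also accepts an `allow_patterns` argument that filters files by glob pattern. The following is a minimal sketch, assuming the files on the Hub are laid out under `checkpoints/<folder>/` as the table and the restore path suggest; the helper names `folder_patterns` and `download_folders` are hypothetical.

```python
def folder_patterns(folders):
    """Build glob patterns that select only the given checkpoint folders."""
    # Assumption: the Hub repository stores files under checkpoints/<folder>/.
    return [f"checkpoints/{name}/*" for name in folders]


def download_folders(folders, local_dir="."):
    """Download only the selected checkpoint folders into local_dir."""
    # Imported lazily so folder_patterns can be used without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "christoszi/flow-map-policies",
        local_dir=local_dir,
        allow_patterns=folder_patterns(folders),
    )
```

Calling `download_folders(["ctrp4"])` would then fetch only the `cube-triple-play-singletask-task4-v0` checkpoints, provided the assumed layout holds.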
Then evaluate:

    python main.py --config configs/config.yaml \
        --eval_only --fmq_online \
        --restore_path=checkpoints/ctrp4/params_online_sd000.pkl \
        --env_name=cube-triple-play-singletask-task4-v0 --seed=0

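
To evaluate every seed of one environment, the command above can be generated in a loop. This is a sketch under two assumptions that the card does not confirm: that the remaining seeds follow the same file pattern as seed 0 (`params_online_sd<three-digit seed>.pkl`), and that the seeds are numbered 0 to 4. The helper name `eval_command` is hypothetical; the flags are the ones shown in the command above.

```python
import shlex


def eval_command(folder, env_name, seed):
    """Assemble the evaluation command for one checkpoint as an argument list."""
    # Assumption: every seed uses the naming pattern seen for seed 0.
    ckpt = f"checkpoints/{folder}/params_online_sd{seed:03d}.pkl"
    return [
        "python", "main.py", "--config", "configs/config.yaml",
        "--eval_only", "--fmq_online",
        f"--restore_path={ckpt}",
        f"--env_name={env_name}",
        f"--seed={seed}",
    ]


if __name__ == "__main__":
    for seed in range(5):  # assumed seed values 0..4
        cmd = eval_command("ctrp4", "cube-triple-play-singletask-task4-v0", seed)
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to confirm that each checkpoint path exists before launching the runs.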

    @article{ziakas2026fmq,
      title={Aligning Flow Map Policies with Optimal Q-Guidance},
      author={Ziakas, Christos and Russo, Alessandra and Bose, Avishek Joey},
      journal={arXiv preprint arXiv:2605.12416},
      year={2026},
    }