---
license: mit
tags:
- reinforcement-learning
- offline-rl
- flow-matching
- robotics
- jax
datasets:
- ogbench
- robomimic
---

# Flow Map Policies: Pretrained FMQ Checkpoints

Pretrained checkpoints for **"Aligning Flow Map Policies with Optimal Q-Guidance"**.

**Paper:** [arXiv:2605.12416](https://arxiv.org/abs/2605.12416)
**Code:** [github.com/christoszi/flow-map-policies](https://github.com/christoszi/flow-map-policies)

## Model Description

These are Flow Map Q-Guidance (FMQ) agents trained with offline-to-online RL. Each checkpoint contains a flow map policy fine-tuned online for 1M steps using critic-guided trust-region optimization.

## Checkpoints

12 environments × 5 random seeds = 60 checkpoints total.

| Folder | Environment | Benchmark |
|--------|-------------|-----------|
| `checkpoints/ctrp4/` | cube-triple-play-singletask-task4-v0 | OGBench |
| `checkpoints/ctrp3/` | cube-triple-play-singletask-task3-v0 | OGBench |
| `checkpoints/cdp4/` | cube-double-play-singletask-task4-v0 | OGBench |
| `checkpoints/cdp3/` | cube-double-play-singletask-task3-v0 | OGBench |
| `checkpoints/sc4/` | scene-play-singletask-task4-v0 | OGBench |
| `checkpoints/sc5/` | scene-play-singletask-task5-v0 | OGBench |
| `checkpoints/ag4/` | antmaze-giant-navigate-singletask-task4-v0 | OGBench |
| `checkpoints/ag5/` | antmaze-giant-navigate-singletask-task5-v0 | OGBench |
| `checkpoints/hm3/` | humanoidmaze-medium-navigate-singletask-task3-v0 | OGBench |
| `checkpoints/hm4/` | humanoidmaze-medium-navigate-singletask-task4-v0 | OGBench |
| `checkpoints/can/` | can-mh-low_dim | RoboMimic |
| `checkpoints/square/` | square-mh-low_dim | RoboMimic |

## Usage

```bash
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('christoszi/flow-map-policies', local_dir='.')"
```
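
To fetch a single environment instead of all 60 checkpoints, `snapshot_download` accepts an `allow_patterns` filter. The following is a minimal sketch, assuming the repository ID from the command above and that files sit under `checkpoints/<folder>/` as listed in the table. `folder_patterns` and `download_folders` are hypothetical helper names, not part of the released code.

```python
def folder_patterns(folders):
    """Build glob patterns matching every file under checkpoints/<folder>/."""
    return [f"checkpoints/{name}/*" for name in folders]


def download_folders(folders, repo_id="christoszi/flow-map-policies", local_dir="."):
    """Download only the listed checkpoint folders (requires network access)."""
    # Imported here so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,  # assumed to match the repository ID used above
        allow_patterns=folder_patterns(folders),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Show the patterns only; call download_folders(["ctrp4"]) to fetch files.
    print(folder_patterns(["ctrp4", "can"]))
```

Calling `download_folders(["ctrp4"])` should place the files under `./checkpoints/ctrp4/`, which is the layout the evaluation command expects.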

Then evaluate a checkpoint, for example seed 0 of `ctrp4`:

```bash
python main.py --config configs/config.yaml \
  --eval_only --fmq_online \
  --restore_path=checkpoints/ctrp4/params_online_sd000.pkl \
  --env_name=cube-triple-play-singletask-task4-v0 --seed=0
```
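
To evaluate every seed of an environment, the same command can be generated in a loop. This is a sketch under two assumptions that the card does not state explicitly: checkpoint files follow the pattern `params_online_sd<3-digit seed>.pkl` (inferred from the single file name above), and the five seeds are numbered 0 through 4. `ENV_NAMES` and `eval_command` are hypothetical names introduced for illustration.

```python
import shlex

# Folder -> environment name, copied from the checkpoint table (two entries shown).
ENV_NAMES = {
    "ctrp4": "cube-triple-play-singletask-task4-v0",
    "can": "can-mh-low_dim",
}


def eval_command(folder, seed):
    """Return the evaluation command for one folder and seed as an argument list."""
    # Assumption: file names follow params_online_sd<3-digit seed>.pkl.
    ckpt = f"checkpoints/{folder}/params_online_sd{seed:03d}.pkl"
    return [
        "python", "main.py", "--config", "configs/config.yaml",
        "--eval_only", "--fmq_online",
        f"--restore_path={ckpt}",
        f"--env_name={ENV_NAMES[folder]}",
        f"--seed={seed}",
    ]


if __name__ == "__main__":
    # Assumption: the five seeds are numbered 0 through 4.
    for seed in range(5):
        print(shlex.join(eval_command("ctrp4", seed)))
```

Each argument list can be passed directly to `subprocess.run`, which avoids shell quoting issues.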

## Citation

```bibtex
@article{ziakas2026fmq,
  title={Aligning Flow Map Policies with Optimal Q-Guidance},
  author={Ziakas, Christos and Russo, Alessandra and Bose, Avishek Joey},
  journal={arXiv preprint arXiv:2605.12416},
  year={2026},
}
```