---
license: mit
tags:
  - reinforcement-learning
  - offline-rl
  - flow-matching
  - robotics
  - jax
datasets:
  - ogbench
  - robomimic
---

# Flow Map Policies — Pretrained FMQ Checkpoints

Pretrained checkpoints for **"Aligning Flow Map Policies with Optimal Q-Guidance"**.

**Paper:** [arXiv:2605.12416](https://arxiv.org/abs/2605.12416)  
**Code:** [github.com/christoszi/flow-map-policies](https://github.com/christoszi/flow-map-policies)

## Model Description

These are Flow Map Q-Guidance (FMQ) agents trained with offline-to-online RL. Each checkpoint contains a flow map policy fine-tuned online for 1M steps using critic-guided trust-region optimization.

## Checkpoints

12 environments × 5 random seeds = 60 checkpoints in total.

| Folder | Environment | Benchmark |
|--------|-------------|-----------|
| `checkpoints/ctrp4/` | cube-triple-play-singletask-task4-v0 | OGBench |
| `checkpoints/ctrp3/` | cube-triple-play-singletask-task3-v0 | OGBench |
| `checkpoints/cdp4/` | cube-double-play-singletask-task4-v0 | OGBench |
| `checkpoints/cdp3/` | cube-double-play-singletask-task3-v0 | OGBench |
| `checkpoints/sc4/` | scene-play-singletask-task4-v0 | OGBench |
| `checkpoints/sc5/` | scene-play-singletask-task5-v0 | OGBench |
| `checkpoints/ag4/` | antmaze-giant-navigate-singletask-task4-v0 | OGBench |
| `checkpoints/ag5/` | antmaze-giant-navigate-singletask-task5-v0 | OGBench |
| `checkpoints/hm3/` | humanoidmaze-medium-navigate-singletask-task3-v0 | OGBench |
| `checkpoints/hm4/` | humanoidmaze-medium-navigate-singletask-task4-v0 | OGBench |
| `checkpoints/can/` | can-mh-low_dim | RoboMimic |
| `checkpoints/square/` | square-mh-low_dim | RoboMimic |
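
The table above can be mirrored in a small lookup to locate checkpoint files after download. This is a minimal sketch, not part of the released code: the seed indices `0..4` and the `params_online_sd{seed:03d}.pkl` naming for seeds other than 0 are assumptions extrapolated from the single file name shown in the usage example, and `checkpoint_path` and `missing_checkpoints` are hypothetical helpers.

```python
from pathlib import Path

# Folder -> environment name, copied from the checkpoint table.
ENV_BY_FOLDER = {
    "ctrp4": "cube-triple-play-singletask-task4-v0",
    "ctrp3": "cube-triple-play-singletask-task3-v0",
    "cdp4": "cube-double-play-singletask-task4-v0",
    "cdp3": "cube-double-play-singletask-task3-v0",
    "sc4": "scene-play-singletask-task4-v0",
    "sc5": "scene-play-singletask-task5-v0",
    "ag4": "antmaze-giant-navigate-singletask-task4-v0",
    "ag5": "antmaze-giant-navigate-singletask-task5-v0",
    "hm3": "humanoidmaze-medium-navigate-singletask-task3-v0",
    "hm4": "humanoidmaze-medium-navigate-singletask-task4-v0",
    "can": "can-mh-low_dim",
    "square": "square-mh-low_dim",
}

# Assumption: the five seeds are numbered 0..4.
ASSUMED_SEEDS = range(5)


def checkpoint_path(folder, seed, root="checkpoints"):
    """Build the expected path of one checkpoint (assumed naming pattern)."""
    if folder not in ENV_BY_FOLDER:
        raise KeyError(f"unknown checkpoint folder: {folder}")
    return Path(root) / folder / f"params_online_sd{seed:03d}.pkl"


def missing_checkpoints(root="checkpoints"):
    """List expected checkpoint paths that are not present under root."""
    missing = []
    for folder in ENV_BY_FOLDER:
        for seed in ASSUMED_SEEDS:
            path = checkpoint_path(folder, seed, root)
            if not path.exists():
                missing.append(path)
    return missing
```

Calling `missing_checkpoints()` after a download gives a quick check of which expected files are absent; if the repository uses different seed indices, adjust `ASSUMED_SEEDS` accordingly.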

## Usage

```bash
pip install huggingface_hub
python -c "from huggingface_hub import snapshot_download; snapshot_download('christoszi/flow-map-policies', local_dir='.')"
```
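
The command above fetches all 60 checkpoints. To fetch only some environments, `snapshot_download` accepts an `allow_patterns` filter. The sketch below assumes the repository layout matches the `checkpoints/<folder>/` paths in the table; `patterns_for` and `download_checkpoints` are hypothetical helper names.

```python
def patterns_for(folders):
    """Glob patterns matching every file under the given checkpoint folders."""
    return [f"checkpoints/{name}/*" for name in folders]


def download_checkpoints(folders, local_dir="."):
    """Download only the listed checkpoint folders from the model repository."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="christoszi/flow-map-policies",
        allow_patterns=patterns_for(folders),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Show the filter that would be used for two environments (no network access).
    print(patterns_for(["ctrp4", "can"]))
```

For example, `download_checkpoints(["ctrp4"])` would restrict the download to the cube-triple-play task 4 checkpoints.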

Then evaluate a checkpoint (here `ctrp4`, seed 0):

```bash
python main.py --config configs/config.yaml \
  --eval_only --fmq_online \
  --restore_path=checkpoints/ctrp4/params_online_sd000.pkl \
  --env_name=cube-triple-play-singletask-task4-v0 --seed=0
```
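
To evaluate every seed of one environment, the same command can be generated in a loop. This is a sketch under stated assumptions: it reuses only the flags shown in the command above, and it assumes seeds `0..4` with files named `params_online_sd{seed:03d}.pkl`, which the source confirms only for seed 0. `eval_command` is a hypothetical helper.

```python
import shlex


def eval_command(folder, env_name, seed):
    """Build the evaluation command for one checkpoint as an argument list."""
    # Assumption: checkpoint files follow the naming pattern seen for seed 0.
    restore_path = f"checkpoints/{folder}/params_online_sd{seed:03d}.pkl"
    return [
        "python", "main.py",
        "--config", "configs/config.yaml",
        "--eval_only",
        "--fmq_online",
        f"--restore_path={restore_path}",
        f"--env_name={env_name}",
        f"--seed={seed}",
    ]


if __name__ == "__main__":
    # Print one command per assumed seed; pass the list to subprocess.run to execute.
    for seed in range(5):
        cmd = eval_command("ctrp4", "cube-triple-play-singletask-task4-v0", seed)
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to confirm the paths before launching any evaluation runs.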

## Citation

```bibtex
@article{ziakas2026fmq,
  title={Aligning Flow Map Policies with Optimal Q-Guidance},
  author={Ziakas, Christos and Russo, Alessandra and Bose, Avishek Joey},
  journal={arXiv preprint arXiv:2605.12416},
  year={2026},
}
```