christoszi committed
Commit d0c8458 · verified · 1 Parent(s): 2661eda

Add pretrained FMQ checkpoints (12 envs x 5 seeds)

Files changed (1)
  1. README.md +69 -0
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ license: mit
+ tags:
+ - reinforcement-learning
+ - offline-rl
+ - flow-matching
+ - robotics
+ - jax
+ datasets:
+ - ogbench
+ - robomimic
+ ---
+
+ # Flow Map Policies — Pretrained FMQ Checkpoints
+
+ Pretrained checkpoints for **"Aligning Flow Map Policies with Optimal Q-Guidance"**.
+
+ **Paper:** [arXiv:2605.12416](https://arxiv.org/abs/2605.12416)
+ **Code:** [github.com/christoszi/flow-map-policies](https://github.com/christoszi/flow-map-policies)
+
+ ## Model Description
+
+ These are Flow Map Q-Guidance (FMQ) agents trained with offline-to-online RL. Each checkpoint contains a flow map policy fine-tuned online for 1M steps using critic-guided trust-region optimization.
+
+ ## Checkpoints
+
+ 12 environments x 5 random seeds = 60 checkpoints total.
+
+ | Folder | Environment | Benchmark |
+ |--------|-------------|-----------|
+ | `checkpoints/ctrp4/` | cube-triple-play-singletask-task4-v0 | OGBench |
+ | `checkpoints/ctrp3/` | cube-triple-play-singletask-task3-v0 | OGBench |
+ | `checkpoints/cdp4/` | cube-double-play-singletask-task4-v0 | OGBench |
+ | `checkpoints/cdp3/` | cube-double-play-singletask-task3-v0 | OGBench |
+ | `checkpoints/sc4/` | scene-play-singletask-task4-v0 | OGBench |
+ | `checkpoints/sc5/` | scene-play-singletask-task5-v0 | OGBench |
+ | `checkpoints/ag4/` | antmaze-giant-navigate-singletask-task4-v0 | OGBench |
+ | `checkpoints/ag5/` | antmaze-giant-navigate-singletask-task5-v0 | OGBench |
+ | `checkpoints/hm3/` | humanoidmaze-medium-navigate-singletask-task3-v0 | OGBench |
+ | `checkpoints/hm4/` | humanoidmaze-medium-navigate-singletask-task4-v0 | OGBench |
+ | `checkpoints/can/` | can-mh-low_dim | RoboMimic |
+ | `checkpoints/square/` | square-mh-low_dim | RoboMimic |
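The folder layout in the table can be enumerated programmatically. The sketch below is illustrative rather than part of the repository: the names `ENVS`, `SEEDS`, and `checkpoint_path` are hypothetical, and it assumes that seeds are numbered 0 to 4 and that every file follows the `params_online_sd000.pkl` pattern that appears in the README's evaluation command. Only the seed-0 file name is confirmed by the README.

```python
from itertools import product

# Folder -> environment name, copied from the README table.
ENVS = {
    "ctrp4": "cube-triple-play-singletask-task4-v0",
    "ctrp3": "cube-triple-play-singletask-task3-v0",
    "cdp4": "cube-double-play-singletask-task4-v0",
    "cdp3": "cube-double-play-singletask-task3-v0",
    "sc4": "scene-play-singletask-task4-v0",
    "sc5": "scene-play-singletask-task5-v0",
    "ag4": "antmaze-giant-navigate-singletask-task4-v0",
    "ag5": "antmaze-giant-navigate-singletask-task5-v0",
    "hm3": "humanoidmaze-medium-navigate-singletask-task3-v0",
    "hm4": "humanoidmaze-medium-navigate-singletask-task4-v0",
    "can": "can-mh-low_dim",
    "square": "square-mh-low_dim",
}

# Assumption: five seeds numbered 0..4 (the README only shows seed 0).
SEEDS = range(5)


def checkpoint_path(folder, seed):
    """Build the assumed relative path of one checkpoint file."""
    return f"checkpoints/{folder}/params_online_sd{seed:03d}.pkl"


# 12 folders x 5 seeds = 60 candidate paths.
ALL_CHECKPOINTS = [checkpoint_path(f, s) for f, s in product(ENVS, SEEDS)]
```

Checking the generated paths against the downloaded files is a quick way to confirm whether the naming assumption actually holds.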
+
+ ## Usage
+
+ ```bash
+ pip install huggingface_hub
+ python -c "from huggingface_hub import snapshot_download; snapshot_download('christoszi/flow-map-policies', local_dir='.')"
+ ```
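The command above downloads the whole repository. When only one environment is needed, `snapshot_download` accepts an `allow_patterns` argument that restricts the download to matching files. The following is a minimal sketch under that approach; the helper names `patterns_for` and `download_env` are hypothetical and not part of the repository.

```python
def patterns_for(folder):
    """Glob patterns matching every file under one checkpoint folder."""
    return [f"checkpoints/{folder}/*"]


def download_env(folder, local_dir="."):
    """Download a single environment's checkpoints and return the local path."""
    # Imported lazily so patterns_for() works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "christoszi/flow-map-policies",
        allow_patterns=patterns_for(folder),
        local_dir=local_dir,
    )


# Example (requires network access):
# download_env("ctrp4")  # fetches only files under checkpoints/ctrp4/
```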
+
+ Then evaluate:
+
+ ```bash
+ python main.py --config configs/config.yaml \
+   --eval_only --fmq_online \
+   --restore_path=checkpoints/ctrp4/params_online_sd000.pkl \
+   --env_name=cube-triple-play-singletask-task4-v0 --seed=0
+ ```
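Before running an evaluation, it can help to peek inside a checkpoint. The README does not document the file's internal structure, so the sketch below only assumes that the `.pkl` extension means a standard Python pickle. The helper name `summarize_checkpoint` is hypothetical, and if the file stores JAX arrays, `jax` will likely need to be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def summarize_checkpoint(path):
    """Load a pickled checkpoint and return (type name, sorted top-level keys)."""
    with Path(path).open("rb") as f:
        # Unpickling can execute arbitrary code: only load trusted files.
        obj = pickle.load(f)
    # Top-level keys are reported only when the object is a plain dict.
    keys = sorted(map(str, obj.keys())) if isinstance(obj, dict) else []
    return type(obj).__name__, keys


# Example (after downloading the checkpoints):
# print(summarize_checkpoint("checkpoints/ctrp4/params_online_sd000.pkl"))
```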
+
+ ## Citation
+
+ ```bibtex
+ @article{ziakas2026fmq,
+ title={Aligning Flow Map Policies with Optimal Q-Guidance},
+ author={Ziakas, Christos and Russo, Alessandra and Bose, Avishek Joey},
+ journal={arXiv preprint arXiv:2605.12416},
+ year={2026},
+ }
+ ```