---
license: mit
library_name: pytorch
tags:
- representation-learning
- offline-rl
- policy-representation
- inr
- cvae
- meta-learning
- mujoco
- lichess
- droid
- fastf1
- dmlab
---
# policyINR - Checkpoints
Companion checkpoint repository for **[andrewkang12345/policyINR](https://github.com/andrewkang12345/policyINR)**:
robust policy representation learning from offline data.
The folder layout mirrors the `outputs/` tree of the source repo:
```
<domain>/<suite>/<run>/best.pt
                       last.pt
```
where `<domain> ∈ {lichess, droid, fastf1, mujoco, syntheticgrf, dmlabseekavoid}`.
Every per-run directory on the GitHub side carries the matching `config.yaml`,
`metrics.jsonl`, `summary.json`, and `eval.json`. Clone the repo and the
identifiers line up 1-to-1 with the paths here.
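
As a minimal sketch of that 1-to-1 mapping, the helper below turns a `(domain, suite, run)` identifier into the checkpoint paths in this repository and the metadata paths expected on the GitHub side. The function name `run_files`, the `root` argument, and the run name `demo_run` are hypothetical, and it assumes the GitHub-side files sit directly inside the per-run directory under `outputs/`.

```python
from pathlib import PurePosixPath

# Domain names as listed on this page.
DOMAINS = {"lichess", "droid", "fastf1", "mujoco", "syntheticgrf", "dmlabseekavoid"}
# Files stored in this checkpoint repository.
CHECKPOINT_FILES = ("best.pt", "last.pt")
# Files stored per run in the source repo (assumed to sit next to the checkpoints).
METADATA_FILES = ("config.yaml", "metrics.jsonl", "summary.json", "eval.json")


def run_files(domain, suite, run, root="outputs"):
    """Map one run identifier to its checkpoint and metadata paths."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    run_dir = PurePosixPath(root) / domain / suite / run
    return {
        "checkpoints": [run_dir / name for name in CHECKPOINT_FILES],
        "metadata": [run_dir / name for name in METADATA_FILES],
    }


# "demo_run" is a placeholder, not a real run in this repository.
files = run_files("lichess", "2x_hk240", "demo_run")
print(files["checkpoints"][0])
```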
## Download example
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="andrewkang12345/policyINR-checkpoints",
    repo_type="model",
    allow_patterns=["lichess/2x_hk240/**/*.pt"],
    local_dir="outputs",
)
```
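
Once the download finishes, the checkpoints can be located with the standard library alone. The sketch below is one possible way to do it, not part of the source repo: `find_checkpoints` and `run_dir_parts` are hypothetical helpers, and `run_dir_parts` assumes each checkpoint sits exactly three levels below the root (`<domain>/<suite>/<run>/`).

```python
from pathlib import Path


def find_checkpoints(root="outputs", filename="best.pt"):
    """Return every file called `filename` under `root`, sorted by path."""
    return sorted(Path(root).rglob(filename))


def run_dir_parts(checkpoint, root="outputs"):
    """Split a checkpoint path into (domain, suite, run), relative to `root`.

    Raises ValueError if the path is not <root>/<domain>/<suite>/<run>/<file>.
    """
    domain, suite, run, _ = Path(checkpoint).relative_to(root).parts
    return domain, suite, run


if __name__ == "__main__":
    # Lists whatever was downloaded into ./outputs (prints nothing if empty).
    for path in find_checkpoints("outputs"):
        print(run_dir_parts(path, "outputs"), path)
```

Each returned path can then be passed to `torch.load`; what the `.pt` files contain is not specified on this page, so inspect the loaded object before relying on its structure.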
## Domains
| domain | what it is |
|---|---|
| `lichess` | Top-3 GM Lichess games; discrete UCI-move action |
| `droid` | DROID lowdim teleop; continuous action across collectors |
| `fastf1` | F1 stint telemetry; driver-as-policy |
| `mujoco` | Custom MuJoCo + Minari baselines + state/action-resampled suites |
| `syntheticgrf` | Synthetic Gaussian random-field policies (sanity / smoke) |
| `dmlabseekavoid` | DMLab `seekavoid_arena_01` from RL Unplugged; discrete action |
## Models in each run name
`<data>__<model>__<experiment>__s<seed>/`
- `cvae`: bag-of-pairs CVAE
- `inr_transformer_history_conditioned`: INR-Transformer with history
- `inr_diffusion_history_conditioned`: INR-Diffusion with history
- `inr_transformer_fitted_latent`: per-unit fitted latent codes
- `inr_transformer_infer_latent[_maml]`: meta-learned per-unit latent
  (10-step inner adapt + early stopping)
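
The run-name pattern above can be split back into its fields with a few lines of Python. This is a sketch under two assumptions that this page does not confirm: none of the fields contains a double underscore, and the seed is a non-negative integer. `RunName` and `parse_run_name` are hypothetical names, and the example run name is made up.

```python
import re
from typing import NamedTuple


class RunName(NamedTuple):
    data: str
    model: str
    experiment: str
    seed: int


# Matches the trailing seed tag, e.g. "s3".
_SEED_TAG = re.compile(r"s(\d+)")


def parse_run_name(name):
    """Parse `<data>__<model>__<experiment>__s<seed>` into a RunName."""
    parts = name.rstrip("/").split("__")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '__'-separated fields, got {len(parts)}: {name!r}")
    data, model, experiment, seed_tag = parts
    match = _SEED_TAG.fullmatch(seed_tag)
    if match is None:
        raise ValueError(f"seed field must look like 's<int>': {seed_tag!r}")
    return RunName(data, model, experiment, int(match.group(1)))


# Made-up run name for illustration only.
print(parse_run_name("demo_data__cvae__baseline__s0/"))
```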
## License
MIT (mirrors the source repo).