---
license: mit
library_name: pytorch
tags:
  - representation-learning
  - offline-rl
  - policy-representation
  - inr
  - cvae
  - meta-learning
  - mujoco
  - lichess
  - droid
  - fastf1
  - dmlab
---

# policyINR: Checkpoints

Companion checkpoint repository for **[andrewkang12345/policyINR](https://github.com/andrewkang12345/policyINR)**,
robust policy representation learning from offline data.

The folder layout mirrors the `outputs/` tree of the source repo:

```
<domain>/<suite>/<run>/best.pt
                        last.pt
```

where `<domain> ∈ {lichess, droid, fastf1, mujoco, syntheticgrf,
dmlabseekavoid}`.

Every per-run directory on the GitHub side carries the matching `config.yaml`,
`metrics.jsonl`, `summary.json`, and `eval.json`. Clone the repo and the
identifiers line up 1-to-1 with the paths here.
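
As a minimal sketch of that 1-to-1 correspondence, the helper below splits a checkpoint path into its parts and lists where the matching metadata files would sit in a clone of the source repo. The name `sidecar_paths`, the `repo_outputs` argument, and the assumption that every checkpoint sits exactly three directories deep are illustrative and not part of either repository.

```python
from pathlib import PurePosixPath

# File names listed above as living next to each run on the GitHub side.
SIDECAR_FILES = ("config.yaml", "metrics.jsonl", "summary.json", "eval.json")


def sidecar_paths(checkpoint_path, repo_outputs="outputs"):
    """Map '<domain>/<suite>/<run>/best.pt' to its metadata file paths.

    Hypothetical helper: assumes `repo_outputs` points at the `outputs/`
    directory of a local clone of the source repo.
    """
    parts = PurePosixPath(checkpoint_path).parts
    if len(parts) != 4:
        raise ValueError(
            f"expected <domain>/<suite>/<run>/<file>, got {checkpoint_path!r}"
        )
    domain, suite, run, _ = parts
    run_dir = PurePosixPath(repo_outputs) / domain / suite / run
    return {name: str(run_dir / name) for name in SIDECAR_FILES}
```

Calling `sidecar_paths("<domain>/<suite>/<run>/best.pt")` with a real path from this repository returns a dictionary keyed by the four file names, each pointing into the same run directory of the clone.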

## Download example

```python
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="andrewkang12345/policyINR-checkpoints",
    repo_type="model",
    allow_patterns=["lichess/2x_hk240/**/*.pt"],
    local_dir="outputs",
)
```
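
To fetch a different slice of the repository, only `allow_patterns` needs to change. The sketch below builds patterns of the same shape as the one above from a domain, an optional suite, and the checkpoint names to keep. `checkpoint_patterns` is a hypothetical helper, not something shipped with the source repo or with `huggingface_hub`.

```python
# Domain names as listed in this README.
DOMAINS = {"lichess", "droid", "fastf1", "mujoco", "syntheticgrf", "dmlabseekavoid"}


def checkpoint_patterns(domain, suite=None, which=("best",)):
    """Build glob patterns suitable for `allow_patterns` (hypothetical helper).

    `which` selects checkpoint file stems, e.g. ("best",) or ("best", "last").
    """
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    # Restrict to one suite when given, otherwise cover the whole domain.
    prefix = domain if suite is None else f"{domain}/{suite}"
    return [f"{prefix}/**/{name}.pt" for name in which]
```

For example, `checkpoint_patterns("lichess", "2x_hk240")` yields `["lichess/2x_hk240/**/best.pt"]`, which can be passed as `allow_patterns` to skip the `last.pt` files.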

## Domains

| domain | what it is |
|---|---|
| `lichess` | Top-3 GM Lichess games; discrete UCI-move action |
| `droid` | DROID lowdim teleop; continuous action across collectors |
| `fastf1` | F1 stint telemetry; driver-as-policy |
| `mujoco` | Custom MuJoCo + Minari baselines + state/action-resampled suites |
| `syntheticgrf` | Synthetic Gaussian random-field policies (sanity / smoke) |
| `dmlabseekavoid` | DMLab `seekavoid_arena_01` from RL Unplugged; discrete action |

## Models in each run name

Run directories are named `<data>__<model>__<experiment>__s<seed>/`, where `<model>` is one of:

- `cvae`: bag-of-pairs CVAE
- `inr_transformer_history_conditioned`: INR-Transformer with history
- `inr_diffusion_history_conditioned`: INR-Diffusion with history
- `inr_transformer_fitted_latent`: per-unit fitted latent codes
- `inr_transformer_infer_latent[_maml]`: meta-learned per-unit latent
  (10-step inner adaptation with early stopping)
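
The naming scheme above can be unpacked mechanically. The following is a minimal sketch, assuming that fields are separated by double underscores, that no field itself contains a double underscore, and that the seed is a non-negative integer. `RunName` and `parse_run_name` are hypothetical names, not part of the source repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunName:
    data: str
    model: str
    experiment: str
    seed: int


def parse_run_name(name):
    """Split '<data>__<model>__<experiment>__s<seed>/' into its fields."""
    parts = name.rstrip("/").split("__")
    # Expect exactly four fields, the last one being 's' followed by digits.
    if len(parts) != 4 or not parts[3].startswith("s") or not parts[3][1:].isdigit():
        raise ValueError(f"unexpected run name: {name!r}")
    data, model, experiment, seed = parts
    return RunName(data, model, experiment, int(seed[1:]))
```

Because model identifiers such as `inr_transformer_fitted_latent` use single underscores only, splitting on the double underscore keeps them intact.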

## License

MIT (mirrors the source repo).