	---
	license: mit
	library_name: pytorch
	tags:
	- representation-learning
	- offline-rl
	- policy-representation
	- inr
	- cvae
	- meta-learning
	- mujoco
	- lichess
	- droid
	- fastf1
	- dmlab
	---

	# policyINR — Checkpoints

	Companion checkpoint repository for [andrewkang12345/policyINR](https://github.com/andrewkang12345/policyINR) —
	robust policy representation learning from offline data.

	The folder layout mirrors the `outputs/` tree of the source repo:

	```
	<domain>/<suite>/<run>/best.pt
	<domain>/<suite>/<run>/last.pt
	```

	where `<domain> ∈ {lichess, droid, fastf1, mujoco, syntheticgrf,
	dmlabseekavoid}`.

	Each run directory in the GitHub repo contains the matching `config.yaml`,
	`metrics.jsonl`, `summary.json`, and `eval.json`. Clone the repo and its run
	identifiers map one-to-one onto the paths here.
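The path convention above can be sketched as a small helper. This is only an illustration: `run_files` is a hypothetical function, the run name in the example is made up, and the `outputs/` prefix for the GitHub-side files is an assumption based on the statement that the layout mirrors the `outputs/` tree.

```python
from pathlib import PurePosixPath

# Domain names as listed in this README.
DOMAINS = {"lichess", "droid", "fastf1", "mujoco", "syntheticgrf", "dmlabseekavoid"}
# Checkpoint files stored per run in this repository.
CHECKPOINT_FILES = ("best.pt", "last.pt")
# Files the README says live next to each run on the GitHub side.
COMPANION_FILES = ("config.yaml", "metrics.jsonl", "summary.json", "eval.json")


def run_files(domain: str, suite: str, run: str) -> dict:
    """Return the expected checkpoint and companion paths for one run (hypothetical helper)."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    run_dir = PurePosixPath(domain) / suite / run
    return {
        # Paths inside this checkpoint repository.
        "checkpoints": [str(run_dir / name) for name in CHECKPOINT_FILES],
        # Assumed paths inside a clone of the source repo (under outputs/).
        "companions": [str(PurePosixPath("outputs") / run_dir / name) for name in COMPANION_FILES],
    }


# The run name below is invented for illustration only.
files = run_files("lichess", "2x_hk240", "demo_data__cvae__demo_exp__s0")
print(files["checkpoints"])
print(files["companions"])
```

Downloading with `local_dir="outputs"` from the root of a source-repo clone would, under the same assumption, place each checkpoint beside its companion files.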

	## Download example

	```python
	from huggingface_hub import snapshot_download

	# Fetch only the .pt checkpoints of the runs under lichess/2x_hk240.
	snapshot_download(
	    repo_id="andrewkang12345/policyINR-checkpoints",
	    repo_type="model",
	    allow_patterns=["lichess/2x_hk240/*/*.pt"],
	    local_dir="outputs",
	)
	```
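Before starting a large download, it can help to check which files a glob would select. The sketch below assumes `allow_patterns` follows `fnmatch`-style glob semantics; `checkpoint_pattern` is a hypothetical helper and the candidate paths are invented examples, not an actual listing of this repository.

```python
from fnmatch import fnmatchcase


def checkpoint_pattern(domain: str, suite: str = "*", filename: str = "*.pt") -> str:
    """Build a <domain>/<suite>/<run>/<file> glob (hypothetical helper)."""
    return f"{domain}/{suite}/*/{filename}"


pattern = checkpoint_pattern("lichess", "2x_hk240")

# Invented paths that follow the layout described in this README.
candidates = [
    "lichess/2x_hk240/run_a/best.pt",
    "lichess/2x_hk240/run_a/last.pt",
    "lichess/other_suite/run_b/best.pt",
    "droid/2x_hk240/run_c/best.pt",
    "README.md",
]

# Keep only the paths the glob would select.
selected = [path for path in candidates if fnmatchcase(path, pattern)]
print(pattern)
print(selected)
```

Passing `filename="best.pt"` narrows the glob to the best checkpoint of each run, which halves the download if `last.pt` is not needed.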

	## Domains

	| domain | what it is |
	|---|---|
	| `lichess` | Top-3 GM Lichess games; discrete UCI-move action |
	| `droid` | DROID lowdim teleop; continuous action across collectors |
	| `fastf1` | F1 stint telemetry; driver-as-policy |
	| `mujoco` | Custom MuJoCo + Minari baselines + state/action-resampled suites |
	| `syntheticgrf` | Synthetic Gaussian random-field policies (sanity / smoke) |
	| `dmlabseekavoid` | DMLab `seekavoid_arena_01` from RL Unplugged; discrete action |

	## Models in each run name

	Run directories are named `<data>__<model>__<experiment>__s<seed>/`, where
	`<model>` is one of:

	- `cvae`: bag-of-pairs CVAE
	- `inr_transformer_history_conditioned`: INR-Transformer with history
	- `inr_diffusion_history_conditioned`: INR-Diffusion with history
	- `inr_transformer_fitted_latent`: per-unit fitted latent codes
	- `inr_transformer_infer_latent[_maml]`: meta-learned per-unit latent
	  (10-step inner adaptation plus early stopping)
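A run name can be split back into its fields when filtering runs by model or seed. The following is a minimal sketch, assuming that no field itself contains a double underscore and that the seed is a non-negative integer; `RunName` and `parse_run_name` are hypothetical names, and the example run name is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunName:
    data: str
    model: str
    experiment: str
    seed: int


def parse_run_name(name: str) -> RunName:
    """Split '<data>__<model>__<experiment>__s<seed>' into fields (hypothetical helper)."""
    parts = name.rstrip("/").split("__")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '__'-separated fields, got {len(parts)}: {name!r}")
    data, model, experiment, seed_part = parts
    # The last field is the letter 's' followed by the seed digits.
    if not (seed_part.startswith("s") and seed_part[1:].isdigit()):
        raise ValueError(f"seed field must look like 's<int>': {seed_part!r}")
    return RunName(data, model, experiment, int(seed_part[1:]))


# Invented run name, for illustration only.
run = parse_run_name("demo_data__inr_transformer_infer_latent_maml__demo_exp__s3/")
print(run.model, run.seed)
```

Single underscores inside a field, as in the model names listed above, are left intact because only the double underscore is treated as a separator.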

	## License

	MIT (mirrors the source repo).