# Codex Context — ReasoningEconomicsEnv

## Project

- Repo root: `/Users/andrew/Mac/RL Research`
- GitHub repo: `git@github.com:laraandrew/reasoningeconomicsenv.git`
- Active branch: `polish-and-deploy`
- Hugging Face Space: `landrew9/CollabReasoning`
- Package: `reasonbudget_gym`
- Goal: build an RL environment for token-budget allocation, submit it to the competition, and deploy it as a Docker-based HF Space

## Remotes

- `origin`: `git@github.com:laraandrew/reasoningeconomicsenv.git`
- `hf`: `https://huggingface.co/spaces/landrew9/CollabReasoning`

## Current State

- `main` and `polish-and-deploy` originally pointed to the same base commit.
- Work on `polish-and-deploy` is pushed to GitHub through commit `efdc42b`.
- The shipped cache works:
  - `CachedSolver(EnvConfig())._cache` loads 500 entries.
- The environment now defaults to an offline-safe path for cached runs:
  - `EpisodeSampler` uses deterministic bundled questions when the cached solver is active.
- Real question embeddings are enabled and cached at:
  - `reasonbudget_gym/data/embeddings.npy`
- README now contains measured evaluation metrics and embedded plot assets.
- CI exists at `.github/workflows/ci.yml`.
- Dockerfile was slimmed to a runtime-only serving image suitable for HF Spaces.
- The Hugging Face Space repo was force-updated from a clean temporary clone because
  Hugging Face rejected the branch's historical raw binary blobs.
- The live Space is currently:
  - Hub page: `https://huggingface.co/spaces/landrew9/CollabReasoning`
  - Host: `https://landrew9-collabreasoning.hf.space`
  - Runtime stage: `RUNNING`
  - Health endpoint: `/health`
  - Root path: originally returned `404`; a landing page at `/` was later added in `reasonbudget_gym/server/app.py`

## Local Tooling

- Hugging Face CLI installed globally via the official installer.
- Binary path: `/Users/andrew/.local/bin/hf`
- Reported version at install time: `1.8.0`
- Installer added `/Users/andrew/.local/bin` to `/Users/andrew/.zshrc`
- `git-lfs` and `git-xet` are installed and initialized globally.
- `.gitattributes` now tracks:
  - `docs/*.png`
  - `reasonbudget_gym/data/*.npy`

## Verified Commands

- Tests:
  - `.venv/bin/python -m pytest reasonbudget_gym/tests/ -v`
  - Result: `8 passed`
- Eval:
  - `.venv/bin/python -m reasonbudget_gym.eval.evaluate --n_episodes 50 --seed 42 --output eval_results.json`
- Plot generation:
  - `.venv/bin/python -c "from reasonbudget_gym.eval.plots import agent_comparison, budget_pacing; agent_comparison('eval_results.json', 'docs/agent_comparison.png'); budget_pacing('eval_results.json', 'docs/budget_pacing.png')"`
- PPO smoke test:
  - `.venv/bin/python -m reasonbudget_gym.training.ppo_train --n_episodes 100 --output_dir runs/smoke`
  - Result: completed successfully and wrote checkpoints.
- Docker:
  - `docker build -t reasoning-economic-env .`
  - `docker run -d -p 8000:8000 --name reasoning-economic-env-test reasoning-economic-env`
  - `curl http://127.0.0.1:8000/health`
  - Result: `{"status":"ok","env":"ReasonBudgetEnv","version":"0.1.0"}`
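
The manual `curl` check above can also be scripted. The following is a minimal sketch, not code from the repo: `is_healthy` and `check` are hypothetical helper names, and the expected fields are taken from the `curl` result recorded above. The `version` field is deliberately not compared so the check survives a version bump.

```python
import json
import urllib.request

# Expected fields, copied from the verified `curl` result.
EXPECTED = {"status": "ok", "env": "ReasonBudgetEnv"}


def is_healthy(body: bytes) -> bool:
    """Return True if a /health response body contains the expected fields."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    if not isinstance(payload, dict):
        return False
    return all(payload.get(key) == value for key, value in EXPECTED.items())


def check(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> bool:
    """Fetch <base_url>/health and validate the JSON payload."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status == 200 and is_healthy(resp.read())
```

Calling `check()` targets the local container started by `docker run`; passing `https://landrew9-collabreasoning.hf.space` as `base_url` would point the same check at the live Space.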

## Current Eval Numbers

From `eval_results.json` with `--n_episodes 50 --seed 42`:

| Agent | Mean Accuracy | Mean Reward | Budget Used |
|---|---:|---:|---:|
| `uniform` | 0.780 | 7.620 | 100.0% |
| `greedy_max` | 0.840 | 4.163 | 100.0% |
| `oracle` | 0.728 | 6.933 | 98.3% |
| `bandit` | 0.744 | 6.526 | 98.8% |
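
The accuracy and reward rankings in this table do not agree: `greedy_max` has the highest accuracy but the lowest reward. The short sketch below makes the two orderings explicit. The numbers are copied by hand from the table rather than read from `eval_results.json`, whose schema is not documented here, and `rank_by` is a hypothetical helper.

```python
# Values copied from the table (50 episodes, seed 42).
RESULTS = {
    "uniform": {"accuracy": 0.780, "reward": 7.620, "budget_used": 1.000},
    "greedy_max": {"accuracy": 0.840, "reward": 4.163, "budget_used": 1.000},
    "oracle": {"accuracy": 0.728, "reward": 6.933, "budget_used": 0.983},
    "bandit": {"accuracy": 0.744, "reward": 6.526, "budget_used": 0.988},
}


def rank_by(metric: str) -> list[str]:
    """Agent names sorted from highest to lowest on the given metric."""
    return sorted(RESULTS, key=lambda name: RESULTS[name][metric], reverse=True)


by_reward = rank_by("reward")
by_accuracy = rank_by("accuracy")

print(by_reward)    # ['uniform', 'oracle', 'bandit', 'greedy_max']
print(by_accuracy)  # ['greedy_max', 'uniform', 'bandit', 'oracle']
```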

## Important Files

- `reasonbudget_gym/env/episode_sampler.py`
- `reasonbudget_gym/env/config.py`
- `reasonbudget_gym/solver/cached_solver.py`
- `reasonbudget_gym/eval/evaluate.py`
- `reasonbudget_gym/server/app.py`
- `Dockerfile`
- `README.md`
- `.github/workflows/ci.yml`
- `eval_results.json`
- `docs/agent_comparison.png`
- `docs/budget_pacing.png`

## Git History Added On This Branch

- `29b6ad0` Add gitignore for local dev artifacts
- `ecd0ab1` Use bundled questions for cached offline runs
- `9e122a2` Cache MiniLM question embeddings
- `c4d6234` Add GitHub Actions test workflow
- `fc6c606` Add baseline eval results and README plots
- `280a6de` Slim Docker image for HF deployment
- `fc4c73c` Add living Codex context file
- `efdc42b` Track Space binaries with Xet

## Notes For Next Codex

- Keep `HANDOFF.md` deleted; update this file instead.
- Do not remove `reasonbudget_gym/data/response_cache.json` or `reasonbudget_gym/data/embeddings.npy`; they are part of the current offline/demo story.
- The Docker image should stay lean; avoid reintroducing `sentence-transformers`, `datasets`, or training dependencies into the serving image unless truly needed.
- If enabling the live solver later, configure secrets in Hugging Face Space settings rather than hard-coding them.
- If the local repo has the `hf` remote listed under Remotes, it points at the Space repo, so pushes to it trigger Space rebuilds.
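
For the note about secrets: Hugging Face Spaces expose configured secrets to the running container as environment variables, so the server can read them at startup instead of embedding them in code. A minimal sketch, assuming a hypothetical secret named `SOLVER_API_KEY` (no such name is defined in this repo) and a hypothetical helper `load_solver_key`:

```python
import os

# Hypothetical secret name; set it in the Space settings, never in the repo.
SECRET_NAME = "SOLVER_API_KEY"


def load_solver_key(env=None):
    """Return the API key from the environment, or None if it is not set.

    A None result can be treated as "stay on the cached solver".
    """
    if env is None:
        env = os.environ
    key = env.get(SECRET_NAME, "").strip()
    return key or None
```

Falling back to `None` rather than raising keeps the offline/demo path working when no secret is configured.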