# Hugging Face RL Jobs Notes

This file tracks the remote RL training attempts for the MolForge OpenEnv GRPO run.

## Jobs Tried

| Job | Hardware | Result | Notes |
| --- | --- | --- | --- |
| `69ed7260d70108f37acdf4b8` | `a100-large` | Canceled | Stayed in `SCHEDULING`, so we canceled it before it used GPU time. |
| `69ed73d3d70108f37acdf4e1` | `l40sx1` | Failed | Started but exited during Python import, before model load or training. |
| `69ed74f6d70108f37acdf504` | `l40sx1` | **Failed** | `--with mergekit` caused an unsolvable pydantic conflict with `openenv-core`. |
| `69ed7be5d2c8bd8662bcef00` | `l40sx1` | Canceled | Incorrect CLI usage (missing image name). |
| `69ed9440d70108f37acdf83b` | `l40sx1` | Failed | `uv run` couldn't find the script path `issue/script.py`. |
| `69ed94add2c8bd8662bcf215` | `l40sx1` | Submitted | Fixed the script path to just the filename and used an explicit `python` call. |

## Failure History

### Job 2 (`69ed73d3`): `ModuleNotFoundError: No module named 'mergekit'`

TRL internally imports `mergekit` for GRPO model-merging callbacks even though we don't use merging. The first attempted fix was to add `--with mergekit`.

### Job 3 (`69ed74f6`): **pydantic version conflict** (CURRENT)

Adding `--with mergekit` broke the resolver:

- `mergekit` (all versions) requires `pydantic < 2.11`
- `openenv-core==0.2.3` → `fastmcp>=3.0.0` → `pydantic >= 2.11.7`

**No version of pydantic satisfies both.** uv correctly refuses to resolve.

## Fix

**Do NOT pass `--with mergekit`** in the HF Jobs command. Instead, the script now installs mergekit at runtime with `--no-deps` before importing TRL:

```python
import subprocess
import sys

try:
    import mergekit  # TRL imports this for its GRPO merge callbacks
except ImportError:
    # --no-deps skips mergekit's conflicting pydantic pin
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mergekit", "--no-deps", "-q"])
```

This makes `mergekit` importable (satisfying TRL) without pulling in its conflicting pydantic constraint.
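
A small preflight check can confirm that the workaround took effect before any time is spent on model loading. The sketch below is an illustration only and is not part of the actual training script: `missing_modules` is a hypothetical helper, and the module list passed to it is an assumption based on the import failures recorded above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located in this environment."""
    # find_spec returns None when a top-level module is not installed.
    # It locates the module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list: the modules involved in the import failures described above.
missing = missing_modules(["mergekit", "trl"])
print("missing modules:", missing)
```

This only checks that the modules can be located. It does not prove that importing them succeeds, so a real import of TRL is still the final test.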
## Checkpoint and Artifact Persistence

The OpenEnv GRPO script saves the final trained adapter and tokenizer to:

```text
/adapters/
```

It also writes logs, metrics, plots, before/after evaluator JSON, and a zip archive under the run directory. When `HF_OUTPUT_REPO=Adhitya122/molforge-rl-runs` is set, the full run folder is uploaded to:

```text
hf://datasets/Adhitya122/molforge-rl-runs/
```

## Safer Next Runs

Recommended next HF Jobs command (NO `--with mergekit`):

```bash
--env RL_MAX_STEPS=20 --env RL_DATASET_SIZE=30 --env MAX_COMPLETION_LENGTH=1024
```

Use this as a smoke run first. Once it reaches at least one trainer log line and uploads artifacts, scale back to:

```bash
--env RL_MAX_STEPS=80 --env RL_DATASET_SIZE=120 --env MAX_COMPLETION_LENGTH=2048
```

Good hardware choices:

| Hardware | Use |
| --- | --- |
| `l40sx1` | Best next smoke test: 48 GB VRAM, cheaper than A100. |
| `a100-large` | Good full run if scheduling is available. |
| `h200` | Highest headroom, more expensive, useful if A100 scheduling stalls. |
| `a10g-large` | Cheap fallback, but may need shorter completion length and fewer steps. |

## Monitoring Commands

```bash
hf jobs inspect
hf jobs logs --tail 200
```

Use logs without `inspect` when searching for the real traceback, because `inspect` prints the full base64-encoded submitted script and makes the useful error harder to see.
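
Because the traceback is usually the only part of the log that matters, it can help to trim saved log output down to the last one. This is a minimal sketch, assuming the log output has already been captured as a string; `last_traceback` is a hypothetical helper and not part of the `hf` CLI.

```python
def last_traceback(log_text: str) -> str:
    """Return the log text from the last Python traceback header to the end.

    Returns an empty string when the log contains no traceback.
    """
    marker = "Traceback (most recent call last):"
    start = log_text.rfind(marker)  # index of the last traceback header, or -1
    return log_text[start:] if start != -1 else ""


# Illustrative log text, modeled on the Job 2 failure described in this file.
sample_log = (
    "loading script\n"
    "Traceback (most recent call last):\n"
    "ModuleNotFoundError: No module named 'mergekit'\n"
)
print(last_traceback(sample_log))
```

Anything logged after the final traceback header is kept, so shutdown messages printed after the error will also appear in the result.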