# Hugging Face RL Jobs Notes

This file tracks the remote RL training attempts for the MolForge OpenEnv GRPO run.
## Jobs Tried

| Job | Hardware | Result | Notes |
| --- | --- | --- | --- |
| `69ed7260d70108f37acdf4b8` | `a100-large` | Canceled | Stayed in `SCHEDULING`, so we canceled it before it used GPU time. |
| `69ed73d3d70108f37acdf4e1` | `l40sx1` | Failed | Started but exited during Python import, before model load or training. |
| `69ed74f6d70108f37acdf504` | `l40sx1` | **Failed** | `--with mergekit` caused an unsolvable pydantic conflict with `openenv-core`. |
| `69ed7be5d2c8bd8662bcef00` | `l40sx1` | Canceled | Incorrect CLI usage (missing image name). |
| `69ed9440d70108f37acdf83b` | `l40sx1` | Failed | `uv run` couldn't find the script path `issue/script.py`. |
| `69ed94add2c8bd8662bcf215` | `l40sx1` | Submitted | Fixed the script path to just the filename and used an explicit `python` call. |
## Failure History

### Job 2 (`69ed73d3`): `ModuleNotFoundError: No module named 'mergekit'`

TRL internally imports `mergekit` for GRPO model-merging callbacks even though we don't use merging. The first attempted fix was to add `--with mergekit` to the job command, which led to the Job 3 failure.

### Job 3 (`69ed74f6`): **pydantic version conflict** (CURRENT)

Adding `--with mergekit` broke the resolver:

- `mergekit` (all versions) requires `pydantic < 2.11`
- `openenv-core==0.2.3` → `fastmcp>=3.0.0` → `pydantic >= 2.11.7`

**No version of pydantic satisfies both.** uv correctly refuses to resolve.
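
The two constraints above can be checked by hand with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, it handles plain numeric release versions only, and a real resolver such as uv applies the full PEP 440 rules rather than this simplified comparison.

```python
def parse(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '2.11' into (2, 11, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split(".")]
    # Pad with zeros so '2.11' and '2.11.0' compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def ranges_overlap(upper_exclusive: str, lower_inclusive: str) -> bool:
    """True if some version v satisfies lower_inclusive <= v < upper_exclusive."""
    return parse(lower_inclusive) < parse(upper_exclusive)


# Bounds quoted in the notes above.
MERGEKIT_PYDANTIC_UPPER = "2.11"   # mergekit: pydantic < 2.11
FASTMCP_PYDANTIC_LOWER = "2.11.7"  # fastmcp (via openenv-core): pydantic >= 2.11.7

print(ranges_overlap(MERGEKIT_PYDANTIC_UPPER, FASTMCP_PYDANTIC_LOWER))  # False
```

Because the required lower bound sits above the allowed upper bound, the two ranges are disjoint, which is why no pin of pydantic can fix the conflict.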
## Fix

**Do NOT pass `--with mergekit`** in the HF Jobs command. Instead, the script now installs `mergekit` at runtime with `--no-deps` before importing TRL:

```python
import subprocess
import sys

try:
    import mergekit  # TRL only needs this to be importable
except ImportError:
    # --no-deps skips mergekit's pydantic pin
    subprocess.check_call([sys.executable, "-m", "pip", "install", "mergekit", "--no-deps", "-q"])
```

This makes `mergekit` importable (satisfying TRL) without pulling in its conflicting pydantic constraint.
## Checkpoint and Artifact Persistence

The OpenEnv GRPO script saves the final trained adapter and tokenizer to:

```text
<run_dir>/adapters/
```

It also writes logs, metrics, plots, before/after evaluator JSON, and a zip archive under the run directory. When `HF_OUTPUT_REPO=Adhitya122/molforge-rl-runs` is set, the full run folder is uploaded to:

```text
hf://datasets/Adhitya122/molforge-rl-runs/<run_name>
```
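
For reference, the upload step described above could look roughly like the sketch below. It is not the script's actual code. The function names are hypothetical, `<run_name>` is assumed to be the run folder's name, and a token is assumed to be available to `huggingface_hub` (for example through `HF_TOKEN`).

```python
import os
from pathlib import Path
from typing import Optional


def upload_target(output_repo: str, run_name: str) -> str:
    """Build the destination URI in the form shown above."""
    return f"hf://datasets/{output_repo}/{run_name}"


def upload_run_dir(run_dir: Path) -> Optional[str]:
    """Upload run_dir to the dataset repo named by HF_OUTPUT_REPO, if set."""
    output_repo = os.environ.get("HF_OUTPUT_REPO")
    if not output_repo:
        return None  # no target repo configured; artifacts stay local

    # Imported lazily so the helper is a no-op where no upload was requested.
    from huggingface_hub import HfApi

    HfApi().upload_folder(
        repo_id=output_repo,
        repo_type="dataset",
        folder_path=str(run_dir),
        path_in_repo=run_dir.name,  # assumption: <run_name> is the folder name
    )
    return upload_target(output_repo, run_dir.name)
```

Returning the target URI makes it easy to print the upload location as the last log line, so a successful upload is visible in `hf jobs logs`.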
## Safer Next Runs

Recommended `--env` settings for the next HF Jobs command (NO `--with mergekit`):

```bash
--env RL_MAX_STEPS=20
--env RL_DATASET_SIZE=30
--env MAX_COMPLETION_LENGTH=1024
```

Use this as a smoke run first. Once it reaches at least one trainer log line and uploads artifacts, scale back up to:

```bash
--env RL_MAX_STEPS=80
--env RL_DATASET_SIZE=120
--env MAX_COMPLETION_LENGTH=2048
```
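
On the script side, these variables might be read along the lines of the sketch below. The `RunConfig` class is hypothetical, and falling back to the smoke-run values when a variable is missing is an assumption, not something the notes specify.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RunConfig:
    max_steps: int
    dataset_size: int
    max_completion_length: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RunConfig":
        # Assumed defaults: the smoke-run values, so an unset variable
        # never triggers a full-size run by accident.
        return cls(
            max_steps=int(env.get("RL_MAX_STEPS", "20")),
            dataset_size=int(env.get("RL_DATASET_SIZE", "30")),
            max_completion_length=int(env.get("MAX_COMPLETION_LENGTH", "1024")),
        )
```

Passing a plain dict instead of `os.environ` keeps the parsing testable without touching the real environment.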
Good hardware choices:

| Hardware | Use |
| --- | --- |
| `l40sx1` | Best next smoke test: 48 GB VRAM, cheaper than A100. |
| `a100-large` | Good full run if scheduling is available. |
| `h200` | Highest headroom, more expensive, useful if A100 scheduling stalls. |
| `a10g-large` | Cheap fallback, but may need shorter completion length and fewer steps. |
## Monitoring Commands

```bash
hf jobs inspect <job_id>
hf jobs logs <job_id> --tail 200
```

When searching for the real traceback, use `hf jobs logs` rather than `hf jobs inspect`: `inspect` prints the full base64-encoded submitted script, which makes the useful error harder to see.
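
To cut the log output down to the error itself, a small filter like the following can help. It is a minimal sketch that assumes the failure is an ordinary Python traceback, and the function names are hypothetical.

```python
import sys

TRACEBACK_HEADER = "Traceback (most recent call last):"


def last_traceback(log_text: str) -> str:
    """Return the log from the last Python traceback header onward, or ''."""
    start = log_text.rfind(TRACEBACK_HEADER)
    return log_text[start:] if start != -1 else ""


def main() -> None:
    """Read job logs from stdin and print only the final traceback."""
    print(last_traceback(sys.stdin.read()) or "no traceback found")
```

Saved as a script (for example under the hypothetical name `last_traceback.py`) with `main()` called at the bottom, it can be fed by piping the output of `hf jobs logs <job_id>` into it. For chained exceptions it keeps only the final traceback, which is usually the one that ended the job.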