Spaces:
Running
Hugging Face RL Jobs Notes
This file tracks the remote RL training attempts for the MolForge OpenEnv GRPO run.
Jobs Tried
| Job | Hardware | Result | Notes |
|---|---|---|---|
69ed7260d70108f37acdf4b8 |
a100-large |
Canceled | Stayed in SCHEDULING, so we canceled it before it used GPU time. |
69ed73d3d70108f37acdf4e1 |
l40sx1 |
Failed | Started but exited during Python import before model load or training. |
69ed74f6d70108f37acdf504 |
l40sx1 |
Failed | --with mergekit caused unsolvable pydantic conflict with openenv-core. |
69ed7be5d2c8bd8662bcef00 |
l40sx1 |
Canceled | Incorrect CLI usage (missing image name). |
69ed9440d70108f37acdf83b |
l40sx1 |
Failed | uv run couldn't find the script path issue/script.py. |
69ed94add2c8bd8662bcf215 |
l40sx1 |
Submitted | Fixed script path to just filename and used explicit python call. |
Failure History
Job 2 (69ed73d3) — ModuleNotFoundError: No module named 'mergekit'
TRL internally imports mergekit for GRPO model-merging callbacks even though we don't use merging. The fix was to add --with mergekit.
Job 3 (69ed74f6) — pydantic version conflict (CURRENT)
Adding --with mergekit broke the resolver:
mergekit(all versions) requirespydantic < 2.11openenv-core==0.2.3→fastmcp>=3.0.0→pydantic >= 2.11.7
No version of pydantic satisfies both. uv correctly refuses to resolve.
Fix
Do NOT pass --with mergekit in the HF Jobs command. Instead, the script now installs mergekit at runtime with --no-deps before importing TRL:
try:
import mergekit
except ImportError:
subprocess.check_call([sys.executable, "-m", "pip", "install", "mergekit", "--no-deps", "-q"])
This makes mergekit importable (satisfying TRL) without pulling in its conflicting pydantic constraint.
Checkpoint and Artifact Persistence
The OpenEnv GRPO script saves the final trained adapter and tokenizer to:
<run_dir>/adapters/
It also writes logs, metrics, plots, before/after evaluator JSON, and a zip archive under the run directory. When HF_OUTPUT_REPO=Adhitya122/molforge-rl-runs is set, the full run folder is uploaded to:
hf://datasets/Adhitya122/molforge-rl-runs/<run_name>
Safer Next Runs
Recommended next HF Jobs command (NO --with mergekit):
--env RL_MAX_STEPS=20
--env RL_DATASET_SIZE=30
--env MAX_COMPLETION_LENGTH=1024
Use this as a smoke run first. Once it reaches at least one trainer log line and uploads artifacts, scale back to:
--env RL_MAX_STEPS=80
--env RL_DATASET_SIZE=120
--env MAX_COMPLETION_LENGTH=2048
Good hardware choices:
| Hardware | Use |
|---|---|
l40sx1 |
Best next smoke test: 48 GB VRAM, cheaper than A100. |
a100-large |
Good full run if scheduling is available. |
h200 |
Highest headroom, more expensive, useful if A100 scheduling stalls. |
a10g-large |
Cheap fallback, but may need shorter completion length and fewer steps. |
Monitoring Commands
hf jobs inspect <job_id>
hf jobs logs <job_id> --tail 200
Use logs without inspect when searching for the real traceback, because inspect prints the full base64-encoded submitted script and makes the useful error harder to see.