Spaces:
Running
Running
| # MolForge Real-World Workflow Mapping | |
| MolForge should feel like a compressed medicinal-chemistry lead-optimization | |
| program, not a one-shot molecule generator. | |
| The real-world pattern is: | |
| 1. A team starts with a scaffold. | |
| 2. Chemists propose edits based on structure-activity reasoning. | |
| 3. Assay teams spend limited budget to measure uncertain properties. | |
| 4. Safety and process specialists veto risky or impractical candidates. | |
| 5. The team decides whether to keep optimizing, restart, or nominate a lead. | |
| 6. Success depends on evidence, not only on the final molecule. | |
| This is exactly the shape MolForge should copy. | |
| ## Real-World Loop | |
| ### 1. Design Hypothesis | |
| Real teams do not mutate molecules randomly. A medicinal chemist proposes a | |
| change with an intended purpose: | |
| - improve potency; | |
| - reduce toxicity; | |
| - improve solubility or ADME; | |
| - simplify synthesis; | |
| - escape a known scaffold liability. | |
| MolForge equivalent: | |
| - `edit` | |
| - `rationale` | |
| - `expected_effects` | |
| - `evidence` | |
| The model should not only choose a fragment. It should say what scientific | |
| pressure that edit is meant to address. | |
| ### 2. Cheap Triage Before Expensive Assays | |
| Real projects usually run cheap computational or low-cost screens before | |
| expensive experiments. | |
| MolForge equivalent: | |
| - `evaluate_properties` | |
| - `search_literature` | |
| - `estimate_synthesizability` | |
| - `dock_target` | |
| These should be useful but imperfect. They help the model decide where to spend | |
| more serious assay budget. | |
| ### 3. Expensive Evidence Gates | |
| Real lead candidates require stronger evidence before nomination: | |
| - potency evidence; | |
| - toxicity/safety evidence; | |
| - synthesis or route feasibility evidence; | |
| - sometimes post-mutation or resistance-panel evidence. | |
| MolForge equivalent: | |
| - `assay_toxicity` | |
| - `dock_target` | |
| - `estimate_synthesizability` | |
| - hard evidence requirements in `submit` | |
| - `evidence_score` | |
| This is why `submission_score` should remain strict. A molecule that looks good | |
| but was never properly assayed is not a real lead candidate. | |
| ### 4. Cross-Functional Decision Board | |
| Real projects are not controlled by one chemist. A lead-optimization meeting | |
| usually includes: | |
| - medicinal chemistry; | |
| - assay biology; | |
| - toxicology/safety; | |
| - process chemistry or manufacturability; | |
| - project leadership. | |
| MolForge equivalent: | |
| - `lead_chemist` | |
| - `assay_planner` | |
| - `toxicologist` | |
| - `process_chemist` | |
| - governance messages; | |
| - hard vetoes; | |
| - `coordination_score` | |
| This is one of MolForge's strongest environment-innovation points. The agent is | |
| not just optimizing a molecule; it is coordinating a scientific team. | |
| ### 5. Stop, Submit, or Restart | |
| Real teams must decide when to stop spending money. Sometimes the right answer | |
| is to abandon a scaffold early because the series is a trap. | |
| MolForge equivalent: | |
| - `submit` | |
| - `restart` | |
| - budget limits; | |
| - max decision horizon; | |
| - hard scenario target shift; | |
| - sunk-cost trap in `level_2_hard` | |
| This lets the environment test project judgment, not just local molecule edits. | |
| ## How To Use This In MolForge | |
| ### Keep Two Scores | |
| Use two kinds of reward: | |
| 1. **Training reward** | |
| Helps the model learn the workflow. | |
| 2. **Formal submission score** | |
| Measures whether the agent actually nominated a valid candidate. | |
| That means: | |
| - `MOLFORGE_REWARD_MODE=curriculum` for early RL; | |
| - default `assay_gated` mode for final reporting; | |
| - `submission_score` stays `0.0` without a formal submit. | |
| This mirrors the real world: a project can make progress without nominating a | |
| lead, but it cannot claim lead success without a nomination package. | |
| ### Make Rewards Stage-Gated | |
| A good real-world reward should not be one giant final number only. | |
| Useful reward components: | |
| - valid action/schema; | |
| - useful design edit; | |
| - useful first assay; | |
| - evidence coverage; | |
| - safety improvement; | |
| - synthesis improvement; | |
| - avoiding repeated assays; | |
| - avoiding vetoed decisions; | |
| - submitting only with enough support; | |
| - restarting from a bad scaffold when appropriate. | |
| This gives RL a learnable path while preserving strict final success. | |
| ### Make The Demo Story Simple | |
| Judges should understand this in one sentence: | |
| > MolForge tests whether an LLM can run a miniature drug-discovery project: | |
| > design molecules, buy assays, respect safety vetoes, manage budget, and | |
| > nominate a candidate only when the evidence package is strong enough. | |
| Then show: | |
| - baseline model repeats invalid or vetoed actions; | |
| - SFT model learns the action language; | |
| - RL model learns better evidence and submit timing; | |
| - final candidate report card shows potency, toxicity, synthesis, evidence, | |
| budget, and coordination. | |
| ## What We Already Have | |
| MolForge already contains most of this real-world structure: | |
| - molecule slot edits; | |
| - RDKit/TDC-backed surrogate oracle path; | |
| - limited assay budget; | |
| - cheap and expensive tools; | |
| - hidden true properties; | |
| - visible assay estimates; | |
| - toxicity and synthesis constraints; | |
| - multi-agent specialist governance; | |
| - safety vetoes; | |
| - restart action; | |
| - hard target-shift scenario; | |
| - decomposed report card; | |
| - strict terminal `submission_score`; | |
| - curriculum reward mode for early RL. | |
| ## What To Strengthen Next | |
| The next useful additions should make the environment feel even more like a | |
| real project: | |
| 1. **Assay uncertainty** | |
| Repeated assays should narrow confidence intervals, but cost budget. | |
| 2. **Stage labels** | |
| Mark states as `design`, `triage`, `evidence_package`, `nomination`, or | |
| `no-go`. | |
| 3. **No-go decisions** | |
| Reward a model for stopping or restarting when the evidence says the series | |
| is unsafe or infeasible. | |
| 4. **Portfolio-style report** | |
| At terminal time, show why the candidate was nominated or rejected. | |
| 5. **Holdout variants** | |
| Randomize scaffold starts and budgets so the model cannot memorize only | |
| three paths. | |
| For the hackathon, the best near-term path is: | |
| ```text | |
| SFT v4 for action/workflow competence | |
| -> curriculum RL for observable reward improvement | |
| -> strict assay_gated evaluation for final submission_score | |
| -> README/demo framed as a real drug-discovery decision board | |
| ``` | |