molforge / REAL_WORLD_WORKFLOW_MAPPING.md

Adhitya122

Prepare MolForge OpenEnv Docker Space submission

bf9e424 verified 15 days ago

	# MolForge Real-World Workflow Mapping

	MolForge should feel like a compressed medicinal-chemistry lead-optimization
	program, not a one-shot molecule generator.

	The real-world pattern is:

	1. A team starts with a scaffold.
	2. Chemists propose edits based on structure-activity reasoning.
	3. Assay teams spend limited budget to measure uncertain properties.
	4. Safety and process specialists veto risky or impractical candidates.
	5. The team decides whether to keep optimizing, restart, or nominate a lead.
	6. Success depends on evidence, not only on the final molecule.

	This is exactly the shape MolForge should copy.

	## Real-World Loop

	### 1. Design Hypothesis

	Real teams do not mutate molecules randomly. A medicinal chemist proposes a
	change with an intended purpose:

	- improve potency;
	- reduce toxicity;
	- improve solubility or ADME;
	- simplify synthesis;
	- escape a known scaffold liability.

	MolForge equivalent:

	- `edit`
	- `rationale`
	- `expected_effects`
	- `evidence`

	The model should not only choose a fragment. It should say what scientific
	pressure that edit is meant to address.
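
As a minimal sketch of what such a design step could carry, the four field names above can be grouped into one record. The `DesignAction` class, its types, and the `is_hypothesis` check are assumptions made for illustration; they are not MolForge's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class DesignAction:
    """Hypothetical payload for one design step (field names from the list above)."""
    edit: str                                   # what to change on the molecule
    rationale: str                              # the scientific pressure being addressed
    expected_effects: dict[str, str] = field(default_factory=dict)
    evidence: list[str] = field(default_factory=list)

    def is_hypothesis(self) -> bool:
        # An edit counts as a hypothesis only if it states a purpose
        # and names at least one expected effect.
        return bool(self.rationale.strip()) and bool(self.expected_effects)


action = DesignAction(
    edit="swap fragment in one slot",
    rationale="reduce toxicity while holding potency",
    expected_effects={"toxicity": "down", "potency": "flat"},
    evidence=["earlier toxicity reading on the parent scaffold"],
)
bare = DesignAction(edit="swap fragment in one slot", rationale="")
```

Here `action` passes the check and `bare` does not, which is the distinction between a reasoned edit and a random mutation.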

	### 2. Cheap Triage Before Expensive Assays

	Real projects usually run cheap computational or low-cost screens before
	expensive experiments.

	MolForge equivalent:

	- `evaluate_properties`
	- `search_literature`
	- `estimate_synthesizability`
	- `dock_target`

	These should be useful but imperfect. They help the model decide where to spend
	more serious assay budget.
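
One way to picture this triage step is a shortlist that ranks candidates by a cheap score and keeps only as many as the remaining budget can afford to assay. The function name, the threshold, and the cost figures below are illustrative assumptions, not values taken from MolForge.

```python
def shortlist(cheap_scores: dict[str, float],
              assay_cost: float,
              budget: float,
              threshold: float = 0.5) -> list[str]:
    """Pick candidates worth an expensive assay, best cheap score first."""
    if assay_cost <= 0:
        raise ValueError("assay_cost must be positive")
    affordable = int(budget // assay_cost)      # how many expensive assays fit
    passing = [name for name, score in cheap_scores.items() if score >= threshold]
    passing.sort(key=lambda name: cheap_scores[name], reverse=True)
    return passing[:affordable]


# Hypothetical cheap-screen scores for four candidates.
scores = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.6}
picked = shortlist(scores, assay_cost=10, budget=25)
```

With these numbers the budget covers two assays, so `picked` holds the two best candidates that cleared the threshold.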

	### 3. Expensive Evidence Gates

	Real lead candidates require stronger evidence before nomination:

	- potency evidence;
	- toxicity/safety evidence;
	- synthesis or route feasibility evidence;
	- sometimes post-mutation or resistance-panel evidence.

	MolForge equivalent:

	- `assay_toxicity`
	- `dock_target`
	- `estimate_synthesizability`
	- hard evidence requirements in `submit`
	- `evidence_score`

	This is why `submission_score` should remain strict. A molecule that looks good
	but was never properly assayed is not a real lead candidate.
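
The gating idea can be sketched as a function that returns zero unless every required evidence type is present. The required set below simply mirrors the three core evidence bullets above; the function name and the clamping to `[0, 1]` are assumptions, not MolForge's real scoring code.

```python
# Assumed evidence types, mirroring the potency/toxicity/synthesis bullets above.
REQUIRED_EVIDENCE = frozenset({"potency", "toxicity", "synthesis"})


def gated_submission_score(quality: float,
                           evidence: set[str],
                           submitted: bool) -> float:
    """Return the candidate quality only when the evidence package is complete."""
    if not submitted:
        return 0.0                      # no formal submit, no score
    if not REQUIRED_EVIDENCE <= evidence:
        return 0.0                      # a missing assay fails the hard gate
    return max(0.0, min(1.0, quality))  # clamp to [0, 1]


full = gated_submission_score(0.8, {"potency", "toxicity", "synthesis"}, True)
unassayed = gated_submission_score(0.8, {"potency", "synthesis"}, True)
```

A candidate with identical quality scores `0.8` with the full package and `0.0` without the toxicity evidence.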

	### 4. Cross-Functional Decision Board

	Real projects are not controlled by one chemist. A lead-optimization meeting
	usually includes:

	- medicinal chemistry;
	- assay biology;
	- toxicology/safety;
	- process chemistry or manufacturability;
	- project leadership.

	MolForge equivalent:

	- `lead_chemist`
	- `assay_planner`
	- `toxicologist`
	- `process_chemist`
	- governance messages;
	- hard vetoes;
	- `coordination_score`

	This is one of MolForge's strongest environment-innovation points. The agent is
	not just optimizing a molecule; it is coordinating a scientific team.
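
A rough sketch of how a board with hard vetoes might resolve a proposal follows. The role names come from the list above; the vote values, the choice of which roles hold a veto (safety and process, following the workflow pattern at the top of this document), and the majority rule are all assumptions.

```python
ROLES = ("lead_chemist", "assay_planner", "toxicologist", "process_chemist")
VETO_ROLES = ("toxicologist", "process_chemist")  # assumed holders of a hard veto


def board_decision(votes: dict[str, str]) -> str:
    """Resolve one proposal; each role votes 'approve' or 'object'."""
    missing = [role for role in ROLES if role not in votes]
    if missing:
        raise ValueError(f"missing votes from: {missing}")
    # An objection from a veto role blocks the proposal outright.
    if any(votes[role] == "object" for role in VETO_ROLES):
        return "blocked"
    approvals = sum(votes[role] == "approve" for role in ROLES)
    return "approved" if approvals > len(ROLES) / 2 else "needs_discussion"


all_yes = {role: "approve" for role in ROLES}
tox_veto = {**all_yes, "toxicologist": "object"}
split = {**all_yes, "lead_chemist": "object", "assay_planner": "object"}
```

The point of the asymmetry is that a safety objection cannot be outvoted, while a chemistry objection can.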

	### 5. Stop, Submit, or Restart

	Real teams must decide when to stop spending money. Sometimes the right answer
	is to abandon a scaffold early because the series is a trap.

	MolForge equivalent:

	- `submit`
	- `restart`
	- budget limits;
	- max decision horizon;
	- hard scenario target shift;
	- sunk-cost trap in `level_2_hard`

	This lets the environment test project judgment, not just local molecule edits.
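
The kind of judgment being tested can be written down as a small decision rule. This is a hand-written heuristic for illustration only: the inputs, their ordering, and the `"continue"` and `"no-go"` outcomes are assumptions, and an agent in the environment would have to learn such behavior rather than be given it.

```python
def next_move(evidence_complete: bool,
              series_is_trap: bool,
              budget_left: float,
              steps_left: int,
              min_action_cost: float) -> str:
    """Pick a project-level move from a coarse view of the episode state."""
    if evidence_complete and not series_is_trap:
        return "submit"                 # the nomination package is ready
    if budget_left < min_action_cost or steps_left <= 0:
        return "no-go"                  # nothing affordable remains
    if series_is_trap:
        return "restart"                # abandon the scaffold while budget remains
    return "continue"


early_exit = next_move(False, True, budget_left=40, steps_left=6, min_action_cost=5)
out_of_money = next_move(False, False, budget_left=2, steps_left=6, min_action_cost=5)
```

In the first call the rule abandons a trap series early instead of spending the remaining budget on it.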

	## How To Use This In MolForge

	### Keep Two Scores

	Use two kinds of reward:

	1. Training reward: helps the model learn the workflow.
	2. Formal submission score: measures whether the agent actually nominated a
	   valid candidate.

	That means:

	- `MOLFORGE_REWARD_MODE=curriculum` for early RL;
	- default `assay_gated` mode for final reporting;
	- `submission_score` stays `0.0` without a formal submit.

	This mirrors the real world: a project can make progress without nominating a
	lead, but it cannot claim lead success without a nomination package.
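
A small sketch of how the mode switch and the two scores might be kept apart is shown below. Only the variable name `MOLFORGE_REWARD_MODE`, the two mode names, and the `0.0` rule come from this document; the helper functions and the rejection of unknown modes are assumptions.

```python
import os

VALID_MODES = ("assay_gated", "curriculum")  # the two modes named above


def reward_mode(env=None) -> str:
    """Read the reward mode, falling back to the strict default."""
    env = os.environ if env is None else env
    mode = env.get("MOLFORGE_REWARD_MODE", "assay_gated")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown reward mode: {mode!r}")
    return mode


def score_episode(training_reward: float,
                  candidate_score: float,
                  submitted: bool) -> dict[str, float]:
    """Keep the shaped training signal and the formal score in separate fields."""
    return {
        "training_reward": training_reward,
        "submission_score": candidate_score if submitted else 0.0,
    }


progress_only = score_episode(training_reward=0.6, candidate_score=0.9, submitted=False)
```

`progress_only` records real training progress while its `submission_score` stays at `0.0`, because no candidate was formally submitted.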

	### Make Rewards Stage-Gated

	A realistic reward should not collapse into a single terminal number.

	Useful reward components:

	- valid action/schema;
	- useful design edit;
	- useful first assay;
	- evidence coverage;
	- safety improvement;
	- synthesis improvement;
	- avoiding repeated assays;
	- avoiding vetoed decisions;
	- submitting only with enough support;
	- restarting from a bad scaffold when appropriate.

	This gives RL a learnable path while preserving strict final success.
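
To make the idea concrete, the components above can be combined into one shaped reward. The sketch below uses equal weights purely as a placeholder; the component keys and the weighting are assumptions, and a real reward would likely weight and gate them differently.

```python
# One key per component in the list above (key names are illustrative).
COMPONENTS = (
    "valid_action",
    "useful_edit",
    "useful_first_assay",
    "evidence_coverage",
    "safety_improvement",
    "synthesis_improvement",
    "no_repeated_assay",
    "no_vetoed_decision",
    "supported_submit",
    "appropriate_restart",
)


def shaped_reward(flags: dict[str, bool]) -> float:
    """Fraction of components satisfied this episode (equal weights assumed)."""
    hits = sum(1 for name in COMPONENTS if flags.get(name, False))
    return hits / len(COMPONENTS)


partial = shaped_reward({
    "valid_action": True,
    "useful_edit": True,
    "useful_first_assay": True,
})
```

An agent that only gets the first three components right already sees a nonzero signal, which is what makes the path learnable before any successful submission.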

	### Make The Demo Story Simple

	Judges should understand this in one sentence:

	> MolForge tests whether an LLM can run a miniature drug-discovery project:
	> design molecules, buy assays, respect safety vetoes, manage budget, and
	> nominate a candidate only when the evidence package is strong enough.

	Then show:

	- baseline model repeats invalid or vetoed actions;
	- SFT model learns the action language;
	- RL model learns better evidence and submit timing;
	- final candidate report card shows potency, toxicity, synthesis, evidence,
	budget, and coordination.

	## What We Already Have

	MolForge already contains most of this real-world structure:

	- molecule slot edits;
	- RDKit/TDC-backed surrogate oracle path;
	- limited assay budget;
	- cheap and expensive tools;
	- hidden true properties;
	- visible assay estimates;
	- toxicity and synthesis constraints;
	- multi-agent specialist governance;
	- safety vetoes;
	- restart action;
	- hard target-shift scenario;
	- decomposed report card;
	- strict terminal `submission_score`;
	- curriculum reward mode for early RL.

	## What To Strengthen Next

	The next useful additions should make the environment feel even more like a
	real project:

	1. Assay uncertainty: repeated assays should narrow confidence intervals, but
	   each repeat consumes budget.
	2. Stage labels: mark states as `design`, `triage`, `evidence_package`,
	   `nomination`, or `no-go`.
	3. No-go decisions: reward a model for stopping or restarting when the
	   evidence says the series is unsafe or infeasible.
	4. Portfolio-style report: at terminal time, show why the candidate was
	   nominated or rejected.
	5. Holdout variants: randomize scaffold starts and budgets so the model cannot
	   succeed by memorizing only three paths.
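
For the assay-uncertainty item, one simple model is to average independent repeats with Gaussian noise, so the confidence interval shrinks with the square root of the repeat count. The noise model, the 95% z-value, and the function names are assumptions; MolForge's assays need not behave this way.

```python
import math


def ci_half_width(noise_sd: float, n_repeats: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval on the mean of n noisy repeats."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return z * noise_sd / math.sqrt(n_repeats)


def repeats_affordable(budget: float, cost_per_assay: float) -> int:
    """How many repeats the remaining budget can pay for."""
    if cost_per_assay <= 0:
        raise ValueError("cost_per_assay must be positive")
    return int(budget // cost_per_assay)


one_shot = ci_half_width(noise_sd=0.2, n_repeats=1)
four_shots = ci_half_width(noise_sd=0.2, n_repeats=4)
```

Under this model, four repeats halve the interval relative to one, at four times the cost, so each extra repeat buys less certainty than the one before.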

	For the hackathon, the best near-term path is:

	```text
	SFT v4 for action/workflow competence
	-> curriculum RL for observable reward improvement
	-> strict assay_gated evaluation for final submission_score
	-> README/demo framed as a real drug-discovery decision board
	```