Adhitya122 committed on
Commit
8bf28d8
·
verified ·
1 Parent(s): f3e2722

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +4 -0
  2. index.html +211 -113
README.md CHANGED
@@ -23,8 +23,12 @@ MolForge is a reinforcement learning environment that simulates a **medical onco
23
  **[View the MolForge Space Deployment on Hugging Face](https://huggingface.co/spaces/Adhitya122/molforge)**
24
  **[Try the RL Training Notebook on Google Colab](https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing)**
25
 
26
  ### The Scientific Method as a Workflow
27
 
28
  Imagine a biotech team tasked with optimizing a lead candidate for **KRAS G12C** (including a high-difficulty resistance panel). The model doesn't just "write" a molecule; it controls a specialist team that must navigate a resource-constrained laboratory:
29
 
30
  - **Lead Chemist**: Proposes molecular edits and decides when to submit.
 
23
  **[View the MolForge Space Deployment on Hugging Face](https://huggingface.co/spaces/Adhitya122/molforge)**
24
  **[Try the RL Training Notebook on Google Colab](https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing)**
25
 
26
+ ### 🎥 Explainer Video
27
+ [![MolForge Explainer Video](https://img.youtube.com/vi/q8YoA0YhIn8/0.jpg)](https://youtu.be/q8YoA0YhIn8)
28
+
29
  ### The Scientific Method as a Workflow
30
 
31
+
32
  Imagine a biotech team tasked with optimizing a lead candidate for **KRAS G12C** (including a high-difficulty resistance panel). The model doesn't just "write" a molecule; it controls a specialist team that must navigate a resource-constrained laboratory:
33
 
34
  - **Lead Chemist**: Proposes molecular edits and decides when to submit.
index.html CHANGED
@@ -14,13 +14,13 @@
14
  color: #0f172a;
15
  }
16
  .prose-custom {
17
- max-width: 65ch;
18
  margin: 0 auto;
19
  }
20
  .shadcn-card {
21
  border: 1px solid #e2e8f0;
22
  background: #ffffff;
23
- border-radius: 0.5rem;
24
  box-shadow: 0 1px 3px 0 rgb(0 0 0 / 0.1), 0 1px 2px -1px rgb(0 0 0 / 0.1);
25
  }
26
  .shadcn-badge {
@@ -36,195 +36,293 @@
36
  .mono {
37
  font-family: 'JetBrains Mono', monospace;
38
  }
39
  </style>
40
  </head>
41
  <body class="bg-white">
42
 
43
  <!-- Navigation -->
44
  <nav class="border-b sticky top-0 bg-white/80 backdrop-blur-md z-50">
45
- <div class="max-w-4xl mx-auto px-6 h-16 flex items-center justify-between">
46
  <span class="font-bold tracking-tight text-lg">MolForge</span>
47
  <div class="flex gap-6 text-sm font-medium text-slate-600">
48
  <a href="https://github.com/Adhitya-Vardhan/molt_lab" class="hover:text-black transition-colors">GitHub</a>
49
- <a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="hover:text-black transition-colors">Training</a>
 
50
  </div>
51
  </div>
52
  </nav>
53
 
54
- <main class="max-w-4xl mx-auto px-6 py-20">
55
  <!-- Header -->
56
- <div class="mb-16">
57
- <div class="flex gap-2 mb-6">
58
- <span class="shadcn-badge bg-indigo-50 text-indigo-700">OpenEnv Hackathon</span>
59
- <span class="shadcn-badge">Deep Research</span>
60
  </div>
61
- <h1 class="text-4xl md:text-5xl font-extrabold tracking-tight mb-6 leading-tight">
62
- MolForge: Verifier-Driven RL for Drug Discovery
63
  </h1>
64
- <p class="text-xl text-slate-500 mb-8 leading-relaxed">
65
- Transforming the scientific method into a reinforcement learning workflow where the LLM is the scientist, not the judge.
66
  </p>
67
- <div class="flex items-center gap-4 text-sm text-slate-400">
68
- <div class="w-8 h-8 rounded-full bg-slate-200"></div>
69
- <div>
70
- <p class="font-semibold text-slate-900">Adhitya Vardhan</p>
71
- <p>April 26, 2026 8 min read</p>
72
  </div>
 
73
  </div>
74
  </div>
75
 
76
  <!-- Introduction -->
77
- <div class="prose prose-slate prose-lg max-w-none mb-20 leading-relaxed text-slate-700">
 
78
  <p>
79
- In traditional drug discovery tasks, LLMs are often asked to "generate a molecule" in a single shot. But science doesn't happen in a vacuum. It happens in the loop—through trial, error, and verification.
80
  </p>
81
  <p class="mt-4">
82
- <strong>MolForge</strong> is a reinforcement learning environment that simulates a medical oncology discovery lab. It forces the model to navigate real-world constraints: limited budget, molecular toxicity, and synthesis complexity.
83
  </p>
84
 
85
- <div class="my-12 p-6 bg-slate-50 border rounded-xl">
86
- <p class="text-indigo-700 font-semibold mb-2">The Core Thesis</p>
87
- <p class="italic text-slate-600">
88
- "The LLM is not the judge. The LLM is the scientist being judged by external, verifiable reality."
89
  </p>
90
  </div>
91
  </div>
92
 
93
- <!-- Architecture Section -->
94
- <section class="mb-24">
95
- <h2 class="text-2xl font-bold mb-8 flex items-center gap-2">
96
- <span class="w-2 h-8 bg-indigo-500 rounded-full"></span>
97
- Scientific Architecture
98
- </h2>
99
- <p class="text-slate-600 mb-8 text-lg">
100
- The environment is designed as a **POMDP (Partially Observable Markov Decision Process)**. This separation between what is true and what is visible is what makes the environment a scientific challenge.
101
- </p>
102
 
103
- <div class="grid md:grid-cols-2 gap-6 mb-12">
104
- <div class="shadcn-card p-6 border-l-4 border-l-slate-900">
105
- <h3 class="font-bold mb-3 text-sm uppercase tracking-wider text-slate-500">Hidden State</h3>
106
- <p class="text-slate-600 text-sm">Ground-truth scores for potency, safety, and synthesizability. Includes sunk-cost traps and late-stage mutation shifts.</p>
107
  </div>
108
- <div class="shadcn-card p-6 border-l-4 border-l-indigo-500">
109
- <h3 class="font-bold mb-3 text-sm uppercase tracking-wider text-slate-500">Visible State</h3>
110
- <p class="text-slate-600 text-sm">Noisy assay reports from RDKit and TDC, remaining budget, and structured governance feedback.</p>
111
  </div>
112
  </div>
 
113
 
114
- <div class="shadcn-card p-4 bg-slate-50">
115
- <img src="assets/molforge_architecture.png" alt="Architecture" class="rounded-lg w-full">
116
- <p class="mt-4 text-center text-xs text-slate-400 font-medium italic">MolForge Closed-Loop Feedback Flow</p>
 
 
117
  </div>
118
- </section>
119
 
120
- <!-- Seven Pillars -->
121
- <section class="mb-24">
122
- <h2 class="text-2xl font-bold mb-10">The Seven Pillars of MolForge</h2>
123
- <div class="grid gap-4">
124
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
125
- <h4 class="font-bold text-indigo-600 mb-1">1. Verifier-Based Evaluation</h4>
126
- <p class="text-slate-600 text-sm">The LLM is held accountable by real-world verifiers like **RDKit** and **TDC** instead of relying on self-judgment.</p>
 
127
  </div>
128
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
129
- <h4 class="font-bold text-indigo-600 mb-1">2. Physics-Grounded Simulation</h4>
130
- <p class="text-slate-600 text-sm">Heuristic docking simulates receptor fit, lipophilicity (LogP), and polarity (TPSA) in milliseconds.</p>
131
  </div>
132
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
133
- <h4 class="font-bold text-indigo-600 mb-1">3. Self-Correction Loop</h4>
134
- <p class="text-slate-600 text-sm">Agents receive structured feedback on every molecular edit, allowing them to repair liabilities in real-time.</p>
135
  </div>
136
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
137
- <h4 class="font-bold text-indigo-600 mb-1">4. Decomposed Reward Architecture</h4>
138
- <p class="text-slate-600 text-sm">Rewards are broken down by research, coordination, and quality for maximum observability.</p>
139
  </div>
140
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
141
- <h4 class="font-bold text-indigo-600 mb-1">5. Strategic Training Modes</h4>
142
- <p class="text-slate-600 text-sm">Curriculum mode provides partial credit "breadcrumbs" to solve the sparse reward problem in RL.</p>
143
  </div>
144
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
145
- <h4 class="font-bold text-indigo-600 mb-1">6. Multi-Agent Governance</h4>
146
- <p class="text-slate-600 text-sm">A specialized team (Chemist, Toxicologist, Planner) must coordinate and approve every major move.</p>
147
  </div>
148
- <div class="shadcn-card p-6 hover:bg-slate-50 transition-colors cursor-default">
149
- <h4 class="font-bold text-indigo-600 mb-1">7. Scientific Model Improvement</h4>
150
- <p class="text-slate-600 text-sm">Verifier feedback drives the model toward scientifically sound designs rather than pattern matching.</p>
151
  </div>
152
  </div>
153
- </section>
154
 
155
- <!-- Training & Results -->
156
- <section class="mb-24">
157
- <h2 class="text-2xl font-bold mb-8">Training & Performance</h2>
158
- <div class="prose prose-slate text-slate-600 mb-12 leading-relaxed">
159
- <p>
160
- We trained a **Qwen3.5-2B** model using GRPO against the MolForge environment. By transitioning from a simple SFT baseline to a verifier-driven RL policy, we saw significant improvements across all difficulty levels.
161
  </p>
162
  </div>
163
 
164
  <div class="shadcn-card overflow-hidden mb-12">
165
  <table class="w-full text-left text-sm">
166
- <thead class="bg-slate-50 border-b">
167
  <tr>
168
- <th class="px-6 py-4 font-semibold text-slate-900">Difficulty</th>
169
- <th class="px-6 py-4 font-semibold text-slate-900">Before (SFT)</th>
170
- <th class="px-6 py-4 font-semibold text-slate-900">After (RL)</th>
171
- <th class="px-6 py-4 font-semibold text-slate-900">Improvement</th>
172
  </tr>
173
  </thead>
174
  <tbody class="divide-y">
175
  <tr>
176
- <td class="px-6 py-4 font-medium">Easy</td>
177
- <td class="px-6 py-4 text-slate-500">0.1167</td>
178
- <td class="px-6 py-4 font-semibold">0.1295</td>
179
- <td class="px-6 py-4 text-emerald-600 font-bold">+10.9%</td>
180
  </tr>
181
  <tr>
182
- <td class="px-6 py-4 font-medium">Medium</td>
183
- <td class="px-6 py-4 text-slate-500">0.1167</td>
184
- <td class="px-6 py-4 font-semibold">0.1278</td>
185
- <td class="px-6 py-4 text-emerald-600 font-bold">+9.5%</td>
186
  </tr>
187
  <tr>
188
- <td class="px-6 py-4 font-medium">Hard</td>
189
- <td class="px-6 py-4 text-slate-500">0.0800</td>
190
- <td class="px-6 py-4 font-semibold">0.0866</td>
191
- <td class="px-6 py-4 text-emerald-600 font-bold">+8.3%</td>
192
  </tr>
193
  </tbody>
194
  </table>
195
  </div>
196
 
197
- <div class="grid md:grid-cols-2 gap-8 mb-12">
198
- <div class="shadcn-card p-4">
199
- <img src="assets/reward_curve.png" alt="Reward Curve" class="rounded border">
200
- <p class="mt-3 text-center text-xs text-slate-400 font-medium">Training Reward Progression</p>
201
  </div>
202
- <div class="shadcn-card p-4">
203
- <img src="assets/Logs.png" alt="Logs" class="rounded border">
204
- <p class="mt-3 text-center text-xs text-slate-400 font-medium">Governance Telemetry</p>
205
  </div>
206
  </div>
207
  </section>
208
 
209
  <!-- Final Takeaway -->
210
- <section class="mb-24 pt-12 border-t text-center">
211
- <h2 class="text-3xl font-extrabold mb-6">Join the Scientific RL Revolution</h2>
212
- <p class="text-slate-500 max-w-xl mx-auto mb-10 text-lg">
213
- MolForge is open source and ready for further exploration. Explore the search space of 256 fragment combinations across oncology scenarios.
214
  </p>
215
  <div class="flex flex-wrap justify-center gap-4">
216
- <a href="https://github.com/Adhitya-Vardhan/molt_lab" class="px-8 py-3 bg-slate-900 text-white font-bold rounded-lg hover:bg-slate-800 transition-all">GitHub Repo</a>
217
- <a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="px-8 py-3 bg-white border border-slate-200 text-slate-900 font-bold rounded-lg hover:bg-slate-50 transition-all">Colab Notebook</a>
218
  </div>
219
  </section>
220
 
221
  <!-- Footer -->
222
- <footer class="py-12 border-t text-sm text-slate-400 flex flex-col md:flex-row justify-between items-center gap-4">
223
- <p>© 2026 MolForge • Built for OpenEnv</p>
224
- <div class="flex gap-6">
225
- <a href="https://huggingface.co/Adhitya122/molforge-grpo-oncology" class="hover:text-slate-600">Model Card</a>
226
- <a href="https://huggingface.co/spaces/Adhitya122/molforge" class="hover:text-slate-600">Space Home</a>
227
- </div>
228
  </footer>
229
  </main>
230
 
 
14
  color: #0f172a;
15
  }
16
  .prose-custom {
17
+ max-width: 70ch;
18
  margin: 0 auto;
19
  }
20
  .shadcn-card {
21
  border: 1px solid #e2e8f0;
22
  background: #ffffff;
23
+ border-radius: 0.75rem;
24
  box-shadow: 0 1px 3px 0 rgb(0 0 0 / 0.1), 0 1px 2px -1px rgb(0 0 0 / 0.1);
25
  }
26
  .shadcn-badge {
 
36
  .mono {
37
  font-family: 'JetBrains Mono', monospace;
38
  }
39
+ .video-container {
40
+ position: relative;
41
+ padding-bottom: 56.25%;
42
+ height: 0;
43
+ overflow: hidden;
44
+ border-radius: 0.75rem;
45
+ border: 1px solid #e2e8f0;
46
+ }
47
+ .video-container iframe {
48
+ position: absolute;
49
+ top: 0;
50
+ left: 0;
51
+ width: 100%;
52
+ height: 100%;
53
+ }
54
  </style>
55
  </head>
56
  <body class="bg-white">
57
 
58
  <!-- Navigation -->
59
  <nav class="border-b sticky top-0 bg-white/80 backdrop-blur-md z-50">
60
+ <div class="max-w-5xl mx-auto px-6 h-16 flex items-center justify-between">
61
  <span class="font-bold tracking-tight text-lg">MolForge</span>
62
  <div class="flex gap-6 text-sm font-medium text-slate-600">
63
  <a href="https://github.com/Adhitya-Vardhan/molt_lab" class="hover:text-black transition-colors">GitHub</a>
64
+ <a href="https://huggingface.co/spaces/Adhitya122/molforge" class="hover:text-black transition-colors">Space</a>
65
+ <a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="hover:text-black transition-colors font-bold text-indigo-600">Try Training</a>
66
  </div>
67
  </div>
68
  </nav>
69
 
70
+ <main class="max-w-5xl mx-auto px-6 py-20">
71
  <!-- Header -->
72
+ <div class="mb-16 text-center">
73
+ <div class="flex justify-center gap-2 mb-6">
74
+ <span class="shadcn-badge bg-indigo-50 text-indigo-700 border border-indigo-100">Hackathon Submission</span>
75
+ <span class="shadcn-badge">Medical Oncology</span>
76
  </div>
77
+ <h1 class="text-4xl md:text-6xl font-extrabold tracking-tight mb-6 leading-tight max-w-3xl mx-auto">
78
+ MolForge: The Scientific Method as a Workflow
79
  </h1>
80
+ <p class="text-xl text-slate-500 mb-10 leading-relaxed max-w-2xl mx-auto">
81
+ How we trained an LLM to navigate a resource-constrained laboratory, optimize oncology drug candidates, and survive scientific "sunk-cost" traps.
82
  </p>
83
+
84
+ <div class="flex items-center justify-center gap-4 text-sm text-slate-400 mb-12">
85
+ <div class="w-10 h-10 rounded-full bg-indigo-100 flex items-center justify-center text-indigo-600 font-bold">AV</div>
86
+ <div class="text-left">
87
+ <p class="font-semibold text-slate-900 leading-none mb-1">Adhitya Vardhan</p>
88
+ <p class="leading-none">OpenEnv Hackathon 2026 • 12 min read</p>
89
+ </div>
90
+ </div>
91
+
92
+ <!-- Video Section -->
93
+ <div class="max-w-3xl mx-auto mb-20">
94
+ <div class="video-container shadow-2xl">
95
+ <iframe src="https://www.youtube.com/embed/q8YoA0YhIn8" title="MolForge Explainer Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
96
  </div>
97
+ <p class="mt-4 text-sm text-slate-400 italic">Watch the MolForge technical explainer (3:24)</p>
98
  </div>
99
  </div>
100
 
101
  <!-- Introduction -->
102
+ <div class="prose prose-slate prose-lg max-w-none mb-24 leading-relaxed text-slate-700 border-b pb-20">
103
+ <h2 class="text-3xl font-bold text-slate-900 mb-6">Introduction</h2>
104
  <p>
105
+ The challenge of modern AI in drug discovery is often reduced to a static prediction task: "Does this molecule bind to this target?" But in a real-world biotech lab, the answer is never that simple. A lead chemist must balance <strong>potency</strong> against <strong>toxicity</strong>, <strong>synthesizability</strong>, and a rapidly depleting <strong>assay budget</strong>.
106
  </p>
107
  <p class="mt-4">
108
+ <strong>MolForge</strong> is a verifier-driven reinforcement learning environment that replicates this complexity. We don't trust the model to "guess" the answer. Instead, we force it to execute a sequence of actions—edits, assays, and specialist reviews—until it can justify a final submission with verifiable evidence.
109
  </p>
110
 
111
+ <div class="mt-12 p-8 bg-slate-900 text-slate-200 rounded-2xl shadow-lg">
112
+ <p class="text-indigo-400 font-bold mb-4 uppercase tracking-widest text-xs">Core Philosophy</p>
113
+ <p class="text-2xl font-medium leading-tight">
114
+ "The model is a trainable research agent inside a controlled scientific environment, not an oracle. It is judged by chemistry and biomedical verifiers, corrected by specialist feedback, and scored by a reward system that explains exactly where the path to a discovery failed."
115
  </p>
116
  </div>
117
  </div>
118
 
119
+ <!-- Scientific Architecture -->
120
+ <section class="mb-32">
121
+ <div class="flex items-center gap-3 mb-8">
122
+ <div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">1</div>
123
+ <h2 class="text-3xl font-bold tracking-tight">The POMDP Architecture</h2>
124
+ </div>
125
 
126
+ <p class="text-slate-600 mb-10 text-lg leading-relaxed">
127
+ MolForge is built as a <strong>Partially Observable Markov Decision Process (POMDP)</strong>. This means the agent never sees the "hidden truth" of the receptor. It only sees what its budget allows it to assay.
128
+ </p>
129
+
130
+ <div class="shadcn-card p-4 bg-slate-50 mb-12">
131
+ <img src="assets/molforge_architecture.png" alt="Architecture" class="rounded-lg w-full">
132
+ <p class="mt-4 text-center text-xs text-slate-400 font-medium tracking-wide">THE SCIENTIFIC FEEDBACK LOOP: VERIFIER-FIRST DESIGN</p>
133
+ </div>
134
+
135
+ <div class="grid md:grid-cols-2 gap-8">
136
+ <div class="space-y-4">
137
+ <h4 class="font-bold text-slate-900 border-b pb-2">The Hidden State</h4>
138
+ <ul class="space-y-3 text-slate-600 text-sm">
139
+ <li class="flex gap-2"><span>•</span> <strong>Ground Truth Potency:</strong> The exact hidden binding energy of the KRAS G12C pocket.</li>
140
+ <li class="flex gap-2"><span>•</span> <strong>Sunk-Cost Traps:</strong> Starting scaffolds that look promising but have hidden liabilities.</li>
141
+ <li class="flex gap-2"><span>•</span> <strong>Target Mutation:</strong> Late-stage shifts in the pocket (Level 2) that punish blind optimization.</li>
142
+ </ul>
143
  </div>
144
+ <div class="space-y-4">
145
+ <h4 class="font-bold text-indigo-600 border-b pb-2">The Visible Evidence</h4>
146
+ <ul class="space-y-3 text-slate-600 text-sm">
147
+ <li class="flex gap-2"><span>•</span> <strong>RDKit &amp; TDC Signals:</strong> Noisy, verifier-backed readings of lipophilicity (LogP) and topological polar surface area (TPSA).</li>
148
+ <li class="flex gap-2"><span>•</span> <strong>Heuristic Docking:</strong> Fast simulations of pocket matching and receptor fit.</li>
149
+ <li class="flex gap-2"><span>•</span> <strong>Governance Vetoes:</strong> Objections from the Safety Specialist or Process Chemist.</li>
150
+ </ul>
151
  </div>
152
  </div>
153
+ </section>
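The hidden/visible split described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MolForge implementation: the class names, the flat assay cost of 400, and the uniform noise of ±0.05 are all hypothetical; only the 3600 starting budget is taken from the Level 0 scenario on the page.

```python
import random
from dataclasses import dataclass, field


@dataclass
class HiddenState:
    """Ground truth the agent never reads directly (hypothetical fields)."""
    potency: float
    toxicity: float
    synthesizability: float


@dataclass
class LabEnv:
    """Toy partially observable lab: every observation costs budget."""
    hidden: HiddenState
    budget: int
    assay_cost: int = 400      # assumed flat cost per assay
    noise: float = 0.05        # assumed uniform noise half-width
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def run_assay(self, prop: str) -> dict:
        # Refuse the assay when the remaining budget cannot cover it.
        if self.budget < self.assay_cost:
            raise RuntimeError("insufficient budget for assay")
        self.budget -= self.assay_cost
        true_value = getattr(self.hidden, prop)
        # The agent only ever sees a noisy reading, never true_value itself.
        reading = true_value + self.rng.uniform(-self.noise, self.noise)
        return {"property": prop, "reading": reading, "budget_left": self.budget}


env = LabEnv(HiddenState(potency=0.7, toxicity=0.3, synthesizability=0.6), budget=3600)
obs = env.run_assay("potency")
print(obs)
```

The point of the sketch is the asymmetry: `hidden` stays inside the environment, while the policy only receives the dictionary returned by `run_assay`, and each call shrinks the budget it can spend on further evidence.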
154
 
155
+ <!-- Search Space & Scenarios -->
156
+ <section class="mb-32">
157
+ <div class="flex items-center gap-3 mb-8">
158
+ <div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">2</div>
159
+ <h2 class="text-3xl font-bold tracking-tight">The Molecular Search Space</h2>
160
  </div>
 
161
 
162
+ <p class="text-slate-600 mb-10 text-lg leading-relaxed">
163
+ We don't ask the model to memorize molecules. We ask it to navigate a <strong>combinatorial space of 256 fragment combinations</strong> (four slots with four options each) across three starting scenarios.
164
+ </p>
165
+
166
+ <div class="grid grid-cols-2 md:grid-cols-4 gap-4 mb-12">
167
+ <div class="shadcn-card p-4 text-center">
168
+ <p class="text-xs font-bold text-slate-400 uppercase mb-2">Warhead</p>
169
+ <p class="text-sm font-semibold">4 Options</p>
170
  </div>
171
+ <div class="shadcn-card p-4 text-center">
172
+ <p class="text-xs font-bold text-slate-400 uppercase mb-2">Hinge</p>
173
+ <p class="text-sm font-semibold">4 Options</p>
174
  </div>
175
+ <div class="shadcn-card p-4 text-center">
176
+ <p class="text-xs font-bold text-slate-400 uppercase mb-2">Solvent Tail</p>
177
+ <p class="text-sm font-semibold">4 Options</p>
178
  </div>
179
+ <div class="shadcn-card p-4 text-center">
180
+ <p class="text-xs font-bold text-slate-400 uppercase mb-2">Back Pocket</p>
181
+ <p class="text-sm font-semibold">4 Options</p>
182
  </div>
183
+ </div>
184
+
185
+ <h3 class="text-xl font-bold mb-6">Benchmark Scenarios</h3>
186
+ <div class="shadcn-card overflow-hidden mb-12">
187
+ <table class="w-full text-left text-sm">
188
+ <thead class="bg-slate-50 border-b">
189
+ <tr>
190
+ <th class="px-6 py-4 font-semibold text-slate-900">Scenario</th>
191
+ <th class="px-6 py-4 font-semibold text-slate-900">Story</th>
192
+ <th class="px-6 py-4 font-semibold text-slate-900">Budget</th>
193
+ <th class="px-6 py-4 font-semibold text-slate-900">Difficulty</th>
194
+ </tr>
195
+ </thead>
196
+ <tbody class="divide-y">
197
+ <tr>
198
+ <td class="px-6 py-4 font-bold text-indigo-600">Level 0: Easy</td>
199
+ <td class="px-6 py-4 text-slate-500">Near-viable scaffold needs safety repair and evidence.</td>
200
+ <td class="px-6 py-4">3600</td>
201
+ <td class="px-6 py-4"><span class="shadcn-badge bg-emerald-50 text-emerald-700">Low</span></td>
202
+ </tr>
203
+ <tr>
204
+ <td class="px-6 py-4 font-bold text-indigo-600">Level 1: Medium</td>
205
+ <td class="px-6 py-4 text-slate-500">Potency, toxicity, and synthesis must be balanced.</td>
206
+ <td class="px-6 py-4">4300</td>
207
+ <td class="px-6 py-4"><span class="shadcn-badge bg-orange-50 text-orange-700">Moderate</span></td>
208
+ </tr>
209
+ <tr>
210
+ <td class="px-6 py-4 font-bold text-indigo-600">Level 2: Hard</td>
211
+ <td class="px-6 py-4 text-slate-500">Sunk-cost trap: starting series has hidden liability.</td>
212
+ <td class="px-6 py-4">5200</td>
213
+ <td class="px-6 py-4"><span class="shadcn-badge bg-red-50 text-red-700">Critical</span></td>
214
+ </tr>
215
+ </tbody>
216
+ </table>
217
+ </div>
218
+ </section>
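The figure of 256 follows from the four slot cards above: four slots with four options each gives 4^4 = 256 combinations. A minimal sketch of that enumeration, assuming placeholder option labels (the actual fragment names are not listed in this diff):

```python
from itertools import product

# Slot names come from the page; the option labels are placeholders.
SLOTS = ("warhead", "hinge", "solvent_tail", "back_pocket")
OPTIONS_PER_SLOT = 4


def enumerate_candidates() -> list[dict]:
    """Return every slot -> option assignment as a dict."""
    choices = [[f"{slot}_{i}" for i in range(OPTIONS_PER_SLOT)] for slot in SLOTS]
    return [dict(zip(SLOTS, combo)) for combo in product(*choices)]


candidates = enumerate_candidates()
print(len(candidates))  # 4 ** 4 = 256
```

A space this small could be brute-forced if every candidate were free to evaluate. The scenarios make it non-trivial by attaching a budget to each assay, so the agent cannot simply test all 256.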
219
+
220
+ <!-- Reward Design -->
221
+ <section class="mb-32">
222
+ <div class="flex items-center gap-3 mb-8">
223
+ <div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">3</div>
224
+ <h2 class="text-3xl font-bold tracking-tight">Reward Design: Beyond Scalar Scores</h2>
225
+ </div>
226
+
227
+ <p class="text-slate-600 mb-10 text-lg leading-relaxed">
228
+ Training for scientific rigor requires more than a "Good/Bad" signal. We use a <strong>decomposed reward system</strong> that mixes coarse shaping with sparse terminal bonuses.
229
+ </p>
230
+
231
+ <div class="grid md:grid-cols-3 gap-6 mb-12">
232
+ <div class="shadcn-card p-6 bg-slate-50 border-t-4 border-t-indigo-500">
233
+ <h4 class="font-bold mb-2">Coarse Shaping</h4>
234
+ <p class="text-xs text-slate-500">Edit feedback avoids exact hidden deltas, forcing the model to rely on empirical assays.</p>
235
  </div>
236
+ <div class="shadcn-card p-6 bg-slate-50 border-t-4 border-t-emerald-500">
237
+ <h4 class="font-bold mb-2">Evidence Multipliers</h4>
238
+ <p class="text-xs text-slate-500">Submissions without current potency, toxicity, and synthesis support receive massive penalties.</p>
239
  </div>
240
+ <div class="shadcn-card p-6 bg-slate-50 border-t-4 border-t-orange-500">
241
+ <h4 class="font-bold mb-2">Budget Efficiency</h4>
242
+ <p class="text-xs text-slate-500">Small credits for valid evidence-backed submissions that use less than the allocated budget.</p>
243
  </div>
244
  </div>
 
245
 
246
+ <div class="p-6 bg-indigo-50 border border-indigo-100 rounded-xl text-sm">
247
+ <p class="font-bold text-indigo-700 mb-2">The Curriculum Mode Advantage</p>
248
+ <p class="text-indigo-900 leading-relaxed">
249
+ For early RL, we add <strong>"Partial Credit Breadcrumbs"</strong>. If a model fails to submit but shows good scientific behavior (gathering evidence, designing promising molecules), it receives bounded warmup rewards. This mitigates the sparse-reward problem and teaches the model how to explore before it discovers the terminal submission bonus.
250
  </p>
251
  </div>
252
+ </section>
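One way to read the three cards and the curriculum note together is as a reward that is returned in named parts rather than as a single number. The sketch below is an assumption-heavy illustration: the field names, the 0.1 evidence multiplier, the 0.05 efficiency weight, and the 0.1 breadcrumb cap are invented for the example and are not taken from the MolForge code.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSummary:
    submitted: bool
    quality: float          # terminal quality score in [0, 1]
    has_potency: bool       # current potency evidence on file
    has_toxicity: bool      # current toxicity evidence on file
    has_synthesis: bool     # current synthesis evidence on file
    budget_total: float
    budget_spent: float
    evidence_steps: int     # number of useful assays gathered


def decomposed_reward(ep: EpisodeSummary, curriculum: bool = False) -> dict:
    """Return each reward component plus their sum (all weights hypothetical)."""
    parts = {"terminal": 0.0, "efficiency": 0.0, "breadcrumbs": 0.0}
    if ep.submitted:
        supported = ep.has_potency and ep.has_toxicity and ep.has_synthesis
        # Evidence multiplier: unsupported submissions keep only a fraction.
        multiplier = 1.0 if supported else 0.1
        parts["terminal"] = ep.quality * multiplier
        if supported:
            # Small credit for the unused share of the budget.
            unused = max(0.0, 1.0 - ep.budget_spent / ep.budget_total)
            parts["efficiency"] = 0.05 * unused
    elif curriculum:
        # Bounded warmup reward for evidence gathering without a submission.
        parts["breadcrumbs"] = min(0.1, 0.02 * ep.evidence_steps)
    parts["total"] = sum(parts.values())
    return parts


good = EpisodeSummary(True, 0.8, True, True, True, 3600, 1800, 5)
print(decomposed_reward(good))
```

Keeping the components separate is what lets a training log show where an episode lost reward, for example a strong molecule submitted without toxicity evidence.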
253
+
254
+ <!-- Results -->
255
+ <section class="mb-32">
256
+ <div class="flex items-center gap-3 mb-8">
257
+ <div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">4</div>
258
+ <h2 class="text-3xl font-bold tracking-tight">Training Results</h2>
259
+ </div>
260
 
261
  <div class="shadcn-card overflow-hidden mb-12">
262
  <table class="w-full text-left text-sm">
263
+ <thead class="bg-slate-900 text-white">
264
  <tr>
265
+ <th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">Difficulty</th>
266
+ <th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">Before (SFT)</th>
267
+ <th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">After (RL)</th>
268
+ <th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">Improvement</th>
269
  </tr>
270
  </thead>
271
  <tbody class="divide-y">
272
  <tr>
273
+ <td class="px-6 py-5 font-bold">Level 0: Easy</td>
274
+ <td class="px-6 py-5 text-slate-400">0.1167</td>
275
+ <td class="px-6 py-5 font-extrabold text-slate-900">0.1295</td>
276
+ <td class="px-6 py-5 text-emerald-600 font-black">+10.9%</td>
277
  </tr>
278
  <tr>
279
+ <td class="px-6 py-5 font-bold">Level 1: Medium</td>
280
+ <td class="px-6 py-5 text-slate-400">0.1167</td>
281
+ <td class="px-6 py-5 font-extrabold text-slate-900">0.1278</td>
282
+ <td class="px-6 py-5 text-emerald-600 font-black">+9.5%</td>
283
  </tr>
284
  <tr>
285
+ <td class="px-6 py-5 font-bold">Level 2: Hard</td>
286
+ <td class="px-6 py-5 text-slate-400">0.0800</td>
287
+ <td class="px-6 py-5 font-extrabold text-slate-900">0.0866</td>
288
+ <td class="px-6 py-5 text-emerald-600 font-black">+8.3%</td>
289
  </tr>
290
  </tbody>
291
  </table>
292
  </div>
293
 
294
+ <div class="grid md:grid-cols-2 gap-8">
295
+ <div class="shadcn-card p-6">
296
+ <img src="assets/reward_curve.png" alt="Reward Curve" class="rounded border mb-4">
297
+ <p class="text-xs font-bold text-slate-400 uppercase tracking-widest text-center">RL Training Progression</p>
298
  </div>
299
+ <div class="shadcn-card p-6">
300
+ <img src="assets/Logs.png" alt="Logs" class="rounded border mb-4">
301
+ <p class="text-xs font-bold text-slate-400 uppercase tracking-widest text-center">Governance Action History</p>
302
  </div>
303
  </div>
304
  </section>
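The Improvement column reads as a relative change, (after - before) / before. A short check using the four-decimal scores shown in the table:

```python
# Scores copied from the results table: (before SFT, after RL).
results = {
    "Level 0: Easy": (0.1167, 0.1295),
    "Level 1: Medium": (0.1167, 0.1278),
    "Level 2: Hard": (0.0800, 0.0866),
}


def relative_improvement(before: float, after: float) -> float:
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100.0


for level, (before, after) in results.items():
    print(f"{level}: {relative_improvement(before, after):+.2f}%")
```

From the rounded scores this gives roughly 10.97%, 9.51%, and 8.25%. The Medium and Hard rows match the table after rounding. The Easy row is listed as +10.9%, which would be consistent with truncation or with a calculation from unrounded scores; the table alone does not say which.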
305
 
306
  <!-- Final Takeaway -->
307
+ <section class="mb-32 pt-20 border-t text-center">
308
+ <h2 class="text-4xl font-black mb-6 tracking-tight">Final Takeaway</h2>
309
+ <p class="text-slate-500 max-w-2xl mx-auto mb-12 text-lg leading-relaxed">
310
+ MolForge makes the case that scientific AI should not be built as a single-shot generator. By grounding the LLM in a <strong>closed-loop scientific environment</strong>, we can train models that respect budget, coordinate with specialists, and base their discoveries on verifiable evidence.
311
  </p>
312
  <div class="flex flex-wrap justify-center gap-4">
313
+ <a href="https://github.com/Adhitya-Vardhan/molt_lab" class="px-10 py-4 bg-slate-900 text-white font-bold rounded-xl hover:bg-slate-800 transition-all shadow-xl hover:-translate-y-1">Explore Code</a>
314
+ <a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="px-10 py-4 bg-white border-2 border-slate-900 text-slate-900 font-bold rounded-xl hover:bg-slate-50 transition-all hover:-translate-y-1">Run Notebook</a>
315
+ </div>
316
+
317
+ <div class="mt-16 flex justify-center gap-8 text-sm font-bold text-slate-400">
318
+ <a href="https://huggingface.co/spaces/Adhitya122/molforge" class="hover:text-indigo-600 transition-colors uppercase tracking-widest">Space Deployment</a>
319
+ <a href="https://huggingface.co/Adhitya122/molforge-grpo-oncology" class="hover:text-indigo-600 transition-colors uppercase tracking-widest">Model Card</a>
320
  </div>
321
  </section>
322
 
323
  <!-- Footer -->
324
+ <footer class="py-12 border-t text-xs text-slate-400 text-center">
325
+ <p>© 2026 MolForge Project • Built for the OpenEnv Hackathon</p>
326
  </footer>
327
  </main>
328