Upload folder using huggingface_hub
- README.md +4 -0
- index.html +211 -113
README.md
CHANGED
|
@@ -23,8 +23,12 @@ MolForge is a reinforcement learning environment that simulates a **medical onco
|
|
| 23 |
**[View the MolForge Space Deployment on Hugging Face](https://huggingface.co/spaces/Adhitya122/molforge)**
|
| 24 |
**[Try the RL Training Notebook on Google Colab](https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing)**
|
| 25 |
|
| 26 |
### The Scientific Method as a Workflow
|
| 27 |
|
|
|
|
| 28 |
Imagine a biotech team tasked with optimizing a lead candidate for **KRAS G12C** (including a high-difficulty resistance panel). The model doesn't just "write" a molecule; it controls a specialist team that must navigate a resource-constrained laboratory:
|
| 29 |
|
| 30 |
- **Lead Chemist**: Proposes molecular edits and decides when to submit.
|
|
|
|
| 23 |
**[View the MolForge Space Deployment on Hugging Face](https://huggingface.co/spaces/Adhitya122/molforge)**
|
| 24 |
**[Try the RL Training Notebook on Google Colab](https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing)**
|
| 25 |
|
| 26 |
+
### 🎥 Explainer Video
|
| 27 |
+
[](https://youtu.be/q8YoA0YhIn8)
|
| 28 |
+
|
| 29 |
### The Scientific Method as a Workflow
|
| 30 |
|
| 31 |
+
|
| 32 |
Imagine a biotech team tasked with optimizing a lead candidate for **KRAS G12C** (including a high-difficulty resistance panel). The model doesn't just "write" a molecule; it controls a specialist team that must navigate a resource-constrained laboratory:
|
| 33 |
|
| 34 |
- **Lead Chemist**: Proposes molecular edits and decides when to submit.
|
index.html
CHANGED
|
@@ -14,13 +14,13 @@
|
|
| 14 |
color: #0f172a;
|
| 15 |
}
|
| 16 |
.prose-custom {
|
| 17 |
-
max-width:
|
| 18 |
margin: 0 auto;
|
| 19 |
}
|
| 20 |
.shadcn-card {
|
| 21 |
border: 1px solid #e2e8f0;
|
| 22 |
background: #ffffff;
|
| 23 |
-
border-radius: 0.
|
| 24 |
box-shadow: 0 1px 3px 0 rgb(0 0 0 / 0.1), 0 1px 2px -1px rgb(0 0 0 / 0.1);
|
| 25 |
}
|
| 26 |
.shadcn-badge {
|
|
@@ -36,195 +36,293 @@
|
|
| 36 |
.mono {
|
| 37 |
font-family: 'JetBrains Mono', monospace;
|
| 38 |
}
|
| 39 |
</style>
|
| 40 |
</head>
|
| 41 |
<body class="bg-white">
|
| 42 |
|
| 43 |
<!-- Navigation -->
|
| 44 |
<nav class="border-b sticky top-0 bg-white/80 backdrop-blur-md z-50">
|
| 45 |
-
<div class="max-w-
|
| 46 |
<span class="font-bold tracking-tight text-lg">MolForge</span>
|
| 47 |
<div class="flex gap-6 text-sm font-medium text-slate-600">
|
| 48 |
<a href="https://github.com/Adhitya-Vardhan/molt_lab" class="hover:text-black transition-colors">GitHub</a>
|
| 49 |
-
<a href="https://
|
|
|
|
| 50 |
</div>
|
| 51 |
</div>
|
| 52 |
</nav>
|
| 53 |
|
| 54 |
-
<main class="max-w-
|
| 55 |
<!-- Header -->
|
| 56 |
-
<div class="mb-16">
|
| 57 |
-
<div class="flex gap-2 mb-6">
|
| 58 |
-
<span class="shadcn-badge bg-indigo-50 text-indigo-700">
|
| 59 |
-
<span class="shadcn-badge">
|
| 60 |
</div>
|
| 61 |
-
<h1 class="text-4xl md:text-
|
| 62 |
-
MolForge:
|
| 63 |
</h1>
|
| 64 |
-
<p class="text-xl text-slate-500 mb-
|
| 65 |
-
|
| 66 |
</p>
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
<div>
|
| 70 |
-
|
| 71 |
-
<p
|
| 72 |
</div>
|
|
|
|
| 73 |
</div>
|
| 74 |
</div>
|
| 75 |
|
| 76 |
<!-- Introduction -->
|
| 77 |
-
<div class="prose prose-slate prose-lg max-w-none mb-
|
|
|
|
| 78 |
<p>
|
| 79 |
-
|
| 80 |
</p>
|
| 81 |
<p class="mt-4">
|
| 82 |
-
<strong>MolForge</strong> is a reinforcement learning environment that
|
| 83 |
</p>
|
| 84 |
|
| 85 |
-
<div class="
|
| 86 |
-
<p class="text-indigo-
|
| 87 |
-
<p class="
|
| 88 |
-
"The
|
| 89 |
</p>
|
| 90 |
</div>
|
| 91 |
</div>
|
| 92 |
|
| 93 |
-
<!-- Architecture
|
| 94 |
-
<section class="mb-
|
| 95 |
-
<
|
| 96 |
-
<
|
| 97 |
-
|
| 98 |
-
</
|
| 99 |
-
<p class="text-slate-600 mb-8 text-lg">
|
| 100 |
-
The environment is designed as a **POMDP (Partially Observable Markov Decision Process)**. This separation between what is true and what is visible is what makes the environment a scientific challenge.
|
| 101 |
-
</p>
|
| 102 |
|
| 103 |
-
<
|
| 104 |
-
<
|
| 105 |
-
|
| 106 |
-
|
| 107 |
</div>
|
| 108 |
-
<div class="
|
| 109 |
-
<
|
| 110 |
-
<
|
| 111 |
</div>
|
| 112 |
</div>
|
|
|
|
| 113 |
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
|
|
|
|
|
|
| 117 |
</div>
|
| 118 |
-
</section>
|
| 119 |
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
<
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
<p class="text-
|
|
|
|
| 127 |
</div>
|
| 128 |
-
<div class="shadcn-card p-
|
| 129 |
-
<
|
| 130 |
-
<p class="text-
|
| 131 |
</div>
|
| 132 |
-
<div class="shadcn-card p-
|
| 133 |
-
<
|
| 134 |
-
<p class="text-
|
| 135 |
</div>
|
| 136 |
-
<div class="shadcn-card p-
|
| 137 |
-
<
|
| 138 |
-
<p class="text-
|
| 139 |
</div>
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
</div>
|
| 144 |
-
<div class="shadcn-card p-6
|
| 145 |
-
<h4 class="font-bold
|
| 146 |
-
<p class="text-
|
| 147 |
</div>
|
| 148 |
-
<div class="shadcn-card p-6
|
| 149 |
-
<h4 class="font-bold
|
| 150 |
-
<p class="text-
|
| 151 |
</div>
|
| 152 |
</div>
|
| 153 |
-
</section>
|
| 154 |
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
<p>
|
| 160 |
-
We trained a **Qwen3.5-2B** model using GRPO against the MolForge environment. By transitioning from a simple SFT baseline to a verifier-driven RL policy, we saw significant improvements across all difficulty levels.
|
| 161 |
</p>
|
| 162 |
</div>
|
| 163 |
|
| 164 |
<div class="shadcn-card overflow-hidden mb-12">
|
| 165 |
<table class="w-full text-left text-sm">
|
| 166 |
-
<thead class="bg-slate-
|
| 167 |
<tr>
|
| 168 |
-
<th class="px-6 py-4 font-semibold text-
|
| 169 |
-
<th class="px-6 py-4 font-semibold text-
|
| 170 |
-
<th class="px-6 py-4 font-semibold text-
|
| 171 |
-
<th class="px-6 py-4 font-semibold text-
|
| 172 |
</tr>
|
| 173 |
</thead>
|
| 174 |
<tbody class="divide-y">
|
| 175 |
<tr>
|
| 176 |
-
<td class="px-6 py-
|
| 177 |
-
<td class="px-6 py-
|
| 178 |
-
<td class="px-6 py-
|
| 179 |
-
<td class="px-6 py-
|
| 180 |
</tr>
|
| 181 |
<tr>
|
| 182 |
-
<td class="px-6 py-
|
| 183 |
-
<td class="px-6 py-
|
| 184 |
-
<td class="px-6 py-
|
| 185 |
-
<td class="px-6 py-
|
| 186 |
</tr>
|
| 187 |
<tr>
|
| 188 |
-
<td class="px-6 py-
|
| 189 |
-
<td class="px-6 py-
|
| 190 |
-
<td class="px-6 py-
|
| 191 |
-
<td class="px-6 py-
|
| 192 |
</tr>
|
| 193 |
</tbody>
|
| 194 |
</table>
|
| 195 |
</div>
|
| 196 |
|
| 197 |
-
<div class="grid md:grid-cols-2 gap-8
|
| 198 |
-
<div class="shadcn-card p-
|
| 199 |
-
<img src="assets/reward_curve.png" alt="Reward Curve" class="rounded border">
|
| 200 |
-
<p class="
|
| 201 |
</div>
|
| 202 |
-
<div class="shadcn-card p-
|
| 203 |
-
<img src="assets/Logs.png" alt="Logs" class="rounded border">
|
| 204 |
-
<p class="
|
| 205 |
</div>
|
| 206 |
</div>
|
| 207 |
</section>
|
| 208 |
|
| 209 |
<!-- Final Takeaway -->
|
| 210 |
-
<section class="mb-
|
| 211 |
-
<h2 class="text-
|
| 212 |
-
<p class="text-slate-500 max-w-
|
| 213 |
-
MolForge
|
| 214 |
</p>
|
| 215 |
<div class="flex flex-wrap justify-center gap-4">
|
| 216 |
-
<a href="https://github.com/Adhitya-Vardhan/molt_lab" class="px-
|
| 217 |
-
<a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="px-
|
| 218 |
</div>
|
| 219 |
</section>
|
| 220 |
|
| 221 |
<!-- Footer -->
|
| 222 |
-
<footer class="py-12 border-t text-
|
| 223 |
-
<p>© 2026 MolForge • Built for OpenEnv</p>
|
| 224 |
-
<div class="flex gap-6">
|
| 225 |
-
<a href="https://huggingface.co/Adhitya122/molforge-grpo-oncology" class="hover:text-slate-600">Model Card</a>
|
| 226 |
-
<a href="https://huggingface.co/spaces/Adhitya122/molforge" class="hover:text-slate-600">Space Home</a>
|
| 227 |
-
</div>
|
| 228 |
</footer>
|
| 229 |
</main>
|
| 230 |
|
|
|
|
| 14 |
color: #0f172a;
|
| 15 |
}
|
| 16 |
.prose-custom {
|
| 17 |
+
max-width: 70ch;
|
| 18 |
margin: 0 auto;
|
| 19 |
}
|
| 20 |
.shadcn-card {
|
| 21 |
border: 1px solid #e2e8f0;
|
| 22 |
background: #ffffff;
|
| 23 |
+
border-radius: 0.75rem;
|
| 24 |
box-shadow: 0 1px 3px 0 rgb(0 0 0 / 0.1), 0 1px 2px -1px rgb(0 0 0 / 0.1);
|
| 25 |
}
|
| 26 |
.shadcn-badge {
|
|
|
|
| 36 |
.mono {
|
| 37 |
font-family: 'JetBrains Mono', monospace;
|
| 38 |
}
|
| 39 |
+
.video-container {
|
| 40 |
+
position: relative;
|
| 41 |
+
padding-bottom: 56.25%;
|
| 42 |
+
height: 0;
|
| 43 |
+
overflow: hidden;
|
| 44 |
+
border-radius: 0.75rem;
|
| 45 |
+
border: 1px solid #e2e8f0;
|
| 46 |
+
}
|
| 47 |
+
.video-container iframe {
|
| 48 |
+
position: absolute;
|
| 49 |
+
top: 0;
|
| 50 |
+
left: 0;
|
| 51 |
+
width: 100%;
|
| 52 |
+
height: 100%;
|
| 53 |
+
}
|
| 54 |
</style>
|
| 55 |
</head>
|
| 56 |
<body class="bg-white">
|
| 57 |
|
| 58 |
<!-- Navigation -->
|
| 59 |
<nav class="border-b sticky top-0 bg-white/80 backdrop-blur-md z-50">
|
| 60 |
+
<div class="max-w-5xl mx-auto px-6 h-16 flex items-center justify-between">
|
| 61 |
<span class="font-bold tracking-tight text-lg">MolForge</span>
|
| 62 |
<div class="flex gap-6 text-sm font-medium text-slate-600">
|
| 63 |
<a href="https://github.com/Adhitya-Vardhan/molt_lab" class="hover:text-black transition-colors">GitHub</a>
|
| 64 |
+
<a href="https://huggingface.co/spaces/Adhitya122/molforge" class="hover:text-black transition-colors">Space</a>
|
| 65 |
+
<a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="hover:text-black transition-colors font-bold text-indigo-600">Try Training</a>
|
| 66 |
</div>
|
| 67 |
</div>
|
| 68 |
</nav>
|
| 69 |
|
| 70 |
+
<main class="max-w-5xl mx-auto px-6 py-20">
|
| 71 |
<!-- Header -->
|
| 72 |
+
<div class="mb-16 text-center">
|
| 73 |
+
<div class="flex justify-center gap-2 mb-6">
|
| 74 |
+
<span class="shadcn-badge bg-indigo-50 text-indigo-700 border border-indigo-100">Hackathon Submission</span>
|
| 75 |
+
<span class="shadcn-badge">Medical Oncology</span>
|
| 76 |
</div>
|
| 77 |
+
<h1 class="text-4xl md:text-6xl font-extrabold tracking-tight mb-6 leading-tight max-w-3xl mx-auto">
|
| 78 |
+
MolForge: The Scientific Method as a Workflow
|
| 79 |
</h1>
|
| 80 |
+
<p class="text-xl text-slate-500 mb-10 leading-relaxed max-w-2xl mx-auto">
|
| 81 |
+
How we trained an LLM to navigate a resource-constrained laboratory, optimize oncology drug candidates, and survive scientific "sunk-cost" traps.
|
| 82 |
</p>
|
| 83 |
+
|
| 84 |
+
<div class="flex items-center justify-center gap-4 text-sm text-slate-400 mb-12">
|
| 85 |
+
<div class="w-10 h-10 rounded-full bg-indigo-100 flex items-center justify-center text-indigo-600 font-bold">AV</div>
|
| 86 |
+
<div class="text-left">
|
| 87 |
+
<p class="font-semibold text-slate-900 leading-none mb-1">Adhitya Vardhan</p>
|
| 88 |
+
<p class="leading-none">OpenEnv Hackathon 2026 • 12 min read</p>
|
| 89 |
+
</div>
|
| 90 |
+
</div>
|
| 91 |
+
|
| 92 |
+
<!-- Video Section -->
|
| 93 |
+
<div class="max-w-3xl mx-auto mb-20">
|
| 94 |
+
<div class="video-container shadow-2xl">
|
| 95 |
+
<iframe src="https://www.youtube.com/embed/q8YoA0YhIn8" title="MolForge Explainer Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
|
| 96 |
</div>
|
| 97 |
+
<p class="mt-4 text-sm text-slate-400 italic">Watch the MolForge technical explainer (3:24)</p>
|
| 98 |
</div>
|
| 99 |
</div>
|
| 100 |
|
| 101 |
<!-- Introduction -->
|
| 102 |
+
<div class="prose prose-slate prose-lg max-w-none mb-24 leading-relaxed text-slate-700 border-b pb-20">
|
| 103 |
+
<h2 class="text-3xl font-bold text-slate-900 mb-6">Introduction</h2>
|
| 104 |
<p>
|
| 105 |
+
The challenge of modern AI in drug discovery is often reduced to a static prediction task: "Does this molecule bind to this target?" But in a real-world biotech lab, the answer is never that simple. A lead chemist must balance <strong>potency</strong> against <strong>toxicity</strong>, <strong>synthesizability</strong>, and a rapidly depleting <strong>assay budget</strong>.
|
| 106 |
</p>
|
| 107 |
<p class="mt-4">
|
| 108 |
+
<strong>MolForge</strong> is a verifier-driven reinforcement learning environment that replicates this complexity. We don't trust the model to "guess" the answer. Instead, we force it to execute a sequence of actions—edits, assays, and specialist reviews—until it can justify a final submission with verifiable evidence.
|
| 109 |
</p>
|
| 110 |
|
| 111 |
+
<div class="mt-12 p-8 bg-slate-900 text-slate-200 rounded-2xl shadow-lg">
|
| 112 |
+
<p class="text-indigo-400 font-bold mb-4 uppercase tracking-widest text-xs">Core Philosophy</p>
|
| 113 |
+
<p class="text-2xl font-medium leading-tight">
|
| 114 |
+
"The model is a trainable research agent inside a controlled scientific environment, not an oracle. It is judged by chemistry and biomedical verifiers, corrected by specialist feedback, and scored by a reward system that explains exactly where the path to a discovery failed."
|
| 115 |
</p>
|
| 116 |
</div>
|
| 117 |
</div>
|
| 118 |
|
| 119 |
+
<!-- Scientific Architecture -->
|
| 120 |
+
<section class="mb-32">
|
| 121 |
+
<div class="flex items-center gap-3 mb-8">
|
| 122 |
+
<div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">1</div>
|
| 123 |
+
<h2 class="text-3xl font-bold tracking-tight">The POMDP Architecture</h2>
|
| 124 |
+
</div>
|
| 125 |
|
| 126 |
+
<p class="text-slate-600 mb-10 text-lg leading-relaxed">
|
| 127 |
+
MolForge is built as a <strong>Partially Observable Markov Decision Process (POMDP)</strong>. This means the agent never sees the "hidden truth" of the receptor. It only sees what its budget allows it to assay.
|
| 128 |
+
</p>
|
| 129 |
+
|
| 130 |
+
<div class="shadcn-card p-4 bg-slate-50 mb-12">
|
| 131 |
+
<img src="assets/molforge_architecture.png" alt="Architecture" class="rounded-lg w-full">
|
| 132 |
+
<p class="mt-4 text-center text-xs text-slate-400 font-medium tracking-wide">THE SCIENTIFIC FEEDBACK LOOP: VERIFIER-FIRST DESIGN</p>
|
| 133 |
+
</div>
|
| 134 |
+
|
| 135 |
+
<div class="grid md:grid-cols-2 gap-8">
|
| 136 |
+
<div class="space-y-4">
|
| 137 |
+
<h4 class="font-bold text-slate-900 border-b pb-2">The Hidden State</h4>
|
| 138 |
+
<ul class="space-y-3 text-slate-600 text-sm">
|
| 139 |
+
<li class="flex gap-2"><span>•</span> <strong>Ground Truth Potency:</strong> The exact hidden binding energy of the KRAS G12C pocket.</li>
|
| 140 |
+
<li class="flex gap-2"><span>•</span> <strong>Sunk-Cost Traps:</strong> Starting scaffolds that look promising but have hidden liabilities.</li>
|
| 141 |
+
<li class="flex gap-2"><span>•</span> <strong>Target Mutation:</strong> Late-stage shifts in the pocket (Level 2) that punish blind optimization.</li>
|
| 142 |
+
</ul>
|
| 143 |
</div>
|
| 144 |
+
<div class="space-y-4">
|
| 145 |
+
<h4 class="font-bold text-indigo-600 border-b pb-2">The Visible Evidence</h4>
|
| 146 |
+
<ul class="space-y-3 text-slate-600 text-sm">
|
| 147 |
+
<li class="flex gap-2"><span>•</span> <strong>RDKit & TDC Signals:</strong> Noisy, verifier-backed readings of Lipophilicity (LogP) and TPSA.</li>
|
| 148 |
+
<li class="flex gap-2"><span>•</span> <strong>Heuristic Docking:</strong> Fast simulations of pocket matching and receptor fit.</li>
|
| 149 |
+
<li class="flex gap-2"><span>•</span> <strong>Governance Vetoes:</strong> Objections from the Safety Specialist or Process Chemist.</li>
|
| 150 |
+
</ul>
|
| 151 |
</div>
|
| 152 |
</div>
|
| 153 |
+
</section>
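The split between hidden state and visible evidence described in this section can be sketched as a toy lab in which every observation costs budget and carries noise. This is only an illustration under stated assumptions: `ToyLab`, `run_assay`, the noise level, and the assay cost of 400 are hypothetical names and values, not MolForge's actual API. Only the starting budget of 3600 is taken from the Level 0 scenario.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyLab:
    """Hypothetical stand-in for a partially observable lab (not MolForge's API)."""

    hidden_potency: float   # ground truth; the agent never reads this directly
    budget: int             # remaining assay budget
    noise: float = 0.05     # assumed standard deviation of assay noise
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def run_assay(self, cost: int) -> Optional[float]:
        """Spend budget for a noisy reading; return None if the assay is unaffordable."""
        if cost > self.budget:
            return None
        self.budget -= cost
        return self.hidden_potency + self.rng.gauss(0.0, self.noise)


lab = ToyLab(hidden_potency=0.72, budget=3600)
reading = lab.run_assay(cost=400)          # noisy estimate, budget drops to 3200
too_expensive = lab.run_assay(cost=10_000)  # refused, budget unchanged
```

The agent in this sketch can only improve its estimate of `hidden_potency` by paying for more readings, which is the trade-off the POMDP framing is meant to capture.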
|
| 154 |
|
| 155 |
+
<!-- Search Space & Scenarios -->
|
| 156 |
+
<section class="mb-32">
|
| 157 |
+
<div class="flex items-center gap-3 mb-8">
|
| 158 |
+
<div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">2</div>
|
| 159 |
+
<h2 class="text-3xl font-bold tracking-tight">The Molecular Search Space</h2>
|
| 160 |
</div>
|
|
|
|
| 161 |
|
| 162 |
+
<p class="text-slate-600 mb-10 text-lg leading-relaxed">
|
| 163 |
+
We don't ask the model to memorize molecules. We ask it to navigate a <strong>combinatorial space of 256 fragment combinations</strong> (four slots with four options each) across three starting scenarios.
|
| 164 |
+
</p>
|
| 165 |
+
|
| 166 |
+
<div class="grid grid-cols-2 md:grid-cols-4 gap-4 mb-12">
|
| 167 |
+
<div class="shadcn-card p-4 text-center">
|
| 168 |
+
<p class="text-xs font-bold text-slate-400 uppercase mb-2">Warhead</p>
|
| 169 |
+
<p class="text-sm font-semibold">4 Options</p>
|
| 170 |
</div>
|
| 171 |
+
<div class="shadcn-card p-4 text-center">
|
| 172 |
+
<p class="text-xs font-bold text-slate-400 uppercase mb-2">Hinge</p>
|
| 173 |
+
<p class="text-sm font-semibold">4 Options</p>
|
| 174 |
</div>
|
| 175 |
+
<div class="shadcn-card p-4 text-center">
|
| 176 |
+
<p class="text-xs font-bold text-slate-400 uppercase mb-2">Solvent Tail</p>
|
| 177 |
+
<p class="text-sm font-semibold">4 Options</p>
|
| 178 |
</div>
|
| 179 |
+
<div class="shadcn-card p-4 text-center">
|
| 180 |
+
<p class="text-xs font-bold text-slate-400 uppercase mb-2">Back Pocket</p>
|
| 181 |
+
<p class="text-sm font-semibold">4 Options</p>
|
| 182 |
</div>
|
| 183 |
+
</div>
|
| 184 |
+
|
| 185 |
+
<h3 class="text-xl font-bold mb-6">Benchmark Scenarios</h3>
|
| 186 |
+
<div class="shadcn-card overflow-hidden mb-12">
|
| 187 |
+
<table class="w-full text-left text-sm">
|
| 188 |
+
<thead class="bg-slate-50 border-b">
|
| 189 |
+
<tr>
|
| 190 |
+
<th class="px-6 py-4 font-semibold text-slate-900">Scenario</th>
|
| 191 |
+
<th class="px-6 py-4 font-semibold text-slate-900">Story</th>
|
| 192 |
+
<th class="px-6 py-4 font-semibold text-slate-900">Budget</th>
|
| 193 |
+
<th class="px-6 py-4 font-semibold text-slate-900">Difficulty</th>
|
| 194 |
+
</tr>
|
| 195 |
+
</thead>
|
| 196 |
+
<tbody class="divide-y">
|
| 197 |
+
<tr>
|
| 198 |
+
<td class="px-6 py-4 font-bold text-indigo-600">Level 0: Easy</td>
|
| 199 |
+
<td class="px-6 py-4 text-slate-500">Near-viable scaffold needs safety repair and evidence.</td>
|
| 200 |
+
<td class="px-6 py-4">3600</td>
|
| 201 |
+
<td class="px-6 py-4"><span class="shadcn-badge bg-emerald-50 text-emerald-700">Low</span></td>
|
| 202 |
+
</tr>
|
| 203 |
+
<tr>
|
| 204 |
+
<td class="px-6 py-4 font-bold text-indigo-600">Level 1: Medium</td>
|
| 205 |
+
<td class="px-6 py-4 text-slate-500">Potency, toxicity, and synthesis must be balanced.</td>
|
| 206 |
+
<td class="px-6 py-4">4300</td>
|
| 207 |
+
<td class="px-6 py-4"><span class="shadcn-badge bg-orange-50 text-orange-700">Moderate</span></td>
|
| 208 |
+
</tr>
|
| 209 |
+
<tr>
|
| 210 |
+
<td class="px-6 py-4 font-bold text-indigo-600">Level 2: Hard</td>
|
| 211 |
+
<td class="px-6 py-4 text-slate-500">Sunk-cost trap: starting series has hidden liability.</td>
|
| 212 |
+
<td class="px-6 py-4">5200</td>
|
| 213 |
+
<td class="px-6 py-4"><span class="shadcn-badge bg-red-50 text-red-700">Critical</span></td>
|
| 214 |
+
</tr>
|
| 215 |
+
</tbody>
|
| 216 |
+
</table>
|
| 217 |
+
</div>
|
| 218 |
+
</section>
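The size of the search space follows from the four slots shown in the cards above, each with four options: 4 × 4 × 4 × 4 = 256 candidate molecules. A minimal sketch of that enumeration follows. The slot names come from the page, but the fragment labels such as `warhead_0` are placeholders, since the actual fragment names are not listed here.

```python
from itertools import product

# Slot names as shown on the page; option labels are placeholders.
SLOTS = ("warhead", "hinge", "solvent_tail", "back_pocket")
OPTIONS_PER_SLOT = 4

fragments = {
    slot: [f"{slot}_{i}" for i in range(OPTIONS_PER_SLOT)]
    for slot in SLOTS
}

# Every candidate picks exactly one fragment per slot.
candidates = list(product(*(fragments[slot] for slot in SLOTS)))
print(len(candidates))  # 256
```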
|
| 219 |
+
|
| 220 |
+
<!-- Reward Design -->
|
| 221 |
+
<section class="mb-32">
|
| 222 |
+
<div class="flex items-center gap-3 mb-8">
|
| 223 |
+
<div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">3</div>
|
| 224 |
+
<h2 class="text-3xl font-bold tracking-tight">Reward Design: Beyond Scalar Scores</h2>
|
| 225 |
+
</div>
|
| 226 |
+
|
| 227 |
+
<p class="text-slate-600 mb-10 text-lg leading-relaxed">
|
| 228 |
+
Training for scientific rigor requires more than a "Good/Bad" signal. We use a <strong>decomposed reward system</strong> that mixes coarse shaping with sparse terminal bonuses.
|
| 229 |
+
</p>
|
| 230 |
+
|
| 231 |
+
<div class="grid md:grid-cols-3 gap-6 mb-12">
|
| 232 |
+
<div class="shadcn-card p-6 bg-slate-50 border-t-4 border-t-indigo-500">
|
| 233 |
+
<h4 class="font-bold mb-2">Coarse Shaping</h4>
|
| 234 |
+
<p class="text-xs text-slate-500">Edit feedback is deliberately coarse and never exposes exact hidden deltas, forcing the model to rely on empirical assays.</p>
|
| 235 |
</div>
|
| 236 |
+
<div class="shadcn-card p-6 bg-slate-50 border-t-4 border-t-emerald-500">
|
| 237 |
+
<h4 class="font-bold mb-2">Evidence Multipliers</h4>
|
| 238 |
+
<p class="text-xs text-slate-500">Submissions without current potency, toxicity, and synthesis support receive massive penalties.</p>
|
| 239 |
</div>
|
| 240 |
+
<div class="shadcn-card p-6 bg-slate-50 border-t-4 border-t-orange-500">
|
| 241 |
+
<h4 class="font-bold mb-2">Budget Efficiency</h4>
|
| 242 |
+
<p class="text-xs text-slate-500">Small credits for valid evidence-backed submissions that use less than the allocated budget.</p>
|
| 243 |
</div>
|
| 244 |
</div>
|
|
|
|
| 245 |
|
| 246 |
+
<div class="p-6 bg-indigo-50 border border-indigo-100 rounded-xl text-sm">
|
| 247 |
+
<p class="font-bold text-indigo-700 mb-2">The Curriculum Mode Advantage</p>
|
| 248 |
+
<p class="text-indigo-900 leading-relaxed">
|
| 249 |
+
For early RL, we add <strong>"Partial Credit Breadcrumbs"</strong>. If a model fails to submit but shows good scientific behavior (gathering evidence, designing promising molecules), it receives bounded warmup rewards. This eases the sparse-reward problem and teaches the model how to explore before it discovers the terminal submission bonus.
|
| 250 |
</p>
|
| 251 |
</div>
|
| 252 |
+
</section>
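One way to read the reward description in this section is as a function of a finished episode: unsupported submissions are penalized, supported ones earn a terminal score plus a small budget credit, and curriculum mode gives bounded partial credit when nothing is submitted. The sketch below is an assumption-laden illustration, not the environment's real reward code. `Episode`, `total_reward`, the penalty of -1.0, the breadcrumb cap of 0.2, and the budget weight of 0.05 are all hypothetical, and per-step coarse shaping is left out.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical summary of one finished episode."""

    submitted: bool
    quality: float           # terminal molecule quality in [0, 1] (assumed scale)
    has_potency: bool        # current potency evidence gathered
    has_toxicity: bool       # current toxicity evidence gathered
    has_synthesis: bool      # current synthesis evidence gathered
    budget_used: int
    budget_total: int
    breadcrumbs: float = 0.0  # partial credit earned for good exploration


def total_reward(ep: Episode, curriculum: bool = False) -> float:
    """Combine terminal, evidence, and budget terms (all weights are assumptions)."""
    if not ep.submitted:
        # Curriculum mode: bounded warmup reward instead of a flat zero.
        return min(ep.breadcrumbs, 0.2) if curriculum else 0.0
    if not (ep.has_potency and ep.has_toxicity and ep.has_synthesis):
        return -1.0  # submission without full evidence support
    unused_fraction = 1.0 - ep.budget_used / ep.budget_total
    return ep.quality + 0.05 * unused_fraction


supported = Episode(True, 0.8, True, True, True, 2600, 5200)
unsupported = Episode(True, 0.8, True, False, True, 2600, 5200)
no_submit = Episode(False, 0.0, True, True, False, 5200, 5200, breadcrumbs=0.5)
```

Keeping each term separate, rather than folding everything into one scalar early, makes it easier to report which part of the trajectory lost reward.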
|
| 253 |
+
|
| 254 |
+
<!-- Results -->
|
| 255 |
+
<section class="mb-32">
|
| 256 |
+
<div class="flex items-center gap-3 mb-8">
|
| 257 |
+
<div class="w-10 h-10 rounded-lg bg-indigo-600 flex items-center justify-center text-white font-bold">4</div>
|
| 258 |
+
<h2 class="text-3xl font-bold tracking-tight">Training Results</h2>
|
| 259 |
+
</div>
|
| 260 |
|
| 261 |
<div class="shadcn-card overflow-hidden mb-12">
|
| 262 |
<table class="w-full text-left text-sm">
|
| 263 |
+
<thead class="bg-slate-900 text-white">
|
| 264 |
<tr>
|
| 265 |
+
<th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">Difficulty</th>
|
| 266 |
+
<th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">Before (SFT)</th>
|
| 267 |
+
<th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">After (RL)</th>
|
| 268 |
+
<th class="px-6 py-4 font-semibold uppercase tracking-wider text-[10px]">Improvement</th>
|
| 269 |
</tr>
|
| 270 |
</thead>
|
| 271 |
<tbody class="divide-y">
|
| 272 |
<tr>
|
| 273 |
+
<td class="px-6 py-5 font-bold">Level 0: Easy</td>
|
| 274 |
+
<td class="px-6 py-5 text-slate-400">0.1167</td>
|
| 275 |
+
<td class="px-6 py-5 font-extrabold text-slate-900">0.1295</td>
|
| 276 |
+
<td class="px-6 py-5 text-emerald-600 font-black">+10.9%</td>
|
| 277 |
</tr>
|
| 278 |
<tr>
|
| 279 |
+
<td class="px-6 py-5 font-bold">Level 1: Medium</td>
|
| 280 |
+
<td class="px-6 py-5 text-slate-400">0.1167</td>
|
| 281 |
+
<td class="px-6 py-5 font-extrabold text-slate-900">0.1278</td>
|
| 282 |
+
<td class="px-6 py-5 text-emerald-600 font-black">+9.5%</td>
|
| 283 |
</tr>
|
| 284 |
<tr>
|
| 285 |
+
<td class="px-6 py-5 font-bold">Level 2: Hard</td>
|
| 286 |
+
<td class="px-6 py-5 text-slate-400">0.0800</td>
|
| 287 |
+
<td class="px-6 py-5 font-extrabold text-slate-900">0.0866</td>
|
| 288 |
+
<td class="px-6 py-5 text-emerald-600 font-black">+8.3%</td>
|
| 289 |
</tr>
|
| 290 |
</tbody>
|
| 291 |
</table>
|
| 292 |
</div>
|
| 293 |
|
| 294 |
+
<div class="grid md:grid-cols-2 gap-8">
|
| 295 |
+
<div class="shadcn-card p-6">
|
| 296 |
+
<img src="assets/reward_curve.png" alt="Reward Curve" class="rounded border mb-4">
|
| 297 |
+
<p class="text-xs font-bold text-slate-400 uppercase tracking-widest text-center">RL Training Progression</p>
|
| 298 |
</div>
|
| 299 |
+
<div class="shadcn-card p-6">
|
| 300 |
+
<img src="assets/Logs.png" alt="Logs" class="rounded border mb-4">
|
| 301 |
+
<p class="text-xs font-bold text-slate-400 uppercase tracking-widest text-center">Governance Action History</p>
|
| 302 |
</div>
|
| 303 |
</div>
|
| 304 |
</section>
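The Improvement column in the results table is a relative change, (after − before) / before. The short check below recomputes it from the Before (SFT) and After (RL) scores shown in the table. The helper name `relative_improvement` is ours, and because the displayed scores are rounded to four decimals, the recomputed percentages can differ from the table in the last digit.

```python
# (before, after) scores copied from the results table.
results = {
    "Level 0: Easy": (0.1167, 0.1295),
    "Level 1: Medium": (0.1167, 0.1278),
    "Level 2: Hard": (0.0800, 0.0866),
}


def relative_improvement(before: float, after: float) -> float:
    """Return the relative change from before to after, in percent."""
    return (after - before) / before * 100.0


improvements = {
    level: relative_improvement(before, after)
    for level, (before, after) in results.items()
}

for level, pct in improvements.items():
    print(f"{level}: {pct:+.2f}%")
```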
|
| 305 |
|
| 306 |
<!-- Final Takeaway -->
|
| 307 |
+
<section class="mb-32 pt-20 border-t text-center">
|
| 308 |
+
<h2 class="text-4xl font-black mb-6 tracking-tight">Final Takeaway</h2>
|
| 309 |
+
<p class="text-slate-500 max-w-2xl mx-auto mb-12 text-lg leading-relaxed">
|
| 310 |
+
MolForge makes the case that scientific AI should not be built as a single-shot generator. By grounding the LLM in a <strong>closed-loop scientific environment</strong>, we can train models that respect budget, coordinate with specialists, and base their discoveries on verifiable evidence.
|
| 311 |
</p>
|
| 312 |
<div class="flex flex-wrap justify-center gap-4">
|
| 313 |
+
<a href="https://github.com/Adhitya-Vardhan/molt_lab" class="px-10 py-4 bg-slate-900 text-white font-bold rounded-xl hover:bg-slate-800 transition-all shadow-xl hover:-translate-y-1">Explore Code</a>
|
| 314 |
+
<a href="https://colab.research.google.com/drive/1c6npGkGNbbbd8XFNeS6zInBpopLnJ4W4?usp=sharing" class="px-10 py-4 bg-white border-2 border-slate-900 text-slate-900 font-bold rounded-xl hover:bg-slate-50 transition-all hover:-translate-y-1">Run Notebook</a>
|
| 315 |
+
</div>
|
| 316 |
+
|
| 317 |
+
<div class="mt-16 flex justify-center gap-8 text-sm font-bold text-slate-400">
|
| 318 |
+
<a href="https://huggingface.co/spaces/Adhitya122/molforge" class="hover:text-indigo-600 transition-colors uppercase tracking-widest">Space Deployment</a>
|
| 319 |
+
<a href="https://huggingface.co/Adhitya122/molforge-grpo-oncology" class="hover:text-indigo-600 transition-colors uppercase tracking-widest">Model Card</a>
|
| 320 |
</div>
|
| 321 |
</section>
|
| 322 |
|
| 323 |
<!-- Footer -->
|
| 324 |
+
<footer class="py-12 border-t text-xs text-slate-400 text-center">
|
| 325 |
+
<p>© 2026 MolForge Project • Built for the OpenEnv Hackathon</p>
|
| 326 |
</footer>
|
| 327 |
</main>
|
| 328 |
|