Add Colab link: Blog.md
Blog.md
CHANGED
@@ -258,7 +258,9 @@ The key difference: we don't test whether a model *knows* medicine. We test whet
 
 ## Training Validation: Proof the Environment Works
 
-The environment is the contribution. Training is the proof it works. We validated with GRPO on a free Colab T4: Qwen2.5-3B-Instruct, 4-bit QLoRA, 200 steps
+The environment is the contribution. Training is the proof it works. We validated with GRPO on a free Colab T4: Qwen2.5-3B-Instruct, 4-bit QLoRA, 200 steps.
+
+**[Open the Training Notebook in Colab](https://colab.research.google.com/drive/13H5L6bjg-wYvDFkXamO7_hms5MN8E8s3?usp=share_link)**: judges can re-run the full training pipeline.
 
 ### The Reward Curve
 
@@ -327,6 +329,7 @@ python inference.py --mode heuristic # No GPU needed
 - 📦 [GitHub](https://github.com/sumitsaraswat362/SynthAudit.Env)
 - 🤗 [Trained Model](https://huggingface.co/Timusgeorge/SynthAudit-Qwen2.5-3B-GRPO)
 - 🔬 [Interactive Dashboard](https://huggingface.co/spaces/Timusgeorge/SynthAudit-Env)
+- [Colab Training Notebook](https://colab.research.google.com/drive/13H5L6bjg-wYvDFkXamO7_hms5MN8E8s3?usp=share_link)
 
 **Raw Data** (verify every claim):
 - [`training_log_200.json`](https://huggingface.co/spaces/Timusgeorge/SynthAudit-Env/blob/main/outputs/training_log_200.json): all 200 reward values
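Reviewer note: the changed text invites readers to verify the reward claims from `training_log_200.json`. A minimal sketch of one way to do that is below. The log's schema is not shown in this diff, so the loader assumes either a flat JSON list of numbers or a list of objects with a `reward` key. Both function names and the `window` parameter are hypothetical and not part of the repository.

```python
import json
from statistics import mean


def load_rewards(path):
    """Load per-step rewards from a JSON log.

    Assumption: the file is either a flat list of numbers or a list of
    objects carrying a "reward" key. Adjust to the real schema.
    """
    with open(path) as f:
        data = json.load(f)
    return [
        float(row["reward"]) if isinstance(row, dict) else float(row)
        for row in data
    ]


def summarize_rewards(rewards, window=20):
    """Compare the mean reward over the first and last `window` steps."""
    if len(rewards) < 2 * window:
        raise ValueError("need at least 2 * window reward values")
    first = mean(rewards[:window])
    last = mean(rewards[-window:])
    return {
        "steps": len(rewards),
        "first_mean": first,
        "last_mean": last,
        "delta": last - first,
    }
```

After downloading the log locally, `summarize_rewards(load_rewards("training_log_200.json"))` would report the step count and whether the late-training mean exceeds the early one, provided the schema assumption holds.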