Timusgeorge committed 890cf70 (verified) · 1 parent: f64ec40

Add Colab link: Blog.md

Files changed (1):
  1. Blog.md +4 -1
Blog.md CHANGED
@@ -258,7 +258,9 @@ The key difference: we don't test whether a model *knows* medicine. We test whet
 
 ## Training Validation: Proof the Environment Works
 
-The environment is the contribution. Training is the proof it works. We validated with GRPO on a free Colab T4: Qwen2.5-3B-Instruct, 4-bit QLoRA, 200 steps, zero cost.
+The environment is the contribution. Training is the proof it works. We validated with GRPO on a free Colab T4: Qwen2.5-3B-Instruct, 4-bit QLoRA, 200 steps.
+
+**🚀 [Open the Training Notebook in Colab](https://colab.research.google.com/drive/13H5L6bjg-wYvDFkXamO7_hms5MN8E8s3?usp=share_link)**: judges can re-run the full training pipeline.
 
 ### The Reward Curve
 
@@ -327,6 +329,7 @@ python inference.py --mode heuristic # No GPU needed
 - 📦 [GitHub](https://github.com/sumitsaraswat362/SynthAudit.Env)
 - 🤗 [Trained Model](https://huggingface.co/Timusgeorge/SynthAudit-Qwen2.5-3B-GRPO)
 - 🔬 [Interactive Dashboard](https://huggingface.co/spaces/Timusgeorge/SynthAudit-Env)
+- 🚀 [Colab Training Notebook](https://colab.research.google.com/drive/13H5L6bjg-wYvDFkXamO7_hms5MN8E8s3?usp=share_link)
 
 **Raw Data** (verify every claim):
 - [`training_log_200.json`](https://huggingface.co/spaces/Timusgeorge/SynthAudit-Env/blob/main/outputs/training_log_200.json): all 200 reward values
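
Since the raw-data list invites readers to verify the reward values, here is a minimal sketch of how a downloaded copy of `training_log_200.json` might be summarized. The file's schema is not shown in this commit, so the layout handled here (a flat list of numbers, or a list of objects with a `"reward"` field) is an assumption, and the helper names `load_rewards`, `extract_rewards`, and `window_means` are hypothetical.

```python
import json
from statistics import mean


def extract_rewards(data):
    """Pull reward values out of parsed JSON.

    ASSUMPTION: `data` is either a flat list of numbers or a list of
    objects carrying a "reward" field. Adjust to the real schema.
    """
    rewards = []
    for entry in data:
        if isinstance(entry, dict):
            rewards.append(float(entry["reward"]))
        else:
            rewards.append(float(entry))
    return rewards


def load_rewards(path):
    """Read a local copy of the log file and return its reward values."""
    with open(path, encoding="utf-8") as fh:
        return extract_rewards(json.load(fh))


def window_means(rewards, window):
    """Mean reward over consecutive, non-overlapping windows of `window` steps."""
    return [mean(rewards[i:i + window]) for i in range(0, len(rewards), window)]
```

Calling `window_means(load_rewards("training_log_200.json"), 20)` would give ten averages, one per 20-step window, which is enough to see whether the reward trends upward over the 200 steps.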