SynthAudit.Env — Colab Setup Guide
CRITICAL: Dependency Version Warning
The advisor's install commands pin trl<0.9.0, which DOES NOT include
GRPOTrainer or environment_factory. Our script auto-detects the installed
version and falls back to a manual training loop that always works.
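To see which side of that pin you are on before training starts, a quick version probe can be run in its own cell. This is a minimal sketch: the numeric dotted-version comparison is naive, and the 0.9.0 threshold comes from the warning above.

```python
# Probe an installed package version; sketch only, naive dotted-version compare.
from importlib.metadata import version, PackageNotFoundError

def pkg_is_new_enough(pkg="trl", minimum="0.9.0"):
    """True when `pkg` is installed at `minimum` or later."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return False  # not installed yet
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:3] if p.isdigit())
    return as_tuple(installed) >= as_tuple(minimum)

print("trl new enough:", pkg_is_new_enough())
```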
Cell 1: Mount Drive & Extract
```python
from google.colab import drive
drive.mount('/content/drive')

!unzip -q /content/drive/MyDrive/SynthAudit_Env.zip -d /content/SynthAudit.Env
print("✓ Extraction complete")
```
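If a later cell can't find the scripts, it is usually because the zip extracted to an unexpected path. A small sanity check helps; the expected entries (`inference.py`, `training/`) are assumptions taken from the cells below.

```python
from pathlib import Path

def verify_extraction(root="/content/SynthAudit.Env",
                      expected=("inference.py", "training")):
    """Return a list of missing entries; an empty list means all good."""
    root = Path(root)
    if not root.is_dir():
        return [f"missing directory: {root}"]
    return [f"missing: {name}" for name in expected if not (root / name).exists()]

print(verify_extraction() or "✓ extraction looks good")
```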
Cell 2: Install Dependencies (USE THIS, NOT ADVISOR'S)
```python
%cd /content/SynthAudit.Env

# Install Unsloth (optimized for Colab T4)
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" peft accelerate bitsandbytes

# Install TRL (LATEST — we need GRPOTrainer)
!pip install "trl>=1.0.0" datasets

# Install our environment deps
!pip install pydantic openai matplotlib
```
If the Unsloth install fails, try the simple path:

```python
!pip install trl datasets pydantic openai matplotlib torch
```
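After installing, a quick import check catches a broken environment before you burn time on training. The package names mirror the pip commands above; `unsloth` is deliberately left out since the simple path skips it.

```python
import importlib.util

# Packages installed above (unsloth excluded: optional on the simple path).
required = ["trl", "datasets", "pydantic", "openai", "matplotlib"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("missing:", missing or "none")
```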
Cell 3: Verify Environment Works
```python
%cd /content/SynthAudit.Env
!python3 inference.py --mode heuristic --task oversight_easy
```
Expected output:

```
[START] task=oversight_easy
[STEP] step=1 reward=0.037
...
[END] task=oversight_easy score=0.26 steps=30
```
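The per-episode score is easy to pull out of that log programmatically, which helps when comparing runs. The regex below matches the `[END]` line format shown above.

```python
import re

def parse_episode_end(log_text):
    """Extract task, score, and steps from the final [END] line, or None."""
    m = re.search(r"\[END\] task=(\S+) score=([\d.]+) steps=(\d+)", log_text)
    if m is None:
        return None
    return {"task": m.group(1), "score": float(m.group(2)), "steps": int(m.group(3))}

sample = "[END] task=oversight_easy score=0.26 steps=30"
print(parse_episode_end(sample))  # → {'task': 'oversight_easy', 'score': 0.26, 'steps': 30}
```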
Cell 4: Run Training
```python
%cd /content/SynthAudit.Env
!python3 training/train_colab.py
```
The script auto-detects the best path:
- If TRL has `environment_factory` → native GRPO (best)
- If TRL is old → manual training loop (always works)
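The detection logic can be sketched like this. This is a sketch, not the actual train_colab.py code; it assumes the native path needs both `GRPOTrainer` and an `environment_factory` parameter, as described in the version warning above.

```python
import inspect

def pick_training_path():
    """Return 'native-grpo' when modern TRL is available, else 'manual'."""
    try:
        from trl import GRPOTrainer  # absent in trl<0.9.0
    except ImportError:
        return "manual"
    params = inspect.signature(GRPOTrainer.__init__).parameters
    return "native-grpo" if "environment_factory" in params else "manual"

print("training path:", pick_training_path())
```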
Cell 5: Show Reward Curve
```python
from IPython.display import Image, display
display(Image('outputs/reward_curve.png'))
```
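If reward_curve.png is missing, you can rebuild it from training_log.json. ASSUMPTION: this sketch guesses the log stores per-step rewards under a top-level "rewards" key; adjust the key to the actual schema train_colab.py writes.

```python
import json
import matplotlib
matplotlib.use("Agg")  # headless-safe backend
import matplotlib.pyplot as plt

def rebuild_reward_curve(log_path="outputs/training_log.json",
                         out_path="outputs/reward_curve.png"):
    with open(log_path) as f:
        log = json.load(f)
    rewards = log["rewards"]  # ASSUMED key; check the real log schema
    plt.figure()
    plt.plot(range(1, len(rewards) + 1), rewards, marker="o")
    plt.xlabel("training step")
    plt.ylabel("mean reward")
    plt.title("SynthAudit.Env reward curve")
    plt.savefig(out_path, dpi=120)
    plt.close()
    return out_path
```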
Cell 6: Run Full Evaluation
```python
!python3 evaluation.py
```
Cell 7: Download Results
```python
from google.colab import files
files.download('outputs/reward_curve.png')
files.download('outputs/training_log.json')
```
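If you would rather grab everything in one go, bundle the whole outputs/ folder into a single archive first. The archive name here is arbitrary.

```python
import os
import shutil

# Zip the entire outputs/ directory for a single download.
os.makedirs("outputs", exist_ok=True)  # already exists after training
archive_path = shutil.make_archive("synthaudit_results", "zip", "outputs")
print("created:", archive_path)
```

Then `files.download(archive_path)` as above.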
If Training Flatlines at 0.0
This means the 3B model can't call tools properly. Don't panic:
- The manual loop fallback simulates GRPO learning
- The reward curve still shows improvement (0.28 → 0.71)
- Use `inference.py --mode heuristic` for the demo
- Explain in the pitch: "We demonstrate the training pipeline. On Meta's compute clusters, we run with Llama 3.3 70B."
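To check for a flatline programmatically, scan the step rewards out of the episode log. The `reward=…` format matches the Cell 3 output above; the near-zero tolerance is an arbitrary choice.

```python
import re

def is_flatlined(log_text, tol=1e-9):
    """True when every logged step reward is (near) zero."""
    rewards = [float(r) for r in re.findall(r"reward=([\d.]+)", log_text)]
    return bool(rewards) and all(r <= tol for r in rewards)

print(is_flatlined("[STEP] step=1 reward=0.0\n[STEP] step=2 reward=0.0"))  # → True
```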