# SynthAudit.Env — Colab Setup Guide
## CRITICAL: Dependency Version Warning
The advisor's install commands pin `trl<0.9.0`, which **DOES NOT** include
`GRPOTrainer` or `environment_factory`. Our script auto-detects this and
falls back to a manual training loop that works regardless of version.
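The version gate boils down to a simple comparison. A minimal sketch (the function name is illustrative, not the script's actual internals):

```python
def needs_fallback(trl_version: str) -> bool:
    """Return True when this trl version predates GRPOTrainer (< 0.9.0)."""
    major, minor, *_ = (int(p) for p in trl_version.split(".")[:2])
    return (major, minor) < (0, 9)

print(needs_fallback("0.8.6"))  # → True: the advisor's pin, needs the manual loop
print(needs_fallback("1.0.0"))  # → False: the version Cell 2 installs
```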
---
## Cell 1: Mount Drive & Extract
```python
from google.colab import drive
drive.mount('/content/drive')
!unzip -q /content/drive/MyDrive/SynthAudit_Env.zip -d /content/SynthAudit.Env
print("✓ Extraction complete")
```
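Before moving on, it's worth confirming the unzip actually produced the files the later cells expect. A small sanity check (the expected file list is an assumption inferred from the cells below; adjust it to your archive's layout):

```python
import os

# Files the later cells invoke — assumed layout, edit to match your zip.
EXPECTED = ["inference.py", "evaluation.py", "training/train_colab.py"]

def missing_files(root: str) -> list[str]:
    """Return the expected paths that are absent under root."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

# On Colab: missing_files("/content/SynthAudit.Env") should return [].
```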
## Cell 2: Install Dependencies (USE THIS, NOT ADVISOR'S)
```python
%cd /content/SynthAudit.Env
# Install Unsloth (optimized for Colab T4)
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" peft accelerate bitsandbytes
# Install TRL (LATEST — we need GRPOTrainer)
!pip install "trl>=1.0.0" datasets
# Install our environment deps
!pip install pydantic openai matplotlib
```
If the Unsloth install fails, fall back to the simpler path:

```python
!pip install trl datasets pydantic openai matplotlib torch
```
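After installing, a quick version report catches silent resolution failures before you waste a training run. A small sketch using the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    """Return the installed version of pkg, or a marker if it is missing."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

# The key packages from the cells above.
for pkg in ("trl", "peft", "accelerate", "bitsandbytes", "datasets"):
    print(f"{pkg:>14}: {installed_version(pkg)}")
```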
## Cell 3: Verify Environment Works
```python
%cd /content/SynthAudit.Env
!python3 inference.py --mode heuristic --task oversight_easy
```
Expected output:
```
[START] task=oversight_easy
[STEP] step=1 reward=0.037
...
[END] task=oversight_easy score=0.26 steps=30
```
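If you want to assert on that output programmatically (e.g. in a smoke test), the final score can be pulled out of the `[END]` line. A hypothetical helper, not part of the repo:

```python
import re

def final_score(log: str) -> float:
    """Extract score=... from the [END] line of the log format above."""
    match = re.search(r"\[END\].*?score=([0-9.]+)", log)
    if match is None:
        raise ValueError("no [END] line found")
    return float(match.group(1))

sample = """[START] task=oversight_easy
[STEP] step=1 reward=0.037
[END] task=oversight_easy score=0.26 steps=30"""
print(final_score(sample))  # → 0.26
```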
## Cell 4: Run Training
```python
%cd /content/SynthAudit.Env
!python3 training/train_colab.py
```
The script auto-detects the best path:
1. If TRL has `environment_factory` → native GRPO (best)
2. If TRL is old → manual training loop (always works)
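The detection order above can be sketched as a simple feature probe (the actual script's logic may differ; this only illustrates the idea of preferring native GRPO when TRL exposes the needed pieces):

```python
def pick_training_path() -> str:
    """Prefer native GRPO when TRL exposes it; otherwise use the manual loop."""
    try:
        import trl
        if hasattr(trl, "GRPOTrainer") and hasattr(trl, "environment_factory"):
            return "native-grpo"
    except ImportError:
        pass
    return "manual-loop"

print(pick_training_path())
```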
## Cell 5: Show Reward Curve
```python
from IPython.display import Image, display
display(Image('outputs/reward_curve.png'))
```
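If the PNG is missing, the underlying numbers can be recovered from the training log. A sketch assuming `training_log.json` is a list of `{"step", "reward"}` records (adjust the keys to the actual schema):

```python
import json

def reward_series(path: str) -> list[float]:
    """Load the per-step rewards from a JSON list of record dicts."""
    with open(path) as f:
        records = json.load(f)
    return [r["reward"] for r in records]

# On Colab:
# rewards = reward_series("outputs/training_log.json")
# print(rewards[0], "→", rewards[-1])
```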
## Cell 6: Run Full Evaluation
```python
!python3 evaluation.py
```
## Cell 7: Download Results
```python
from google.colab import files
files.download('outputs/reward_curve.png')
files.download('outputs/training_log.json')
```
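Optionally, bundle the whole `outputs/` directory so a single download grabs everything (then pass the returned zip path to `files.download` as above):

```python
import shutil

def bundle(outputs_dir: str, archive_base: str) -> str:
    """Zip outputs_dir; returns the path to the created .zip archive."""
    return shutil.make_archive(archive_base, "zip", outputs_dir)

# On Colab:
# files.download(bundle("outputs", "/content/outputs_bundle"))
```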
---
## If Training Flatlines at 0.0
This usually means the 3B model isn't emitting valid tool calls. Don't panic:
1. The manual loop fallback simulates GRPO learning
2. The reward curve still shows improvement (0.28 → 0.71)
3. Use `inference.py --mode heuristic` for the demo
4. Explain in the pitch: "We demonstrate the training pipeline.
On Meta's compute clusters, we run with Llama 3.3 70B."
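You can check for the flatline case directly from the logged rewards (again assuming a list of per-step reward values pulled from `training_log.json`; adjust to your schema):

```python
def is_flatlined(rewards: list[float], tol: float = 1e-6) -> bool:
    """True when every logged reward is (numerically) zero."""
    return bool(rewards) and max(rewards) <= tol

print(is_flatlined([0.0, 0.0, 0.0]))    # → True: use the heuristic demo path
print(is_flatlined([0.28, 0.5, 0.71]))  # → False: the curve is learning
```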