Upload README.md
README.md
CHANGED
|
@@ -144,6 +144,29 @@ Dense, shaped signal across the full trajectory — not just binary win/lose:
|
|
| 144 |
- **Episodes:** 50 per task (150 total)
|
| 145 |
- **Hardware:** NVIDIA T4 GPU (Colab)
|
| 146 |
|
| 147 |
### Results
|
| 148 |
|
| 149 |
| Task | First 10 Episodes | Last 10 Episodes | Improvement | Status |
|
|
@@ -293,6 +316,37 @@ python train.py \
|
|
| 293 |
- [x] `/grader` endpoint
|
| 294 |
- [x] HF Space deployed and healthy
|
| 295 |
- [x] Baseline inference script
|
| 296 |
|
| 297 |
---
|
| 298 |
|
|
|
|
| 144 |
- **Episodes:** 50 per task (150 total)
|
| 145 |
- **Hardware:** NVIDIA T4 GPU (Colab)
|
| 146 |
|
| 147 |
+
### Experimental Tracking
|
| 148 |
+
|
| 149 |
+
Training results are logged automatically while training runs, so a run can be verified afterwards from these files:
|
| 150 |
+
|
| 151 |
+
- **`./logs/{task}_results.csv`** — Per-episode rewards and step counts (updated live during training)
|
| 152 |
+
```
|
| 153 |
+
episode,reward,steps
|
| 154 |
+
1,+0.255,8
|
| 155 |
+
2,+0.240,7
|
| 156 |
+
3,+0.290,6
|
| 157 |
+
...
|
| 158 |
+
```
|
| 159 |
+
- **`./phase2_checkpoints/{task}_ep*.json`** — Checkpoint data at episodes 25, 50, 75, etc.
|
| 160 |
+
|
| 161 |
+
**To verify training results after running:**
|
| 162 |
+
```bash
|
| 163 |
+
# Check CSV files exist and contain data
|
| 164 |
+
head ./logs/cascading_failure_results.csv
|
| 165 |
+
|
| 166 |
+
# Plot results yourself:
|
| 167 |
+
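# Saves the reward curve to reward_curve.png (file name chosen here for illustration)
python -c "import matplotlib; matplotlib.use('Agg'); import pandas as pd; pd.read_csv('./logs/cascading_failure_results.csv').plot(x='episode', y='reward').figure.savefig('reward_curve.png')"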
|
| 168 |
+
```
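The per-episode CSV can also be reduced to a first-window versus last-window comparison like the one in the Results table. The sketch below is a minimal illustration, not the project's own script: the function name `summarize_rewards` is hypothetical, it assumes the `episode,reward,steps` header shown above, and it assumes the table's columns are mean rewards over the first and last 10 episodes, which the README does not state explicitly.

```python
import csv
import io
from statistics import mean

# Three rows copied from the log sample above; a real log has one row per episode.
SAMPLE = "episode,reward,steps\n1,+0.255,8\n2,+0.240,7\n3,+0.290,6\n"


def summarize_rewards(csv_text, window=10):
    """Return (first_mean, last_mean, change) over the first/last `window` episodes.

    Hypothetical helper: assumes a header row of episode,reward,steps.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # float() accepts the explicit leading "+" used in the log sample.
    rewards = [float(row["reward"]) for row in reader]
    if len(rewards) < window:
        raise ValueError(f"need at least {window} episodes, got {len(rewards)}")
    first = mean(rewards[:window])
    last = mean(rewards[-window:])
    return first, last, last - first


# Demo on the three sample rows, with a window of 1 because the sample is tiny.
first, last, change = summarize_rewards(SAMPLE, window=1)
print(f"first: {first:+.3f}  last: {last:+.3f}  change: {change:+.3f}")
```

For a real run, read `./logs/cascading_failure_results.csv` into a string and call the function with the default `window=10`.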
|
| 169 |
+
|
| 170 |
### Results
|
| 171 |
|
| 172 |
| Task | First 10 Episodes | Last 10 Episodes | Improvement | Status |
|
|
|
|
| 316 |
- [x] `/grader` endpoint
|
| 317 |
- [x] HF Space deployed and healthy
|
| 318 |
- [x] Baseline inference script
|
| 319 |
+
- [x] Experimental tracking (CSV + checkpoints)
|
| 320 |
+
|
| 321 |
+
## Verifying Training Execution
|
| 322 |
+
|
| 323 |
+
**To verify that training actually ran (for judges):**
|
| 324 |
+
|
| 325 |
+
```bash
|
| 326 |
+
# 1. Check CSV log files exist
|
| 327 |
+
ls -lh ./logs/
|
| 328 |
+
|
| 329 |
+
# 2. View a sample of episode results
|
| 330 |
+
head -20 ./logs/cascading_failure_results.csv
|
| 331 |
+
|
| 332 |
+
# 3. Check checkpoint files exist
|
| 333 |
+
ls -lh ./phase2_checkpoints/
|
| 334 |
+
|
| 335 |
+
# 4. Plot training curves from CSV
|
| 336 |
+
python -c "
|
| 337 |
+
import pandas as pd
|
| 338 |
+
import matplotlib.pyplot as plt
|
| 339 |
+
|
| 340 |
+
df = pd.read_csv('./logs/cascading_failure_results.csv')
|
| 341 |
+
plt.figure(figsize=(10, 6))
|
| 342 |
+
plt.plot(df['episode'], df['reward'].astype(float))
|
| 343 |
+
plt.xlabel('Episode')
|
| 344 |
+
plt.ylabel('Reward')
|
| 345 |
+
plt.title('Cascading Failure Task - GRPO Training')
|
| 346 |
+
plt.savefig('verification_curve.png')
|
| 347 |
+
print('✓ Verification curve saved')
|
| 348 |
+
"
|
| 349 |
+
```
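Beyond eyeballing `head` output, a log file can be checked mechanically. The following is a rough sketch under stated assumptions: `validate_log` is a hypothetical helper, it expects the `episode,reward,steps` header from the sample, and it assumes episodes are numbered consecutively from 1 (as in the sample rows), which the training script may or may not guarantee.

```python
import csv
import io


def validate_log(csv_text):
    """Return a list of problems found in a results CSV; an empty list means none found.

    Hypothetical helper: the expected header and the consecutive-from-1
    episode numbering are assumptions taken from the README's log sample.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["episode", "reward", "steps"]:
        return [f"unexpected header: {reader.fieldnames}"]

    problems = []
    row_count = 0
    expected = 1
    for row in reader:
        row_count += 1
        try:
            episode = int(row["episode"])
            float(row["reward"])  # only checks that the reward parses
            steps = int(row["steps"])
        except (TypeError, ValueError):
            problems.append(f"unparseable row: {row}")
            continue
        if episode != expected:
            problems.append(f"expected episode {expected}, found {episode}")
        if steps <= 0:
            problems.append(f"episode {episode} has non-positive step count {steps}")
        expected = episode + 1

    if row_count == 0:
        problems.append("no episode rows")
    return problems
```

An empty result only shows that the file is well formed. It does not prove by itself that the rewards came from a real training run.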
|
| 350 |
|
| 351 |
---
|
| 352 |
|