Rohan03 committed on
Commit d9f6778 · verified · 1 Parent(s): ab5adb4

Track 2: validation suite with improvement curves, cold/warm, transfer, adversarial

benchmarks/results/track2_report.txt ADDED
@@ -0,0 +1,50 @@
+ ╔════════════════════════════════════════════════════╗
+ ║  Purpose Agent — Track 2 Validation Report         ║
+ ╚════════════════════════════════════════════════════╝
+
+ ═══ Improvement Curves ═══
+ Task         Run  Steps     Φ   Pass%   Heur
+ ────────────────────────────────────────────────
+ fibonacci      1      2   5.0     50%      3
+ fibonacci      2      1  10.0      0%      9
+ fibonacci      3      1  10.0      0%     18
+ fibonacci      4      1  10.0      0%     30
+ fibonacci      5      1  10.0      0%     45
+ → Δ(Φ) = +5.0 ✓ IMPROVED
+
+ factorial      1      2   1.0      0%      3
+ factorial      2      1  10.0      0%      9
+ factorial      3      1  10.0      0%     18
+ factorial      4      1  10.0      0%     30
+ factorial      5      1  10.0      0%     45
+ → Δ(Φ) = +9.0 ✓ IMPROVED
+
+ palindrome     1      2   7.0     75%      3
+ palindrome     2      1  10.0      0%      9
+ palindrome     3      1  10.0      0%     18
+ palindrome     4      1  10.0      0%     30
+ palindrome     5      1  10.0      0%     45
+ → Δ(Φ) = +3.0 ✓ IMPROVED
+
+ fizzbuzz       1      2   7.0     75%      3
+ fizzbuzz       2      1  10.0      0%      9
+ fizzbuzz       3      1  10.0      0%     18
+ fizzbuzz       4      1  10.0      0%     30
+ fizzbuzz       5      1  10.0      0%     45
+ → Δ(Φ) = +3.0 ✓ IMPROVED
+
+ ═══ Cold vs Warm ═══
+ fibonacci cold=5.0 warm=10.0 Δ=+5.0 ✓
+ factorial cold=1.0 warm=10.0 Δ=+9.0 ✓
+
+ ═══ Cross-Task Transfer (['fibonacci', 'factorial'] → ['palindrome', 'fizzbuzz']) ═══
+ 30 heuristics transferred
+ palindrome: ✗ Φ=10.0
+ fizzbuzz: ✗ Φ=10.0
+
+ ═══ Adversarial Robustness: 100% (8/8) ═══
+
+ ═══ VERDICT ═══
+ ✓ Self-improvement: Φ increases across runs
+ ✓ Cold/warm: memory helps (positive delta)
+ ✓ Immune system: 100% adversarial accuracy
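
Review note: the report does not state how Δ(Φ) is derived. The printed values are consistent with "Φ of the last run minus Φ of the first run", and the cold/warm deltas for fibonacci and factorial match the same arithmetic. The sketch below recomputes the deltas under that assumption. The Φ values are copied from the Improvement Curves table; the names `PHI_BY_RUN`, `phi_delta`, and `format_delta`, and the "NOT IMPROVED" label, are hypothetical and do not come from the benchmark code.

```python
# Φ per run, copied from the "Improvement Curves" table in the report.
PHI_BY_RUN = {
    "fibonacci": [5.0, 10.0, 10.0, 10.0, 10.0],
    "factorial": [1.0, 10.0, 10.0, 10.0, 10.0],
    "palindrome": [7.0, 10.0, 10.0, 10.0, 10.0],
    "fizzbuzz": [7.0, 10.0, 10.0, 10.0, 10.0],
}


def phi_delta(curve):
    """Assumed definition: Φ of the last run minus Φ of the first run."""
    return curve[-1] - curve[0]


def format_delta(task, curve):
    """Render one summary line; the 'NOT IMPROVED' label is an assumption."""
    delta = phi_delta(curve)
    status = "IMPROVED" if delta > 0 else "NOT IMPROVED"
    return f"{task:<12} Δ(Φ) = {delta:+.1f}  {status}"


# Recomputed deltas, to compare against the Δ(Φ) lines in the report.
DELTAS = {task: phi_delta(curve) for task, curve in PHI_BY_RUN.items()}

if __name__ == "__main__":
    for task, curve in PHI_BY_RUN.items():
        print(format_delta(task, curve))
```

Under this assumption the recomputed deltas agree with all four Δ(Φ) lines and with both Cold vs Warm rows. Every task reaches Φ=10.0 on run 2, so the deltas reflect only the gap between the first run and the ceiling.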