QuantHive-Research-Tech committed on
Commit 7ffc277 · verified · 1 Parent(s): d74c433

Upload results/benchmark_report.md with huggingface_hub

Files changed (1)
  1. results/benchmark_report.md +29 -0
results/benchmark_report.md ADDED
@@ -0,0 +1,29 @@
+ # CausalFinBench Results
+
+ Date: 2026-05-08 00:12
+ Model: Horizon v1 (172M DiT, Pearl Level 3)
+ Checkpoint: checkpoints/phase9_production/step_200000.pt
+ Total time: 26.0 minutes
+
+ ## Summary
+
+ - Tier A: 5/5 PASS
+ - Tier B: 2/3 PARTIAL
+ - Tier C: 1/1 PASS
+
+ **OVERALL: ISSUES FOUND**
+
+
+ ## Detailed Results
+
+ | Test | Tier | Name | Cases | Pass Rate | Result |
+ |------|------|------|-------|-----------|--------|
+ | A1 | A | Consistency | 100 | 100.0% | PASS |
+ | A2 | A | Causal Asymmetry | 19 | 100.0% | PASS |
+ | A3 | A | Compositionality | 30 | 93.3% | PASS |
+ | A4 | A | Counterfactual Coherence | 50 | 92.0% | PASS |
+ | A5 | A | Robustness | 56 | 100.0% | PASS |
+ | B1 | B | Placebo (non-edges) | 39 | 59.0% | FAIL |
+ | B2 | B | Real effects (edges) | 19 | 100.0% | PASS |
+ | B4 | B | Sensitivity monotonicity | 10 | 100.0% | PASS |
+ | C1 | C | RBI Rate Decisions | 42 | 100.0% | PASS |
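
The Summary counts can be cross-checked against the Detailed Results table with a short script. The following is a minimal sketch, not the benchmark's own tooling: the row data is copied from the table, while the function names, the PASS/PARTIAL/FAIL labelling rule, and the `ALL PASS` label are assumptions inferred from the Summary section. The report does not state the per-test pass threshold, so the sketch relies on the Result column rather than guessing one.

```python
from collections import defaultdict

# Rows copied from the Detailed Results table:
# (test, tier, cases, pass_rate_percent, result)
ROWS = [
    ("A1", "A", 100, 100.0, "PASS"),
    ("A2", "A", 19, 100.0, "PASS"),
    ("A3", "A", 30, 93.3, "PASS"),
    ("A4", "A", 50, 92.0, "PASS"),
    ("A5", "A", 56, 100.0, "PASS"),
    ("B1", "B", 39, 59.0, "FAIL"),
    ("B2", "B", 19, 100.0, "PASS"),
    ("B4", "B", 10, 100.0, "PASS"),
    ("C1", "C", 42, 100.0, "PASS"),
]


def summarize(rows):
    """Return {tier: (tests_passed, tests_total)} from the Result column."""
    tally = defaultdict(lambda: [0, 0])
    for _test, tier, _cases, _rate, result in rows:
        tally[tier][1] += 1
        if result == "PASS":
            tally[tier][0] += 1
    return {tier: (passed, total) for tier, (passed, total) in tally.items()}


def tier_label(passed, total):
    """Assumed labelling rule, inferred from the Summary section."""
    if passed == total:
        return "PASS"
    return "PARTIAL" if passed > 0 else "FAIL"


def overall(rows):
    """Assumed rule: any failing test flags the run. 'ALL PASS' is a hypothetical label."""
    return "ALL PASS" if all(r[4] == "PASS" for r in rows) else "ISSUES FOUND"


def implied_passing_cases(cases, pass_rate_percent):
    """Approximate passing-case count implied by a rounded pass rate."""
    return round(cases * pass_rate_percent / 100)


if __name__ == "__main__":
    for tier, (passed, total) in sorted(summarize(ROWS).items()):
        print(f"Tier {tier}: {passed}/{total} {tier_label(passed, total)}")
    print("OVERALL:", overall(ROWS))
```

Under these assumptions the tallies agree with the Summary section (5/5, 2/3, 1/1), and `implied_passing_cases(39, 59.0)` suggests that roughly 23 of the 39 placebo cases in B1 passed.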