CausalFinBench Results
- Date: 2026-05-08 00:12
- Model: Horizon v1 (172M DiT, Pearl Level 3)
- Checkpoint: checkpoints/phase9_production/step_200000.pt
- Total time: 26.0 minutes
Summary
- Tier A: 5/5 PASS
- Tier B: 2/3 PARTIAL
- Tier C: 1/1 PASS
OVERALL: ISSUES FOUND (B1 placebo test on non-edges: FAIL, 59.0% pass rate)
Detailed Results
| Test | Tier | Name | Cases | Pass Rate | Result |
|---|---|---|---|---|---|
| A1 | A | Consistency | 100 | 100.0% | PASS |
| A2 | A | Causal Asymmetry | 19 | 100.0% | PASS |
| A3 | A | Compositionality | 30 | 93.3% | PASS |
| A4 | A | Counterfactual Coherence | 50 | 92.0% | PASS |
| A5 | A | Robustness | 56 | 100.0% | PASS |
| B1 | B | Placebo (non-edges) | 39 | 59.0% | FAIL |
| B2 | B | Real effects (edges) | 19 | 100.0% | PASS |
| B4 | B | Sensitivity monotonicity | 10 | 100.0% | PASS |
| C1 | C | RBI Rate Decisions | 42 | 100.0% | PASS |
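
The tier summary can be cross-checked against the detailed rows with a short script. This is a minimal sketch, not the benchmark's own code: the names `ROWS`, `summarize`, and `implied_passed_cases` are hypothetical, the rule "a tier is PASS only if every test in it passed, otherwise PARTIAL" is an assumption inferred from the summary, and the pass rate is assumed to be passed cases divided by total cases.

```python
from collections import defaultdict

# Rows copied from the Detailed Results table:
# (test, tier, cases, pass_rate_percent, result)
ROWS = [
    ("A1", "A", 100, 100.0, "PASS"),
    ("A2", "A", 19, 100.0, "PASS"),
    ("A3", "A", 30, 93.3, "PASS"),
    ("A4", "A", 50, 92.0, "PASS"),
    ("A5", "A", 56, 100.0, "PASS"),
    ("B1", "B", 39, 59.0, "FAIL"),
    ("B2", "B", 19, 100.0, "PASS"),
    ("B4", "B", 10, 100.0, "PASS"),
    ("C1", "C", 42, 100.0, "PASS"),
]


def summarize(rows):
    """Return {tier: (passed_tests, total_tests, label)}."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [passed, total]
    for _test, tier, _cases, _rate, result in rows:
        counts[tier][1] += 1
        if result == "PASS":
            counts[tier][0] += 1
    summary = {}
    for tier, (passed, total) in sorted(counts.items()):
        # Assumed labelling rule: PASS only if every test in the tier passed.
        label = "PASS" if passed == total else "PARTIAL"
        summary[tier] = (passed, total, label)
    return summary


def implied_passed_cases(cases, rate_percent):
    # Assumes pass rate = passed cases / total cases, reported to one decimal.
    return round(cases * rate_percent / 100)


for tier, (passed, total, label) in summarize(ROWS).items():
    print(f"Tier {tier}: {passed}/{total} {label}")
```

Under these assumptions the script reproduces the three summary lines, and `implied_passed_cases(39, 59.0)` gives 23, so the B1 failure corresponds to 23 of 39 placebo cases passing.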