Reproducibility Receipt

Paper: A Conservation Law for Commitment in Language Under Transformative Compression and Recursive Application
Version: V.04 — Technical Structure Depth
DOI: 10.5281/zenodo.18792459
Concept DOI (all versions): 10.5281/zenodo.18267278

Date: March 6, 2026
Status: Harness v2.0 Confirmed Operational

Test Execution

All 53 tests in the suite pass:

python -m pytest tests/test_harness.py -v

Result: 53 passed in 0.07s

Test Coverage

Extraction (modal-pattern sieve):

  • Sentence segmentation (single, multiple, semicolons, empty)
  • Classification (obligations, prohibitions, constraints)
  • False positive rejection ("will", "have", soft modals)
  • "must not" correctly classified as prohibition (v1 regression)
  • Conditional detection
  • Backward compatibility interface
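
The extraction behaviours listed above can be illustrated with a minimal sketch. This is not the harness's actual code: the pattern tables and the names `segment`, `classify`, and `extract` are hypothetical, and the "constraint" class is omitted for brevity. It only shows why checking prohibitions before obligations keeps "must not" from being read as an obligation, and why "will", "have", and soft modals are rejected.

```python
import re

# Hypothetical pattern tables; the harness's real patterns may differ.
PROHIBITION = re.compile(r"\b(must not|shall not|may not|cannot)\b", re.IGNORECASE)
OBLIGATION = re.compile(r"\b(must|shall|is required to|are required to)\b", re.IGNORECASE)
CONDITIONAL = re.compile(r"\b(if|unless|when|provided that)\b", re.IGNORECASE)


def segment(text):
    """Split on sentence-ending punctuation and semicolons; drop empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+|;\s*", text)
    return [p.strip() for p in parts if p and p.strip()]


def classify(sentence):
    """Return 'prohibition', 'obligation', or None for a single sentence."""
    # Prohibitions are checked first so "must not" never matches as "must".
    if PROHIBITION.search(sentence):
        return "prohibition"
    if OBLIGATION.search(sentence):
        return "obligation"
    # "will", "have", "should", "might" match nothing and are rejected.
    return None


def extract(text):
    """Return (sentence, kind, is_conditional) for each commitment found."""
    found = []
    for sentence in segment(text):
        kind = classify(sentence)
        if kind is not None:
            found.append((sentence, kind, bool(CONDITIONAL.search(sentence))))
    return found
```

For example, `extract("If the test fails, you must stop. It will rain.")` yields a single conditional obligation and drops the "will" sentence.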

Fidelity (min-aggregated scoring):

  • Jaccard (perfect, zero, partial overlap, empty sets)
  • Cosine TF-IDF (identical, paraphrased, unrelated)
  • NLI proxy (modal preserved vs. destroyed)
  • Min-aggregation binding
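
As a rough illustration of min-aggregated scoring, the sketch below combines a Jaccard score with a cosine score and takes the minimum, so the weakest metric binds the result. The function names are hypothetical, the cosine uses raw term counts rather than TF-IDF (a deliberate simplification), treating two empty sets as a perfect match is an assumption, and the NLI proxy is left out.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets score 1.0 (an assumption)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine_tf(text_a, text_b):
    """Cosine similarity over raw term counts (simplified: no IDF weighting)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def fidelity(scores):
    """Min-aggregation: the lowest metric determines the overall score."""
    return min(scores)
```

With min-aggregation, a high lexical score cannot mask a low score on another metric: `fidelity([1.0, 0.4, 0.9])` is `0.4`.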

Compression:

  • Extractive backend (compression, modal priority, passthrough)
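
One way an extractive backend with modal priority and passthrough could behave is sketched below. The function name `compress`, the `max_sentences` parameter, and the two-word modal pattern are assumptions for illustration, not the harness's interface.

```python
import re

# Hypothetical modal pattern; the real backend may recognise more forms.
MODAL = re.compile(r"\b(must|shall)\b", re.IGNORECASE)


def compress(text, max_sentences):
    """Keep at most max_sentences sentences, preferring those with a modal."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return text  # passthrough: nothing needs to be dropped
    # Modal sentences rank first; ties are broken by original position.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (MODAL.search(sentences[i]) is None, i),
    )
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Because ranking depends only on the input text, the same input always produces the same output.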

Enforcement:

  • Commitment gate (pass when preserved)
  • Baseline (no gate)
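
The difference between the gated and baseline paths can be sketched as follows. Falling back to the original text when the gate fails is an assumption, as are the names `gate`, `baseline`, and `toy_extract` and the `threshold` parameter. The toy extractor stands in for the real modal-pattern sieve.

```python
def toy_extract(text):
    """Stand-in extractor: any sentence containing 'must' is a commitment."""
    return [s.strip() for s in text.split(".") if "must" in s]


def gate(original, candidate, extract, threshold=1.0):
    """Accept candidate only if it preserves enough of the original's commitments."""
    before = set(extract(original))
    if not before:
        return candidate  # nothing to protect
    after = set(extract(candidate))
    kept = len(before & after) / len(before)
    # Assumption: on failure, the gate falls back to the uncompressed text.
    return candidate if kept >= threshold else original


def baseline(original, candidate):
    """No gate: the compressed candidate is always accepted."""
    return candidate
```

Under this sketch the gated path can never preserve fewer commitments than the baseline path, because the worst case for the gate is returning the original text.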

Lineage:

  • Hash determinism
  • Commitment set hash (order-independent)
  • Chain integrity validation
  • Chain break detection
  • JSON serialization
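
The lineage properties above can be illustrated with a small hash chain. The record layout, the field names, and the choice of SHA-256 are assumptions; the sketch only shows how sorting makes the commitment-set hash order-independent and how recomputing hashes exposes a broken chain.

```python
import hashlib
import json


def commitment_set_hash(commitments):
    """Hash a set of commitments so that input order does not matter."""
    canonical = "\n".join(sorted(set(commitments)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _digest(body):
    """Deterministic hash of a record body (keys sorted before hashing)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_record(step, text, commitments, parent_hash):
    """Build one lineage record whose hash covers its content and its parent."""
    body = {
        "step": step,
        "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "commitments_hash": commitment_set_hash(commitments),
        "parent": parent_hash,
    }
    return {**body, "hash": _digest(body)}


def chain_is_valid(chain):
    """Recompute every hash and check each record points at its predecessor."""
    parent = None
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["parent"] != parent or _digest(body) != rec["hash"]:
            return False
        parent = rec["hash"]
    return True
```

Each record is a plain dict of strings, integers, and `None`, so a chain round-trips through `json.dumps` and `json.loads` unchanged and can be re-validated after loading.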

Corpus:

  • 25 signals load correctly
  • 5 categories present (contractual, technical, regulatory, procedural, composite)
  • All signals contain extractable commitments

Integration:

  • Single signal full protocol
  • Enforcement >= baseline validation

Regressions (v1 bugs):

  • "will" false positive blocked
  • "have" false positive blocked
  • Soft modals rejected
  • "must not" is prohibition
  • Fidelity uses multiple metrics

Environment

  • Python: 3.11+
  • Dependencies: gradio, matplotlib (demo only); core harness is stdlib-only
  • Lossy backend: Pure Python, zero external dependencies, deterministic
  • Matplotlib backend: Agg (non-GUI, CI-friendly)

Running Tests

Quick run:

python -m pytest tests/test_harness.py -q

Verbose output:

python -m pytest tests/test_harness.py -v

Full falsification protocol (CLI):

python -m src.runner --backend lossy --depth 10

Interactive demo:

python app.py
# Opens at http://localhost:7860

Notes

  • Tests complete in under 1 second (0.07s in the recorded run); no model loading is required
  • Lossy backend is deterministic: same input -> same output -> same lineage chain
  • gradio and matplotlib are required only for the demo; the core pipeline is stdlib-only
  • Previous harness (v1) archived at archive/harness-v1/

Harness v2.0 is research-ready for experimental evaluation and adversarial replication.