vn6295337 Claude Opus 4.5 committed on
Commit a2c9702 · 1 Parent(s): 484a7a7

Layer 4: Add deterministic numeric validation in Critic


Enforces machine-verifiable numeric accuracy:

1. Prompt changes (analyzer.py):
- Require [M##] citations for all metric values
- Example: "Revenue of $394.3B [M01] demonstrates..."
- Clear warning that citations are auto-verified

2. New validator (numeric_validator.py):
- Extract [M##] citations from SWOT output
- Normalize values ($394.3B -> 394300000000)
- Compare against metric_reference with tolerance
- Return specific mismatch descriptions

3. Critic integration (critic.py):
- Validate citations after LLM evaluation
- If mismatches: cap evidence_grounding at 4, force rejection
- Add specific feedback for revision
- Log validation results to activity log

Tested with Ford hallucination case:
- Detects: market_cap $43.4B vs expected $56.6B
- Detects: pe_trailing 21.3 vs expected 12.14

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
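To make the behavior concrete, here is a minimal sketch of what the Ford regression above exercises end to end. The ref IDs, "formatted" strings, and SWOT sentence are illustrative, not taken from the actual test fixture; only `validate_numeric_accuracy` and the `metric_reference` shape come from the diff below:

```python
from src.utils.numeric_validator import validate_numeric_accuracy

# Hypothetical reference table in the Layer 1 format documented below.
metric_ref = {
    "M02": {"key": "market_cap", "raw_value": 56_600_000_000, "formatted": "$56.6B"},
    "M03": {"key": "pe_trailing", "raw_value": 12.14, "formatted": "12.14"},
}
swot = "Market cap of $43.4B [M02] and trailing P/E of 21.3 [M03] look healthy."

for error in validate_numeric_accuracy(swot, metric_ref):
    print(error)
# market_cap [M02]: cited $43.4B, expected $56.6B
# pe_trailing [M03]: cited 21.3, expected 12.14
```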

src/nodes/analyzer.py CHANGED
@@ -1058,6 +1058,7 @@ Weighted Score: {critique_details.get('weighted_score', 0):.1f} / 10
 - Apply each point in "Actionable Feedback" — these are specific instructions
 - Keep everything listed under "Strengths to Preserve" — do not modify these sections
 - **Use EXACT metric values from the METRIC REFERENCE TABLE** — copy numbers verbatim
+- **Include [M##] citation after every metric value** — e.g., "$394.3B [M01]"
 - Include the 'as of' date when citing temporal metrics
 {ev_note}
 
@@ -1065,6 +1066,7 @@ Weighted Score: {critique_details.get('weighted_score', 0):.1f} / 10
 - Ignore lower-priority feedback items — address all of them
 - Introduce new metrics not in the original input data
 - **Round, estimate, or approximate any numbers** — use exact values only
+- **Omit [M##] citations** — they are required for automatic verification
 - Remove content that was working well
 - Add defensive caveats or apologies about the revision
 - Reference the revision process in your output — produce a clean SWOT as if first attempt
@@ -1132,26 +1134,26 @@ Produce a SWOT analysis with this exact structure:
 
 ## Strengths
 For each (3-5 points):
-- **Finding:** [One sentence with specific metric from the METRIC REFERENCE TABLE]
+- **Finding:** [One sentence with metric value and citation, e.g., "Revenue of $394.3B [M01] shows..."]
 - **Strategic Implication:** [Why this matters]
 - **Durability:** [High/Medium/Low]
 
 ## Weaknesses
 For each (3-5 points):
-- **Finding:** [One sentence with specific metric from the METRIC REFERENCE TABLE]
+- **Finding:** [One sentence with metric value and citation, e.g., "Debt/equity of 1.87 [M04] indicates..."]
 - **Severity:** [Critical/Moderate/Minor]
 - **Trend:** [Improving/Stable/Deteriorating]
 - **Remediation Levers:** [What could improve this]
 
 ## Opportunities
 For each (3-5 points):
-- **Catalyst:** [Description with supporting data]
+- **Catalyst:** [Description with metric citations where applicable]
 - **Timing:** [Near-term/Medium-term/Long-term]
 - **Execution Requirements:** [What must happen]
 
 ## Threats
 For each (3-5 points):
-- **Risk Factor:** [Description with supporting data]
+- **Risk Factor:** [Description with metric citations where applicable]
 - **Probability:** [High/Medium/Low]
 - **Impact:** [Potential magnitude]
 - **Mitigation Options:** [Possible responses]
@@ -1161,7 +1163,11 @@ For each (3-5 points):
 - **Data Gaps:** [Any unavailable metrics]
 - **Confidence Level:** [High/Medium/Low]
 
-CRITICAL: Every numeric finding MUST use the EXACT value from the METRIC REFERENCE TABLE above. Do NOT round or estimate."""
+CRITICAL CITATION REQUIREMENTS:
+1. Every numeric finding MUST include the reference ID in brackets: value [M##]
+2. Use EXACT values from the METRIC REFERENCE TABLE - do NOT round or estimate
+3. Example: "Revenue of $394,328,000,000 [M01] demonstrates strong market position"
+4. Citations will be automatically verified - mismatches cause rejection"""
 
     return prompt, metric_lookup, ref_hash
 
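The prompt's `value [M##]` convention is exactly what Layer 4's regex recognizes. A quick sanity check, reusing the CITATION_PATTERN from numeric_validator.py (added below):

```python
import re

# Same pattern as CITATION_PATTERN in src/utils/numeric_validator.py.
CITATION_PATTERN = re.compile(r'([\d,$\.]+[BMK%]?)\s*\[M(\d{2})\]', re.IGNORECASE)

m = CITATION_PATTERN.search("Revenue of $394.3B [M01] demonstrates strong market position")
print(m.group(1), m.group(2))  # -> $394.3B 01
```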
src/nodes/critic.py CHANGED
@@ -3,6 +3,10 @@ from langsmith import traceable
 import json
 import time
 
+# Layer 4: Deterministic numeric validation
+from src.utils.numeric_validator import validate_numeric_accuracy
+from src.nodes.analyzer import _verify_reference_integrity
+
 
 def _add_activity_log(workflow_id, progress_store, step, message):
     """Helper to add activity log entry."""
@@ -353,6 +357,55 @@ def critic_node(state, workflow_id=None, progress_store=None):
     weighted_score = result["weighted_score"]
     scores = result["scores"]
 
+    # ============================================================
+    # LAYER 4: Deterministic Numeric Validation
+    # ============================================================
+    metric_ref = state.get("metric_reference", {})
+    ref_hash = state.get("metric_reference_hash", "")
+
+    if metric_ref and ref_hash:
+        # Verify integrity before using
+        if _verify_reference_integrity(metric_ref, ref_hash):
+            mismatches = validate_numeric_accuracy(report, metric_ref)
+            if mismatches:
+                _add_activity_log(workflow_id, progress_store, "critic",
+                                  f"Numeric validation: {len(mismatches)} mismatch(es) detected")
+
+                # Ensure hallucinations_detected exists
+                if "hallucinations_detected" not in result:
+                    result["hallucinations_detected"] = []
+                result["hallucinations_detected"].extend(mismatches)
+
+                # Cap evidence_grounding score
+                if scores.get("evidence_grounding", 0) > 4:
+                    scores["evidence_grounding"] = 4
+                    if "hard_floor_violations" not in result:
+                        result["hard_floor_violations"] = []
+                    result["hard_floor_violations"].append(
+                        "Numeric mismatch detected - evidence_grounding capped at 4"
+                    )
+
+                # Add specific feedback
+                if "actionable_feedback" not in result:
+                    result["actionable_feedback"] = []
+                result["actionable_feedback"].insert(0,
+                    f"Fix {len(mismatches)} numeric mismatch(es) - use exact values with [M##] citations from reference table"
+                )
+
+                # Recalculate weighted score with capped evidence_grounding
+                weighted_score = calculate_weighted_score(scores)
+                result["weighted_score"] = weighted_score
+
+                # Force rejection on numeric mismatches
+                status = "REJECTED"
+                result["status"] = status
+            else:
+                _add_activity_log(workflow_id, progress_store, "critic",
+                                  "Numeric validation: all citations verified")
+        else:
+            _add_activity_log(workflow_id, progress_store, "critic",
+                              "Warning: metric reference integrity check failed - skipping numeric validation")
+
     # Handle ESCALATE if max iterations reached
     if iteration > 3 and status == "REJECTED":
         status = "ESCALATE"
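A toy walk-through of the rejection path above, with `calculate_weighted_score` stubbed (the real implementation lives elsewhere in critic.py and its rubric weights are not shown in this diff; the score names besides evidence_grounding are made up):

```python
def calculate_weighted_score(scores):
    # Stub for illustration only; the real function applies the critic's weights.
    return sum(scores.values()) / len(scores)

result = {"scores": {"evidence_grounding": 8, "specificity": 9}, "status": "APPROVED"}
scores = result["scores"]
mismatches = ["market_cap [M02]: cited $43.4B, expected $56.6B"]

if mismatches:
    result.setdefault("hallucinations_detected", []).extend(mismatches)
    if scores.get("evidence_grounding", 0) > 4:
        scores["evidence_grounding"] = 4   # cap the score
    result["weighted_score"] = calculate_weighted_score(scores)
    result["status"] = "REJECTED"          # force a revision pass

print(result["status"], result["weighted_score"])  # REJECTED 6.5
```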
src/utils/numeric_validator.py ADDED
@@ -0,0 +1,226 @@
+"""
+Deterministic numeric validation for SWOT analysis outputs.
+
+Layer 4: Validates that cited metric values match the reference table.
+Extracts [M##] citations from SWOT text and verifies against metric_reference dict.
+"""
+
+import re
+from typing import Optional
+
+
+# Pattern to match citations like: $394.3B [M01], 25.3% [M02], 32.5 [M04]
+CITATION_PATTERN = re.compile(
+    r'([\d,$\.]+[BMK%]?)\s*\[M(\d{2})\]',
+    re.IGNORECASE
+)
+
+
+def normalize_value(text: str) -> Optional[float]:
+    """
+    Normalize a value string to a float for comparison.
+
+    Handles:
+    - Currency: $394.3B -> 394300000000, $56.6M -> 56600000
+    - Percentages: 25.3% -> 25.3
+    - Plain numbers: 32.5 -> 32.5, 1,234 -> 1234
+
+    Returns None if parsing fails.
+    """
+    if not text:
+        return None
+
+    # Remove whitespace and common formatting
+    text = text.strip().replace(',', '').replace(' ', '')
+
+    # Handle currency with B/M/K suffix
+    if text.startswith('$'):
+        text = text[1:]  # Remove $
+        multiplier = 1
+        if text.upper().endswith('B'):
+            multiplier = 1e9
+            text = text[:-1]
+        elif text.upper().endswith('M'):
+            multiplier = 1e6
+            text = text[:-1]
+        elif text.upper().endswith('K'):
+            multiplier = 1e3
+            text = text[:-1]
+        try:
+            return float(text) * multiplier
+        except ValueError:
+            return None
+
+    # Handle percentages
+    if text.endswith('%'):
+        try:
+            return float(text[:-1])
+        except ValueError:
+            return None
+
+    # Plain number
+    try:
+        return float(text)
+    except ValueError:
+        return None
+
+
+def values_match(found_value: float, expected_value: float, value_type: str = "unknown") -> bool:
+    """
+    Check if two values match within acceptable tolerance.
+
+    Tolerances:
+    - Large values (>= 1e6, e.g. currency): ±1% relative
+    - Values below 100 (percentages, ratios): ±0.15 absolute
+    - Everything else: ±1% relative
+    """
+    if found_value is None or expected_value is None:
+        return False
+
+    # Large numbers (currency) - use relative tolerance
+    if abs(expected_value) >= 1e6:
+        tolerance = abs(expected_value) * 0.01  # 1%
+        return abs(found_value - expected_value) <= tolerance
+
+    # Small numbers - use absolute tolerance
+    # Percentages and ratios
+    if abs(expected_value) < 100:
+        tolerance = 0.15  # Allow slight rounding differences
+        return abs(found_value - expected_value) <= tolerance
+
+    # Medium numbers
+    tolerance = abs(expected_value) * 0.01
+    return abs(found_value - expected_value) <= tolerance
+
+
+def extract_citations(text: str) -> list[dict]:
+    """
+    Extract all [M##] citations from text.
+
+    Returns list of dicts:
+    [
+        {"ref_id": "M01", "cited_value": "$394.3B", "normalized": 394300000000.0},
+        {"ref_id": "M02", "cited_value": "25.3%", "normalized": 25.3},
+    ]
+    """
+    citations = []
+    for match in CITATION_PATTERN.finditer(text):
+        cited_value = match.group(1)
+        ref_num = match.group(2)
+        ref_id = f"M{ref_num}"
+        normalized = normalize_value(cited_value)
+        citations.append({
+            "ref_id": ref_id,
+            "cited_value": cited_value,
+            "normalized": normalized
+        })
+    return citations
+
+
+def validate_citations(swot_text: str, metric_reference: dict) -> dict:
+    """
+    Validate all citations in SWOT text against metric_reference.
+
+    Args:
+        swot_text: The SWOT analysis output
+        metric_reference: Dict from Layer 1 with format:
+            {"M01": {"key": "revenue", "raw_value": 394328000000, "formatted": "..."}, ...}
+
+    Returns:
+        {
+            "valid": bool,
+            "citations_found": int,
+            "mismatches": [
+                "revenue [M01]: cited $56.6B, expected $394.3B",
+                ...
+            ],
+            "missing_refs": ["M99"],  # Citations to non-existent refs
+            "details": [...]  # Full details for each citation
+        }
+    """
+    citations = extract_citations(swot_text)
+
+    result = {
+        "valid": True,
+        "citations_found": len(citations),
+        "mismatches": [],
+        "missing_refs": [],
+        "details": []
+    }
+
+    for citation in citations:
+        ref_id = citation["ref_id"]
+        cited_value = citation["cited_value"]
+        cited_normalized = citation["normalized"]
+
+        detail = {
+            "ref_id": ref_id,
+            "cited_value": cited_value,
+            "cited_normalized": cited_normalized,
+            "status": "unknown"
+        }
+
+        # Check if reference exists
+        if ref_id not in metric_reference:
+            result["missing_refs"].append(ref_id)
+            result["valid"] = False
+            detail["status"] = "missing_ref"
+            detail["error"] = f"Reference {ref_id} not found in metric table"
+            result["details"].append(detail)
+            continue
+
+        ref_entry = metric_reference[ref_id]
+        expected_value = ref_entry.get("raw_value")
+        metric_key = ref_entry.get("key", "unknown")
+        expected_formatted = ref_entry.get("formatted", str(expected_value))
+
+        detail["metric_key"] = metric_key
+        detail["expected_value"] = expected_value
+        detail["expected_formatted"] = expected_formatted
+
+        # Check if values match
+        if cited_normalized is None:
+            result["mismatches"].append(
+                f"{metric_key} [{ref_id}]: could not parse cited value '{cited_value}'"
+            )
+            result["valid"] = False
+            detail["status"] = "parse_error"
+        elif not values_match(cited_normalized, expected_value):
+            # Format expected value for display (guard against a missing raw_value)
+            if expected_value is not None and abs(expected_value) >= 1e9:
+                expected_display = f"${expected_value/1e9:.1f}B"
+            elif expected_value is not None and abs(expected_value) >= 1e6:
+                expected_display = f"${expected_value/1e6:.0f}M"
+            else:
+                expected_display = expected_formatted.split(" (as of")[0] if " (as of" in expected_formatted else expected_formatted
+
+            result["mismatches"].append(
+                f"{metric_key} [{ref_id}]: cited {cited_value}, expected {expected_display}"
+            )
+            result["valid"] = False
+            detail["status"] = "mismatch"
+        else:
+            detail["status"] = "valid"
+
+        result["details"].append(detail)
+
+    return result
+
+
+def validate_numeric_accuracy(swot_text: str, metric_reference: dict) -> list[str]:
+    """
+    Main validation function for critic integration.
+
+    Returns list of mismatch descriptions (empty if all valid).
+    """
+    if not metric_reference:
+        return []
+
+    result = validate_citations(swot_text, metric_reference)
+
+    # Combine mismatches and missing refs
+    errors = result["mismatches"].copy()
+    for ref_id in result["missing_refs"]:
+        errors.append(f"Invalid reference: {ref_id} not in metric table")
+
+    return errors
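A few spot checks of the helpers, runnable against the module as committed:

```python
from src.utils.numeric_validator import normalize_value, values_match, extract_citations

print(normalize_value("$394.3B"))  # 394300000000.0
print(normalize_value("25.3%"))    # 25.3
print(normalize_value("1,234"))    # 1234.0

# $394.3B vs the exact 394,328,000,000 passes the 1% relative tolerance.
print(values_match(394.3e9, 394_328_000_000))  # True

print(extract_citations("Revenue of $394.3B [M01] grew 25.3% [M02]"))
# [{'ref_id': 'M01', 'cited_value': '$394.3B', 'normalized': 394300000000.0},
#  {'ref_id': 'M02', 'cited_value': '25.3%', 'normalized': 25.3}]
```

One design consequence worth knowing: the B/M/K suffix branch only runs after a leading $, so a bare "394.3B" does not normalize and is reported as a parse error rather than a value mismatch.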