# ClaimSense — Product Vision

> The hackathon submission ships an *RL gym*. This document describes
> the product the gym is the training ground for: a closed-loop claims
> intelligence platform that wires Plaid-style financial signals into
> an LLM adjudicator and uses Scaler AI Labs' RLHF tooling to keep the
> model honest week over week.

## Why this product exists

Insurers run claims through human adjusters because the workflow is unforgiving: the wrong call costs real money, regulators audit the reasoning, and fraudsters keep finding new angles. Naive LLM deployments fail on this surface for three reasons:

1. **No investigation reflex.** They take the claim at face value instead of pulling the policy, history, and supporting transactions.
2. **No grounding.** They hallucinate dollar amounts because nothing in the prompt forces them to compare the claim against bank data.
3. **No correction loop.** A wrong call yesterday can be wrong again tomorrow because nothing trains on the adjuster override.

ClaimSense is designed to address all three: Plaid enrichment supplies the investigation and the grounding, and the Scaler RLHF loop supplies the correction.

## Platform shape

```
┌──────────────────────────────────────────────────────────────────────┐
│                        ClaimSense AI Platform                        │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Customer journey                                                    │
│  ──────────────────────────────────────────────────────────          │
│  ┌─────────┐   ┌──────────────┐   ┌──────────────────────────┐       │
│  │ Portal  │──▶│ Plaid Link   │──▶│ Identity / Income gate   │       │
│  └─────────┘   └──────────────┘   └──────────────────────────┘       │
│                       │                        │                     │
│                       ▼                        ▼                     │
│  Adjudication core                                                   │
│  ──────────────────────────────────────────────────────────          │
│  ┌────────────────────────────────────────────────────────────┐      │
│  │ Plaid enrichment — transactions, identity, income, assets  │      │
│  ├────────────────────────────────────────────────────────────┤      │
│  │ ClaimSense gym (this repo) — RL training surface           │      │
│  ├────────────────────────────────────────────────────────────┤      │
│  │ Adjudicator LLM — fraud signals + coverage + settlement    │      │
│  └────────────────────────────────────────────────────────────┘      │
│                                │                                     │
│                                ▼                                     │
│  Improvement loop                                                    │
│  ──────────────────────────────────────────────────────────          │
│  ┌────────────────────────────────────────────────────────────┐      │
│  │ Scaler labelling → reward model → GRPO fine-tune (weekly)  │      │
│  └────────────────────────────────────────────────────────────┘      │
└──────────────────────────────────────────────────────────────────────┘
```

## Plaid touch-points

The hackathon repo simulates the bank-feed interaction. In production, five Plaid product calls move the needle:

### Transactions API — `/transactions/sync`

The single most powerful signal. Cross-references the claim amount against actual purchases.

```python
# plaid_client is assumed to be a thin wrapper around the Plaid SDK;
# amount_matches, flag and tolerance are application-level helpers, not Plaid APIs.
sync = plaid_client.transactions_sync(access_token)
# amount_matches is assumed to match loosely (merchant, date, rough amount)
# so that an inflated claim still finds its underlying purchase.
matches = [
    tx for tx in sync.added
    if amount_matches(tx, claim.amount, claim.date, claim.merchant)
]
if matches and abs(matches[0].amount - claim.amount) > tolerance:
    flag("inflated_claim", actual=matches[0].amount, claimed=claim.amount)
```

**Where it pays off:** auto theft, contents claims, repair invoices. Catches the *amount* fraud that statistical scores miss.

### Identity API — `/identity/get`

Verifies the claimant against bank-of-record data.

```python
identity = plaid_client.identity_get(access_token)
# Only the first owner of the first linked account is checked here.
owner = identity.accounts[0].owners[0]
# name_match and address_match are assumed application-level fuzzy matchers.
verified = (
    name_match(claim.name, owner.names)
    and address_match(claim.address, owner.addresses)
    and any(claim.phone == p.data for p in owner.phone_numbers)
)
```

**Where it pays off:** identity-takeover fraud, claim-stuffing schemes.

### Income & Employment — `/credit/employment/get`

For disability and life claims, anchors the benefit calculation.

```python
record = plaid_client.credit_employment_get(access_token).items[0]
# Field names on `record` are illustrative; confirm them against the
# response schema before relying on them.
benefit = compute_disability_benefit(
    annual_income=record.pay.annual,
    pay_frequency=record.pay.pay_frequency,
    employment_status=record.status,
    policy=policy,
)
```

### Asset Report — `/asset_report/get`

Provides a financial-context check: a claim that is large relative to the claimant's linked-account assets signals elevated risk.

```python
report = plaid_client.asset_report_get(asset_report_token)
total_assets = sum(
    account.balances.current
    for item in report.report.items
    for account in item.accounts
)
# Guard against a zero balance so the ratio never divides by zero.
if total_assets > 0 and claim.amount > 0.5 * total_assets:
    flag("claim_to_assets_ratio_high", ratio=claim.amount / total_assets)
```

### Recurring transactions — `/transactions/recurring/get`

Confirms premium payments are flowing, i.e. that the policy is genuinely active regardless of what the policy admin system says.

```python
recurring = plaid_client.transactions_recurring_get(access_token)
# INSURANCE_MERCHANTS is an assumed set of known insurer merchant names.
premium_streams = [
    s for s in recurring.outflow_streams
    if "insurance" in (s.description or "").lower()
    or s.merchant_name in INSURANCE_MERCHANTS
]
```

## Scaler AI Labs · RLHF loop

The platform's improvement engine. Three pieces:

### 1. Labelling pipeline

Every adjudicator decision becomes a Scaler task pre-loaded with the LLM's reasoning, the claim, and the Plaid evidence. Adjusters mark *correct / incorrect / partially correct* and add a free-text rationale.

```python
scale_client.create_task(
    project="claimsense_review",
    task_type="comparison",
    data={
        "claim_id": claim.id,
        "ai_decision": output.decision,
        "ai_reasoning": output.reasoning,
        "ai_payout": output.payout,
        "claim_details": claim.dict(),
        "plaid_evidence": evidence.dict(),
    },
    instruction=(
        "Was the verdict correct? Was the payout right? Was fraud "
        "handled appropriately? Provide reasoning."
    ),
)
```

### 2. Weekly cycle

```
Day 1-3 : collect labelled decisions
Day 4-5 : fit / refresh the reward model
Day 6   : GRPO fine-tune on the new reward
Day 7   : shadow-deploy and compare against the live model
          (promote if correctness improves and fraud capture stays ≥ live)
```

### 3. Quality dashboard

Tracked across iterations:

```python
metrics = {
    "verdict_correctness":   {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
    "fraud_capture":         {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
    "median_minutes":        {"baseline": 45,   "v1": 12,   "v2": 8,    "v3": 5},
    "savings_per_claim_usd": {"baseline": 0,    "v1": 45,   "v2": 72,   "v3": 95},
}
```

## Worked example — auto theft

```
Step 1  Claim submitted
        Claimant reports vehicle stolen. Claims $35,000.
Step 2  Plaid Link
        Bank account linked. Identity verified.
Step 3  Plaid Transactions sync
        Vehicle purchase located: $22,000, City Auto Sales, 2024-01-15.
        Discrepancy detected: claimed $35K, paid $22K.
Step 4  Plaid Asset Report
        Total assets $45,000. Claim is 78 % of total assets — flag raised.
Step 5  Adjudicator LLM
        risk_score = 0.85
        flags      = ["inflated_claim", "claim_to_assets_ratio_high"]
        verdict    = deny
        reason     = "Inflated claim — bank-feed shows $22K transaction"
Step 6  Scaler review
        Adjuster confirms verdict.
        Free-text: "Solid catch — discrepancy alone is decisive."
Step 7  Weekly fine-tune
        Reward model up-weights "transaction discrepancy → deny" path.
```

## Business case

Reference customer: a regional insurer running ~100,000 personal-line claims a year, average ticket $5,000, fraud rate 5 %.

| Metric | Today | With ClaimSense |
|---|---:|---:|
| Median cycle time | 14 days | 2 hours |
| Fraud capture | 23 % | 91 % |
| False positives | 12 % | 3 % |
| Cost per claim | $150 | $35 |
| CSAT | 3.2 / 5 | 4.6 / 5 |

Of the 5,000 fraudulent claims a year (100,000 × 5 %), a 23 % capture rate misses 3,850 and a 91 % capture rate misses 450; each missed claim is valued at the $5,000 average ticket.

```
Fraud loss before:       3,850 missed × $5,000   = $19.25 M
Fraud loss after:          450 missed × $5,000   =  $2.25 M
Reduction in fraud loss ........................ = $17.00 M

Processing cost before:  100,000 × $150          = $15.00 M
Processing cost after:   100,000 × $35           =  $3.50 M
Reduction in processing cost ................... = $11.50 M

Total annual savings ........................... = $28.50 M
```

## Roadmap

### Phase 1 — Foundations · months 1-2

- Plaid Transactions + Identity in production
- Reward model v0 from supervised labels
- FastAPI scoring endpoint
- Scaler project bootstrap

### Phase 2 — RLHF online · months 3-4

- Expert labelling UI
- GRPO/PPO weekly fine-tunes
- Shadow-deploy + A/B harness

### Phase 3 — Coverage expansion · months 5-6

- Income + Asset Plaid products
- Adjuster cockpit (read-only first)
- Real-time fraud-scoring API

### Phase 4 — Commercial scale · months 7-12

- Multi-tenant SaaS
- White-label option
- SOC2 / HIPAA / NAIC compliance work

## Technical stack snapshot

```yaml
runtime:
  language: Python 3.11+
  web: FastAPI
  workers: Celery on Redis
rl: OpenEnv (this gym), TRL/Unsloth for fine-tuning
data: PostgreSQL, S3 for evidence
integrations:
  plaid: Transactions, Identity, Income, Assets, Recurring
  scaler: RLHF labelling + reward modelling
cloud: AWS / GCP
deployment:
  preview: Hugging Face Spaces (this Space)
  production: Docker / Kubernetes (single-tenant first)
```

## Coordinates

| Resource | Where |
|---|---|
| Live Space | |
| Repo | (this directory) |
| Statement | OpenEnv Hackathon · 3.1 — Professional Tasks |
| Sub-theme | Scaler AI Labs — Enterprise Workflows |
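
The totals in the business case can be re-derived from the reference-customer assumptions. The following is a minimal sketch and not part of the repo: `Scenario` and `annual_costs` are hypothetical names, and it assumes, as the business-case figures imply, that every missed fraudulent claim costs the $5,000 average ticket. Integer percentages keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One column of the business-case table (hypothetical helper type)."""
    fraud_capture_pct: int   # share of fraudulent claims caught, in percent
    cost_per_claim_usd: int  # processing cost per claim, in USD


# Reference-customer assumptions taken from the business case.
CLAIMS_PER_YEAR = 100_000
AVG_TICKET_USD = 5_000
FRAUD_RATE_PCT = 5


def annual_costs(s: Scenario) -> tuple[int, int]:
    """Return (fraud_loss_usd, processing_cost_usd) for one scenario.

    Assumes each missed fraudulent claim costs the average ticket.
    """
    fraudulent = CLAIMS_PER_YEAR * FRAUD_RATE_PCT // 100
    missed = fraudulent * (100 - s.fraud_capture_pct) // 100
    return missed * AVG_TICKET_USD, CLAIMS_PER_YEAR * s.cost_per_claim_usd


today = Scenario(fraud_capture_pct=23, cost_per_claim_usd=150)
with_claimsense = Scenario(fraud_capture_pct=91, cost_per_claim_usd=35)

fraud_before, processing_before = annual_costs(today)
fraud_after, processing_after = annual_costs(with_claimsense)

fraud_saving = fraud_before - fraud_after
processing_saving = processing_before - processing_after
total_saving = fraud_saving + processing_saving

print(f"Fraud loss reduction:      ${fraud_saving / 1e6:.2f} M")
print(f"Processing cost reduction: ${processing_saving / 1e6:.2f} M")
print(f"Total annual savings:      ${total_saving / 1e6:.2f} M")
```

Run as a script, this prints reductions of $17.00 M and $11.50 M and a total of $28.50 M, matching the business case. Changing the two `Scenario` values is a quick way to see how sensitive the total is to the capture-rate assumption.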