# ClaimSense: Product Vision

> The hackathon submission ships an *RL gym*. This document describes
> the product the gym is the training ground for: a closed-loop claims
> intelligence platform that wires Plaid-style financial signals into
> an LLM adjudicator and uses Scaler AI Labs' RLHF tooling to keep the
> model honest week over week.
## Why this product exists

Insurers run claims through human adjusters because the workflow is
unforgiving: the wrong call costs real money, regulators audit the
reasoning, and fraudsters keep finding new angles. Naive LLM
deployments fail on this surface for three reasons:

1. **No investigation reflex.** They take the claim at face value
   instead of pulling the policy, history, and supporting transactions.
2. **No grounding.** They hallucinate dollar amounts because nothing in
   the prompt forces them to compare the claim against bank data.
3. **No correction loop.** A wrong call yesterday can be wrong again
   tomorrow because nothing trains on the adjuster override.

ClaimSense targets all three: it investigates before deciding, grounds
amounts in bank data, and retrains on adjuster overrides.
## Platform shape

```
┌────────────────────────────────────────────────────────────────────┐
│                       ClaimSense AI Platform                       │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  Customer journey                                                  │
│  ────────────────────────────────────────────────────────────────  │
│  ┌─────────┐   ┌──────────────┐   ┌──────────────────────────┐     │
│  │ Portal  │──▶│ Plaid Link   │──▶│ Identity / Income gate   │     │
│  └─────────┘   └──────────────┘   └──────────────────────────┘     │
│                       │                        │                   │
│                       ▼                        ▼                   │
│  Adjudication core                                                 │
│  ────────────────────────────────────────────────────────────────  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ Plaid enrichment: transactions, identity, income, assets     │  │
│  ├──────────────────────────────────────────────────────────────┤  │
│  │ ClaimSense gym (this repo): RL training surface              │  │
│  ├──────────────────────────────────────────────────────────────┤  │
│  │ Adjudicator LLM: fraud signals + coverage + settlement       │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 │                                  │
│                                 ▼                                  │
│  Improvement loop                                                  │
│  ────────────────────────────────────────────────────────────────  │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │ Scaler labelling → reward model → GRPO fine-tune (weekly)    │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
```
## Plaid touch-points

The hackathon repo simulates the bank-feed interaction. In production,
five Plaid product calls move the needle:

### Transactions API: `/transactions/sync`

The single most powerful signal. Cross-references the claim amount
against actual purchases.

```python
# `plaid_client` stands for a simplified wrapper over the Plaid SDK (the SDK
# itself takes request objects); `amount_matches`, `flag`, and `tolerance`
# are defined elsewhere in the platform.
sync = plaid_client.transactions_sync(access_token)
matches = [
    tx for tx in sync.added
    if amount_matches(tx, claim.amount, claim.date, claim.merchant)
]
# Flag the claim when the matched purchase differs from the claimed amount.
if matches and abs(matches[0].amount - claim.amount) > tolerance:
    flag("inflated_claim", actual=matches[0].amount, claimed=claim.amount)
```

**Where it pays off:** auto theft, contents claims, repair invoices.
Catches the *amount* fraud that statistical scores miss.
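
The snippet above leans on helpers that this document does not define. A minimal sketch of one way to separate "find the purchase" from "compare the amounts" is shown below. The `Tx` record, the function names, the 7-day window, the substring merchant test, and the 5% relative tolerance are all illustrative assumptions, not the gym's actual logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Tx:
    """Hypothetical, simplified view of a bank transaction."""
    amount: float
    posted: date
    merchant: str


def is_candidate_purchase(tx: Tx, purchase_date: date, merchant: str,
                          window_days: int = 7) -> bool:
    """True when the merchant text overlaps and the dates fall within the window."""
    same_merchant = merchant.lower() in tx.merchant.lower()
    close_in_time = abs((tx.posted - purchase_date).days) <= window_days
    return same_merchant and close_in_time


def inflation_flag(tx: Tx, claimed: float, tolerance: float = 0.05) -> Optional[dict]:
    """Return a flag payload when the claim exceeds the purchase by more than
    `tolerance`, measured relative to the purchase amount."""
    if tx.amount <= 0:
        return None
    if (claimed - tx.amount) / tx.amount > tolerance:
        return {"flag": "inflated_claim", "actual": tx.amount, "claimed": claimed}
    return None


# Figures taken from the worked example later in this document.
tx = Tx(amount=22_000.0, posted=date(2024, 1, 15), merchant="City Auto Sales")
if is_candidate_purchase(tx, date(2024, 1, 14), "city auto"):
    print(inflation_flag(tx, claimed=35_000.0))
```

A relative tolerance is used here so that one threshold can serve both small contents claims and large vehicle claims; an absolute dollar tolerance would be an equally reasonable choice.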
### Identity API: `/identity/get`

Verifies the claimant against bank-of-record data.

```python
identity = plaid_client.identity_get(access_token)
# First owner of the first linked account; joint accounts list several owners.
owner = identity.accounts[0].owners[0]
verified = (
    name_match(claim.name, owner.names)
    and address_match(claim.address, owner.addresses)
    and any(claim.phone == p.data for p in owner.phone_numbers)
)
```

**Where it pays off:** identity-takeover fraud, claim-stuffing schemes.
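
`name_match` and `address_match` are referenced above but not defined in this document. The sketch below shows one plausible shape for the name check, plus a phone comparison that is more tolerant of formatting than strict equality. The normalization rules and the 10-digit (US-style) phone assumption are hypothetical choices made for illustration.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def name_match(claimed: str, bank_names: list[str]) -> bool:
    """True when every token of the claimed name appears in some bank-side name."""
    claimed_tokens = set(normalize(claimed).split())
    if not claimed_tokens:
        return False
    return any(claimed_tokens <= set(normalize(n).split()) for n in bank_names)


def phone_match(claimed: str, bank_phones: list[str]) -> bool:
    """Compare the last 10 digits so '+1 (415) 555-0100' equals '4155550100'."""
    digits = re.sub(r"\D", "", claimed)[-10:]
    return bool(digits) and any(
        re.sub(r"\D", "", p)[-10:] == digits for p in bank_phones
    )


print(name_match("José Alvarez", ["JOSE M. ALVAREZ"]))
print(phone_match("+1 (415) 555-0100", ["4155550100"]))
```

Token-subset matching tolerates middle initials on the bank side, at the cost of accepting a claimed name that omits them; a production system would likely add fuzzy scoring and a manual-review band.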
### Income & Employment: `/credit/employment/get`

For disability and life claims, anchors the benefit calculation.

```python
# Field names below are illustrative; check the response schema in the
# Plaid API reference before relying on them.
record = plaid_client.credit_employment_get(access_token).items[0]
benefit = compute_disability_benefit(
    annual_income=record.pay.annual,
    pay_frequency=record.pay.pay_frequency,
    employment_status=record.status,
    policy=policy,
)
```
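`compute_disability_benefit` is not defined in this document. As a stand-in, the sketch below shows a common shape for such a calculation: a replacement ratio applied to monthly income, subject to a cap. The `DisabilityPolicy` fields, the `"ACTIVE"` status value, and the simplified signature (no pay frequency) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DisabilityPolicy:
    """Hypothetical policy terms; real contracts carry many more clauses."""
    replacement_ratio: float  # share of income replaced, e.g. 0.60
    monthly_cap: float        # maximum monthly benefit in USD


def monthly_disability_benefit(annual_income: float, employment_status: str,
                               policy: DisabilityPolicy) -> float:
    """Monthly benefit: ratio times monthly income, capped.

    Returns 0.0 unless the claimant is actively employed with positive income.
    """
    if employment_status.upper() != "ACTIVE" or annual_income <= 0:
        return 0.0
    monthly_income = annual_income / 12
    return round(min(monthly_income * policy.replacement_ratio, policy.monthly_cap), 2)


policy = DisabilityPolicy(replacement_ratio=0.60, monthly_cap=5_000.0)
print(monthly_disability_benefit(72_000.0, "ACTIVE", policy))
```

Anchoring the income figure in payroll or bank data rather than the claimant's own statement is the point of this touch-point; the formula itself would come from the policy wording.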
### Asset Report: `/asset_report/get`

Provides a financial-context check: a claim that is large relative to
the claimant's reported assets signals elevated risk.

```python
report = plaid_client.asset_report_get(asset_report_token)
total_assets = sum(
    account.balances.current
    for item in report.report.items
    for account in item.accounts
)
# Guard against an empty report before dividing.
if total_assets > 0 and claim.amount > 0.5 * total_assets:
    flag("claim_to_assets_ratio_high", ratio=claim.amount / total_assets)
```
### Recurring transactions: `/transactions/recurring/get`

Confirms premium payments are still flowing, which gives an independent
check that the policy is active, whatever the policy admin system
reports.

```python
recurring = plaid_client.transactions_recurring_get(access_token)
# Keep outflow streams that look like insurance premiums.
premium_streams = [
    s for s in recurring.outflow_streams
    if "insurance" in (s.description or "").lower()
    or s.merchant_name in INSURANCE_MERCHANTS
]
```
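Finding a premium stream is only half of the check; the stream also has to be recent enough for the policy to count as paid up. A minimal sketch follows, assuming each stream exposes a last payment date and a frequency label. The grace periods in `MAX_GAP_DAYS` are hypothetical, and the frequency labels are assumed to follow Plaid's recurring-stream vocabulary.

```python
from datetime import date, timedelta

# Assumed grace period (in days) per stream frequency.
MAX_GAP_DAYS = {
    "WEEKLY": 10,
    "BIWEEKLY": 18,
    "SEMI_MONTHLY": 20,
    "MONTHLY": 40,
    "ANNUALLY": 380,
}


def premium_is_current(last_date: date, frequency: str, today: date) -> bool:
    """True when the last premium payment is within the grace period.

    Unknown frequencies are treated as not current so they fall to manual review.
    """
    gap = MAX_GAP_DAYS.get(frequency)
    if gap is None:
        return False
    return today - last_date <= timedelta(days=gap)


print(premium_is_current(date(2024, 5, 1), "MONTHLY", today=date(2024, 5, 28)))
```

Passing `today` explicitly keeps the function deterministic, which makes it easy to replay historical claims through the same check.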
## Scaler AI Labs · RLHF loop

The platform's improvement engine has three pieces.

### 1. Labelling pipeline

Every adjudicator decision becomes a Scaler task pre-loaded with the
LLM's reasoning, the claim, and the Plaid evidence. Adjusters mark
*correct / incorrect / partially correct* and add free-text rationale.

```python
# `scale_client` is the labelling-platform client; the payload shape is illustrative.
scale_client.create_task(
    project="claimsense_review",
    task_type="comparison",
    data={
        "claim_id": claim.id,
        "ai_decision": output.decision,
        "ai_reasoning": output.reasoning,
        "ai_payout": output.payout,
        "claim_details": claim.dict(),
        "plaid_evidence": evidence.dict(),
    },
    instruction=(
        "Was the verdict correct? Was the payout right? Was fraud "
        "handled appropriately? Provide reasoning."
    ),
)
```
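Once adjusters have labelled a decision, the label has to become a number the reward model can train on. The sketch below shows one way to do that. The review dictionary shape, the label strings, and the 0.5 reward for partial credit are assumptions rather than anything specified by the labelling platform.

```python
# Assumed mapping from adjuster labels to scalar rewards; the partial-credit
# value is a placeholder to be tuned.
REWARD_BY_LABEL = {"correct": 1.0, "partially_correct": 0.5, "incorrect": 0.0}


def to_reward_example(review: dict) -> dict:
    """Turn one completed review (hypothetical shape) into a reward-model row."""
    label = review["label"]
    if label not in REWARD_BY_LABEL:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "prompt": review["claim_details"],
        "response": review["ai_reasoning"],
        "reward": REWARD_BY_LABEL[label],
        "rationale": review.get("rationale", ""),
    }


row = to_reward_example({
    "label": "partially_correct",
    "claim_details": {"claim_id": "C-1", "amount": 35_000},
    "ai_reasoning": "Bank feed shows a $22K purchase.",
    "rationale": "Verdict right, payout should be partial.",
})
print(row["reward"])
```

Rejecting unknown labels loudly, instead of defaulting them to zero, avoids silently training on mislabelled rows.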
### 2. Weekly cycle

```
Day 1-3 : collect labelled decisions
Day 4-5 : fit / refresh the reward model
Day 6   : GRPO fine-tune on the new reward
Day 7   : shadow-deploy and compare against the live model
          (promote if correctness improves and fraud capture stays ≥ live)
```
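The Day 7 promotion rule can be written as a small gate. This is a sketch of the rule exactly as stated in the schedule; the `EvalResult` type and its field names are hypothetical, and a real gate would probably also require a minimum sample size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Shadow-evaluation summary for one model (hypothetical shape)."""
    verdict_correctness: float
    fraud_capture: float


def should_promote(candidate: EvalResult, live: EvalResult) -> bool:
    """Promote only if correctness strictly improves and fraud capture does not regress."""
    return (
        candidate.verdict_correctness > live.verdict_correctness
        and candidate.fraud_capture >= live.fraud_capture
    )


live = EvalResult(verdict_correctness=0.87, fraud_capture=0.85)
candidate = EvalResult(verdict_correctness=0.91, fraud_capture=0.92)
print(should_promote(candidate, live))
```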
### 3. Quality dashboard

Tracked across iterations:

```python
metrics = {
    "verdict_correctness":   {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
    "fraud_capture":         {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
    "median_minutes":        {"baseline": 45, "v1": 12, "v2": 8, "v3": 5},
    "savings_per_claim_usd": {"baseline": 0, "v1": 45, "v2": 72, "v3": 95},
}
```
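A dashboard like this is usually read as change against the baseline. The helper below is a small sketch that computes that change from the figures listed above; the function name is hypothetical and the numbers are copied from this document, not independently measured.

```python
metrics = {
    "verdict_correctness":   {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
    "fraud_capture":         {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
    "median_minutes":        {"baseline": 45, "v1": 12, "v2": 8, "v3": 5},
    "savings_per_claim_usd": {"baseline": 0, "v1": 45, "v2": 72, "v3": 95},
}


def change_since_baseline(series: dict, version: str) -> float:
    """Absolute change of one metric between the baseline and `version`."""
    return round(series[version] - series["baseline"], 2)


for name, series in metrics.items():
    print(f"{name}: {change_since_baseline(series, 'v3'):+}")
```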
## Worked example: auto theft

```
Step 1  Claim submitted
        Claimant reports vehicle stolen. Claims $35,000.
Step 2  Plaid Link
        Bank account linked. Identity verified.
Step 3  Plaid Transactions sync
        Vehicle purchase located: $22,000, City Auto Sales, 2024-01-15.
        Discrepancy detected: claimed $35K, paid $22K.
Step 4  Plaid Asset Report
        Total assets $45,000. Claim is 78% of total assets; flag raised.
Step 5  Adjudicator LLM
        risk_score = 0.85
        flags      = ["amount_discrepancy", "claim_to_assets_ratio_high"]
        verdict    = deny
        reason     = "Inflated claim: bank feed shows $22K transaction"
Step 6  Scaler review
        Adjuster confirms verdict. Free-text:
        "Solid catch; the discrepancy alone is decisive."
Step 7  Weekly fine-tune
        Reward model up-weights "transaction discrepancy → deny" path.
```
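The two flags in Step 5 follow from plain arithmetic on the figures in Steps 1 to 4. The sketch below reproduces them; the function name and the thresholds (5% relative tolerance, 50% claim-to-assets ratio) are assumptions, and the `risk_score` is left out because it comes from the adjudicator model rather than a formula.

```python
def evidence_flags(claimed: float, purchase_amount: float, total_assets: float,
                   tolerance: float = 0.05, ratio_threshold: float = 0.5) -> list[str]:
    """Derive rule-based flags from the claim amount and Plaid-style evidence."""
    flags = []
    # Claim exceeds the located purchase by more than the relative tolerance.
    if purchase_amount > 0 and (claimed - purchase_amount) / purchase_amount > tolerance:
        flags.append("amount_discrepancy")
    # Claim is large relative to the claimant's reported assets.
    if total_assets > 0 and claimed / total_assets > ratio_threshold:
        flags.append("claim_to_assets_ratio_high")
    return flags


flags = evidence_flags(claimed=35_000, purchase_amount=22_000, total_assets=45_000)
print(flags, f"{35_000 / 45_000:.0%}")
```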
## Business case

Reference customer: a regional insurer running ~100,000 personal-line
claims a year, average ticket $5,000, fraud rate 5%.

| | Today | With ClaimSense |
|---|---:|---:|
| Median cycle time | 14 days | 2 hours |
| Fraud capture | 23% | 91% |
| False positives | 12% | 3% |
| Cost per claim | $150 | $35 |
| CSAT | 3.2 / 5 | 4.6 / 5 |

At a 5% fraud rate, 5,000 of the 100,000 claims are fraudulent. Missed
fraud is the share not captured: 77% today (3,850 claims) and 9% with
ClaimSense (450 claims).

```
Fraud loss before: 3,850 missed × $5,000  = $19.25 M
Fraud loss after :   450 missed × $5,000  =  $2.25 M
Reduction in fraud loss .................. = $17.00 M
Processing cost before: 100,000 × $150    = $15.00 M
Processing cost after : 100,000 × $35     =  $3.50 M
Reduction in processing cost ............. = $11.50 M
Total annual savings ..................... = $28.50 M
```
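The savings arithmetic can be checked with a few lines of integer math. The sketch below uses only the figures stated in this section; the function and constant names are hypothetical, and it assumes every missed fraudulent claim is paid at the average ticket.

```python
CLAIMS_PER_YEAR = 100_000
AVG_TICKET_USD = 5_000
FRAUD_RATE_PCT = 5


def missed_fraud_claims(capture_pct: int) -> int:
    """Fraudulent claims that slip through at a given capture rate (whole percent)."""
    fraudulent = CLAIMS_PER_YEAR * FRAUD_RATE_PCT // 100
    return fraudulent * (100 - capture_pct) // 100


def annual_cost(capture_pct: int, cost_per_claim: int) -> int:
    """Fraud loss plus processing cost for one year, in USD."""
    fraud_loss = missed_fraud_claims(capture_pct) * AVG_TICKET_USD
    processing = CLAIMS_PER_YEAR * cost_per_claim
    return fraud_loss + processing


before = annual_cost(capture_pct=23, cost_per_claim=150)
after = annual_cost(capture_pct=91, cost_per_claim=35)
total_savings = before - after
print(f"${total_savings / 1e6:.2f} M")
```

Keeping the inputs as named constants makes it easy to rerun the case for a different claim volume or fraud rate.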
## Roadmap

### Phase 1: Foundations · months 1-2

- Plaid Transactions + Identity in production
- Reward model v0 from supervised labels
- FastAPI scoring endpoint
- Scaler project bootstrap

### Phase 2: RLHF online · months 3-4

- Expert labelling UI
- GRPO/PPO weekly fine-tunes
- Shadow-deploy + A/B harness

### Phase 3: Coverage expansion · months 5-6

- Income + Asset Plaid products
- Adjuster cockpit (read-only first)
- Real-time fraud-scoring API

### Phase 4: Commercial scale · months 7-12

- Multi-tenant SaaS
- White-label option
- SOC2 / HIPAA / NAIC compliance work
## Technical stack snapshot

```yaml
runtime:
  language: Python 3.11+
  web: FastAPI
  workers: Celery on Redis
  rl: OpenEnv (this gym), TRL/Unsloth for fine-tuning
  data: PostgreSQL, S3 for evidence
integrations:
  plaid: Transactions, Identity, Income, Assets, Recurring
  scaler: RLHF labelling + reward modelling
  cloud: AWS / GCP
deployment:
  preview: Hugging Face Spaces (this Space)
  production: Docker / Kubernetes (single-tenant first)
```
## Coordinates

| Resource | Where |
|---|---|
| Live Space | <https://huggingface.co/spaces/akhiilll/claims-env> |
| Repo | (this directory) |
| Statement | OpenEnv Hackathon · 3.1: Professional Tasks |
| Sub-theme | Scaler AI Labs: Enterprise Workflows |