# ClaimSense — Product Vision

> The hackathon submission ships an *RL gym*. This document describes
> the product the gym is the training ground for: a closed-loop claims
> intelligence platform that wires Plaid-style financial signals into
> an LLM adjudicator and uses Scaler AI Labs' RLHF tooling to keep the
> model honest week over week.

## Why this product exists

Insurers run claims through human adjusters because the workflow is
unforgiving: the wrong call costs real money, regulators audit the
reasoning, and fraudsters keep finding new angles. Naive LLM
deployments fail on this surface for three reasons:

1. **No investigation reflex.** They take the claim at face value
   instead of pulling the policy, history, and supporting transactions.
2. **No grounding.** They hallucinate dollar amounts because nothing in
   the prompt forces them to compare the claim against bank data.
3. **No correction loop.** A wrong call yesterday can be wrong again
   tomorrow because nothing trains on the adjuster override.

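ClaimSense addresses each one: the gym trains the investigation reflex,
Plaid data grounds every dollar figure, and the weekly RLHF loop trains
on adjuster overrides.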

## Platform shape

```
┌──────────────────────────────────────────────────────────────────────┐
│                      ClaimSense AI Platform                          │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Customer journey                                                   │
│   ──────────────────────────────────────────────────────────         │
│   ┌─────────┐   ┌──────────────┐   ┌──────────────────────────┐      │
│   │ Portal  │──▶│  Plaid Link  │──▶│  Identity / Income gate  │      │
│   └─────────┘   └──────────────┘   └──────────────────────────┘      │
│        │                                       │                     │
│        ▼                                       ▼                     │
│   Adjudication core                                                  │
│   ──────────────────────────────────────────────────────────         │
│   ┌────────────────────────────────────────────────────────────┐     │
│   │ Plaid enrichment — transactions, identity, income, assets  │     │
│   ├────────────────────────────────────────────────────────────┤     │
│   │ ClaimSense gym (this repo) — RL training surface           │     │
│   ├────────────────────────────────────────────────────────────┤     │
│   │ Adjudicator LLM — fraud signals + coverage + settlement    │     │
│   └────────────────────────────────────────────────────────────┘     │
│                          │                                           │
│                          ▼                                           │
│   Improvement loop                                                   │
│   ──────────────────────────────────────────────────────────         │
│   ┌────────────────────────────────────────────────────────────┐     │
│   │ Scaler labelling → reward model → GRPO fine-tune (weekly)  │     │
│   └────────────────────────────────────────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────┘
```

## Plaid touch-points

The hackathon repo simulates the bank-feed interaction. In production,
five Plaid product calls move the needle:

### Transactions API — `/transactions/sync`

The single most powerful signal. Cross-references the claim amount
against actual purchases.

```python
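from plaid.model.transactions_sync_request import TransactionsSyncRequest

# Sketch: a production loop would page with `cursor` until `has_more` is False.
sync = plaid_client.transactions_sync(
    TransactionsSyncRequest(access_token=access_token)
)
# Select candidates by date and merchant only; the amount is compared afterwards.
matches = [
    tx for tx in sync.added
    if matches_purchase(tx, claim.date, claim.merchant)  # hypothetical helper
]
if matches and abs(matches[0].amount - claim.amount) > tolerance:
    flag("amount_discrepancy", actual=matches[0].amount, claimed=claim.amount)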
```

**Where it pays off:** auto theft, contents claims, repair invoices.
Catches the *amount* fraud that statistical scores miss.
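
The matching step can be sketched as follows. This is a minimal illustration rather than the production matcher: `Txn`, `find_purchase`, `amount_discrepancy`, the 30-day window, and the 10 % relative tolerance are all hypothetical choices, and `Txn` only mimics the few transaction fields used here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Txn:
    """Hypothetical stand-in for the transaction fields used below."""
    amount: float
    date: date
    merchant_name: Optional[str]


def find_purchase(txns, merchant, around, window_days=30):
    """Return the transaction at `merchant` closest in date to `around`, or None."""
    window = timedelta(days=window_days)
    candidates = [
        t for t in txns
        if t.merchant_name
        and merchant.lower() in t.merchant_name.lower()
        and abs(t.date - around) <= window
    ]
    return min(candidates, key=lambda t: abs(t.date - around), default=None)


def amount_discrepancy(claimed, paid, tolerance=0.10):
    """True when the claimed amount exceeds the paid amount by more than `tolerance`."""
    return claimed > paid * (1 + tolerance)


txns = [
    Txn(22_000.0, date(2024, 1, 15), "City Auto Sales"),
    Txn(54.20, date(2024, 1, 16), "Grocery Mart"),
]
hit = find_purchase(txns, "city auto", around=date(2024, 1, 10))
inflated = hit is not None and amount_discrepancy(35_000, hit.amount)
```

Selecting the candidate by merchant and date first, and comparing amounts second, keeps an inflated claim from filtering out the very transaction that exposes it.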

### Identity API — `/identity/get`

Verifies the claimant against bank-of-record data.

```python
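from plaid.model.identity_get_request import IdentityGetRequest

identity = plaid_client.identity_get(IdentityGetRequest(access_token=access_token))
# Check every owner on every linked account, not only the first one.
owners = [owner for account in identity.accounts for owner in account.owners]
verified = any(
    name_match(claim.name, owner.names)
    and address_match(claim.address, owner.addresses)
    and any(claim.phone == p.data for p in owner.phone_numbers)
    for owner in owners
)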
```

**Where it pays off:** identity-takeover fraud, claim-stuffing schemes.
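
A possible shape for the `name_match` helper, plus a formatting-tolerant phone comparison, is sketched below. The normalization rules are assumptions: names are compared as unordered token sets after stripping accents and punctuation, and `phone_match` (a hypothetical name) compares the last ten digits, which only suits North American numbers.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return " ".join(re.sub(r"[^a-z ]", " ", ascii_name.lower()).split())


def name_match(claimed: str, names_on_file: list[str]) -> bool:
    """True if the claimed name has the same tokens as any bank-side name."""
    claimed_tokens = set(normalize_name(claimed).split())
    return bool(claimed_tokens) and any(
        claimed_tokens == set(normalize_name(n).split()) for n in names_on_file
    )


def last_ten_digits(number: str) -> str:
    """Keep digits only, then drop any country-code prefix."""
    return re.sub(r"\D", "", number)[-10:]


def phone_match(claimed: str, numbers_on_file: list[str]) -> bool:
    """Compare on the last ten digits so formatting differences do not matter."""
    target = last_ten_digits(claimed)
    return len(target) == 10 and any(
        target == last_ten_digits(n) for n in numbers_on_file
    )
```

Exact string equality on phone numbers tends to fail on formatting alone (`(415) 555-0123` versus `+14155550123`), so some normalization of this kind is usually needed before a mismatch is treated as a fraud signal.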

### Income & Employment — `/credit/employment/get`

For disability and life claims, anchors the benefit calculation.

```python
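# Sketch only. Plaid's Income endpoints are keyed by a user_token, not an
# access_token, and the record fields below are illustrative placeholders
# to be mapped onto the actual employment and payroll-income responses.
record = plaid_client.credit_employment_get(user_token).items[0]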
benefit = compute_disability_benefit(
    annual_income=record.pay.annual,
    pay_frequency=record.pay.pay_frequency,
    employment_status=record.status,
    policy=policy,
)
```
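
To make the benefit calculation concrete, here is one way a helper like `compute_disability_benefit` might look. Everything in it is an assumption for illustration: the `DisabilityPolicy` fields, the income-replacement formula, and the cap are not taken from any real policy, and the sketch ignores pay frequency and employment status.

```python
from dataclasses import dataclass


@dataclass
class DisabilityPolicy:
    """Hypothetical policy terms."""
    replacement_ratio: float  # share of pre-disability income replaced
    monthly_cap: float        # maximum monthly benefit in USD


def compute_monthly_benefit(annual_income: float, policy: DisabilityPolicy) -> float:
    """Monthly benefit = replaced share of monthly income, limited by the cap."""
    if annual_income < 0:
        raise ValueError("annual_income must be non-negative")
    uncapped = annual_income / 12 * policy.replacement_ratio
    return round(min(uncapped, policy.monthly_cap), 2)


policy = DisabilityPolicy(replacement_ratio=0.6, monthly_cap=5_000.0)
typical = compute_monthly_benefit(60_000, policy)
capped = compute_monthly_benefit(240_000, policy)
```

The point of anchoring on verified income is that the largest input to this formula no longer comes from the claimant's own statement.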

### Asset Report — `/asset_report/get`

Provides a financial-context check: a claim that is large relative to
the claimant's linked-account assets signals elevated risk.

```python
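from plaid.model.asset_report_get_request import AssetReportGetRequest

report = plaid_client.asset_report_get(
    AssetReportGetRequest(asset_report_token=asset_report_token)
)
total_assets = sum(
    account.balances.current or 0
    for item in report.report.items
    for account in item.accounts
)
# Guard against a zero total before dividing.
if total_assets > 0 and claim.amount > 0.5 * total_assets:
    flag("claim_to_assets_ratio_high", ratio=claim.amount / total_assets)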
```

### Recurring transactions — `/transactions/recurring/get`

Confirms that premium payments are still flowing, which gives an
independent check on the status reported by the policy admin system.

```python
recurring = plaid_client.transactions_recurring_get(access_token)
premium_streams = [
    s for s in recurring.outflow_streams
    if "insurance" in (s.description or "").lower()
       or s.merchant_name in INSURANCE_MERCHANTS
]
```
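
Once premium streams are identified, a simple recency test can turn them into a yes/no signal. The sketch below is an assumption-laden illustration: `Stream` mirrors only a few stream fields, and `premiums_current` and the 45-day grace period are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Stream:
    """Hypothetical subset of a recurring-transaction stream."""
    merchant_name: str
    last_date: date
    is_active: bool


def premiums_current(streams, as_of, grace_days=45):
    """True if any active premium stream has a payment within the grace period."""
    cutoff = as_of - timedelta(days=grace_days)
    return any(s.is_active and s.last_date >= cutoff for s in streams)


streams = [Stream("Acme Insurance", date(2024, 5, 15), True)]
paid_up = premiums_current(streams, as_of=date(2024, 6, 1))
```

A `False` result is best treated as a prompt to investigate, not as proof of a lapse, since premiums may be paid from an account that was never linked.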

## Scaler AI Labs · RLHF loop

The platform's improvement engine. Three pieces:

### 1. Labelling pipeline

Every adjudicator decision becomes a Scaler task pre-loaded with the
LLM's reasoning, the claim, and the Plaid evidence. Adjusters mark
*correct / incorrect / partially correct* and add free-text rationale.

```python
# Illustrative call shape; `scale_client` stands in for the labelling-platform SDK.
scale_client.create_task(
    project="claimsense_review",
    task_type="comparison",
    data={
        "claim_id": claim.id,
        "ai_decision": output.decision,
        "ai_reasoning": output.reasoning,
        "ai_payout": output.payout,
        "claim_details": claim.dict(),
        "plaid_evidence": evidence.dict(),
    },
    instruction=(
        "Was the verdict correct? Was the payout right? Was fraud "
        "handled appropriately? Provide reasoning."
    ),
)
```
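
Completed reviews then need to become training signal. A minimal sketch of that conversion follows; the numeric reward assigned to each label, the `to_reward_example` name, and the shape of the `review` dict are assumptions, not part of any labelling SDK.

```python
# Hypothetical reward values for the three adjuster labels.
LABEL_REWARD = {"correct": 1.0, "partially_correct": 0.5, "incorrect": 0.0}


def to_reward_example(review: dict) -> dict:
    """Turn one completed review into a (prompt, response, reward) record."""
    label = review["label"]
    if label not in LABEL_REWARD:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "prompt": review["claim_details"],
        "response": review["ai_reasoning"],
        "reward": LABEL_REWARD[label],
        "rationale": review.get("rationale", ""),
    }


example = to_reward_example({
    "label": "partially_correct",
    "claim_details": {"claim_id": "C-1", "amount": 35_000},
    "ai_reasoning": "Bank feed shows a $22K purchase.",
    "rationale": "Verdict right, payout wrong.",
})
```

Keeping the free-text rationale alongside the scalar reward leaves room to audit the reward model later against what adjusters actually wrote.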

### 2. Weekly cycle

```
Day 1-3 :  collect labelled decisions
Day 4-5 :  fit / refresh the reward model
Day 6   :  GRPO fine-tune on the new reward
Day 7   :  shadow-deploy and compare against the live model
            (promote if correctness improves and fraud capture stays ≥ live)
```
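
Two steps of this cycle are small enough to sketch directly. The first is the group-relative advantage at the heart of GRPO: several responses are sampled for the same claim, and each reward is normalized against the mean and standard deviation of its own group. The second is the Day 7 promotion gate. Function names, the use of the population standard deviation, and the `eps` term are assumptions for illustration; a training library such as TRL handles the advantage computation internally.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def should_promote(live: dict, shadow: dict) -> bool:
    """Day 7 gate: correctness must improve and fraud capture must not regress."""
    return (
        shadow["verdict_correctness"] > live["verdict_correctness"]
        and shadow["fraud_capture"] >= live["fraud_capture"]
    )


# Four sampled adjudications of one claim, scored by the reward model.
advantages = group_advantages([1.0, 0.5, 0.0, 0.5])

live = {"verdict_correctness": 0.87, "fraud_capture": 0.85}
shadow = {"verdict_correctness": 0.91, "fraud_capture": 0.92}
promote = should_promote(live, shadow)
```

Because advantages are relative within a group, a claim on which every sampled response earns the same reward contributes no gradient, which is why the labelled set needs claims where the model's answers actually differ.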

### 3. Quality dashboard

Tracked across iterations:

```python
metrics = {
    "verdict_correctness":   {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
    "fraud_capture":         {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
    "median_minutes":        {"baseline": 45,   "v1": 12,   "v2": 8,    "v3": 5},
    "savings_per_claim_usd": {"baseline": 0,    "v1": 45,   "v2": 72,   "v3": 95},
}
```

## Worked example — auto theft

```
Step 1  Claim submitted
        Claimant reports vehicle stolen. Claims $35,000.

Step 2  Plaid Link
        Bank account linked. Identity verified.

Step 3  Plaid Transactions sync
        Vehicle purchase located: $22,000, City Auto Sales, 2024-01-15.
        Discrepancy detected: claimed $35K, paid $22K.

Step 4  Plaid Asset Report
        Total assets $45,000. Claim is 78 % of linked assets; flag raised.

Step 5  Adjudicator LLM
        risk_score = 0.85
        flags = ["amount_discrepancy", "claim_to_assets_ratio_high"]
        verdict = deny
        reason = "Inflated claim — bank-feed shows $22K transaction"

Step 6  Scaler review
        Adjuster confirms verdict. Free-text:
        "Solid catch — discrepancy alone is decisive."

Step 7  Weekly fine-tune
        Reward model up-weights "transaction discrepancy → deny" path.
```
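
The deterministic part of this walkthrough (Steps 3 and 4) can be reproduced in a few lines. This is a sketch under assumptions: `screen_claim`, the 10 % tolerance, and the 0.5 ratio limit are illustrative, and the risk score and verdict in Step 5 come from the adjudicator LLM, so they are not computed here.

```python
def screen_claim(claimed, paid, total_assets, tolerance=0.10, ratio_limit=0.5):
    """Return the rule-based flags and the claim-to-assets ratio."""
    flags = []
    # Step 3: claimed amount versus the purchase found in the bank feed.
    if claimed > paid * (1 + tolerance):
        flags.append("amount_discrepancy")
    # Step 4: claim size relative to linked assets (skipped if assets are zero).
    ratio = claimed / total_assets if total_assets > 0 else None
    if ratio is not None and ratio > ratio_limit:
        flags.append("claim_to_assets_ratio_high")
    return flags, ratio


flags, ratio = screen_claim(claimed=35_000, paid=22_000, total_assets=45_000)
ratio_pct = round(ratio * 100)
```

With the example's numbers, both flags are raised and the ratio rounds to 78 %.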

## Business case

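Reference customer: a regional insurer running ~100,000 personal-lines
claims a year, average ticket $5,000, fraud rate 5% (about 5,000
fraudulent claims a year). Missed fraud in the calculation below is that
count multiplied by one minus the fraud-capture rate.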

| Metric | Today | With ClaimSense |
|---|---:|---:|
| Median cycle time | 14 days | 2 hours |
| Fraud capture | 23 % | 91 % |
| False positives | 12 % | 3 % |
| Cost per claim | $150 | $35 |
| CSAT | 3.2 / 5 | 4.6 / 5 |

```
Fraud loss before:  3,850 missed × $5,000  = $19.25 M
Fraud loss after:     450 missed × $5,000  =  $2.25 M
Reduction in fraud loss .................. = $17.00 M

Processing cost before:  100,000 × $150    = $15.00 M
Processing cost after :  100,000 × $35     =  $3.50 M
Reduction in processing cost ............. = $11.50 M

Total annual savings ..................... = $28.50 M
```
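
The arithmetic above can be checked with a short script. It is a sketch that takes the reference-customer figures as inputs; the function and parameter names are illustrative, and percentages are passed as whole numbers so the calculation stays in exact integer math.

```python
def annual_savings(claims, avg_ticket, fraud_pct, capture_before_pct,
                   capture_after_pct, cost_before, cost_after):
    """Reproduce the business-case arithmetic in whole US dollars."""
    fraud_claims = claims * fraud_pct // 100
    missed_before = fraud_claims * (100 - capture_before_pct) // 100
    missed_after = fraud_claims * (100 - capture_after_pct) // 100
    fraud_saving = (missed_before - missed_after) * avg_ticket
    processing_saving = claims * (cost_before - cost_after)
    return {
        "missed_before": missed_before,
        "missed_after": missed_after,
        "fraud_saving": fraud_saving,
        "processing_saving": processing_saving,
        "total": fraud_saving + processing_saving,
    }


result = annual_savings(
    claims=100_000, avg_ticket=5_000, fraud_pct=5,
    capture_before_pct=23, capture_after_pct=91,
    cost_before=150, cost_after=35,
)
```

Parameterizing the calculation makes it easy to rerun the case for an insurer with a different claim volume, ticket size, or baseline capture rate.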

## Roadmap

### Phase 1 — Foundations · months 1-2
- Plaid Transactions + Identity in production
- Reward model v0 from supervised labels
- FastAPI scoring endpoint
- Scaler project bootstrap

### Phase 2 — RLHF online · months 3-4
- Expert labelling UI
- GRPO/PPO weekly fine-tunes
- Shadow-deploy + A/B harness

### Phase 3 — Coverage expansion · months 5-6
- Income + Asset Plaid products
- Adjuster cockpit (read-only first)
- Real-time fraud-scoring API

### Phase 4 — Commercial scale · months 7-12
- Multi-tenant SaaS
- White-label option
- SOC 2 / HIPAA / NAIC compliance work

## Technical stack snapshot

```yaml
runtime:
  language: Python 3.11+
  web:      FastAPI
  workers:  Celery on Redis
  rl:       OpenEnv (this gym), TRL/Unsloth for fine-tuning
  data:     PostgreSQL, S3 for evidence
integrations:
  plaid:   Transactions, Identity, Income, Assets, Recurring
  scaler:  RLHF labelling + reward modelling
  cloud:   AWS / GCP
deployment:
  preview:    Hugging Face Spaces (this Space)
  production: Docker / Kubernetes (single-tenant first)
```

## Coordinates

| Resource | Where |
|---|---|
| Live Space | <https://huggingface.co/spaces/akhiilll/claims-env> |
| Repo | (this directory) |
| Statement | OpenEnv Hackathon · 3.1 — Professional Tasks |
| Sub-theme | Scaler AI Labs — Enterprise Workflows |