# ClaimSense: Product Vision
> The hackathon submission ships an *RL gym*. This document describes the
> product that gym trains for: a closed-loop claims intelligence platform
> that feeds Plaid-style financial signals into an LLM adjudicator and uses
> Scaler AI Labs' RLHF tooling to keep the model honest week over week.
## Why this product exists
Insurers run claims through human adjusters because the workflow is
unforgiving: the wrong call costs real money, regulators audit the
reasoning, and fraudsters keep finding new angles. Naive LLM
deployments fail on this surface for three reasons:
1. **No investigation reflex.** They take the claim at face value
instead of pulling the policy, history, and supporting transactions.
2. **No grounding.** They hallucinate dollar amounts because nothing in
the prompt forces them to compare the claim against bank data.
3. **No correction loop.** A wrong call yesterday can be wrong again
tomorrow because nothing trains on the adjuster override.
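ClaimSense targets all three: the RL gym trains the investigation reflex,
Plaid bank data grounds the dollar amounts, and the Scaler RLHF loop turns
adjuster overrides into training signal.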
## Platform shape
```
┌──────────────────────────────────────────────────────────────────────┐
│  ClaimSense AI Platform                                              │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Customer journey                                                    │
│  ──────────────────────────────────────────────────────────────      │
│  ┌────────┐   ┌────────────┐   ┌────────────────────────┐            │
│  │ Portal │──▶│ Plaid Link │──▶│ Identity / Income gate │            │
│  └────────┘   └────────────┘   └────────────────────────┘            │
│                      │                      │                        │
│                      ▼                      ▼                        │
│  Adjudication core                                                   │
│  ──────────────────────────────────────────────────────────────      │
│  ┌────────────────────────────────────────────────────────────┐      │
│  │ Plaid enrichment: transactions, identity, income, assets   │      │
│  ├────────────────────────────────────────────────────────────┤      │
│  │ ClaimSense gym (this repo): RL training surface            │      │
│  ├────────────────────────────────────────────────────────────┤      │
│  │ Adjudicator LLM: fraud signals + coverage + settlement     │      │
│  └────────────────────────────────────────────────────────────┘      │
│                                │                                     │
│                                ▼                                     │
│  Improvement loop                                                    │
│  ──────────────────────────────────────────────────────────────      │
│  ┌────────────────────────────────────────────────────────────┐      │
│  │ Scaler labelling → reward model → GRPO fine-tune (weekly)  │      │
│  └────────────────────────────────────────────────────────────┘      │
└──────────────────────────────────────────────────────────────────────┘
```
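Read top to bottom, the three layers form a simple call chain. The sketch below is a hypothetical orchestration, not code from this repo. `run_pipeline`, `Decision`, and the injected `enrich` / `adjudicate` callables are illustrative names. The real enrichment would call Plaid, and the real adjudicator would be the gym-trained LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    verdict: str                               # e.g. "approve" or "deny"
    payout: float
    flags: list = field(default_factory=list)  # fraud / coverage flags raised


def run_pipeline(claim, enrich, adjudicate, review_queue):
    """Pass one claim through the three layers of the platform diagram."""
    evidence = enrich(claim)                 # adjudication core: Plaid enrichment
    decision = adjudicate(claim, evidence)   # adjudication core: adjudicator LLM
    review_queue.append({                    # improvement loop: queued for labelling
        "claim": claim,
        "evidence": evidence,
        "decision": decision,
    })
    return decision
```

Injecting the stages as callables keeps the orchestration testable with stubs. It also lets the same chain run against the simulated bank feed in the gym.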
## Plaid touch-points
The hackathon repo simulates the bank-feed interaction. In production,
five Plaid product calls move the needle:
### Transactions API: `/transactions/sync`
The single most powerful signal. Cross-references the claim amount
against actual purchases.
```python
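from plaid.model.transactions_sync_request import TransactionsSyncRequest

# First page only; loop on `has_more` / `next_cursor` in production
sync = plaid_client.transactions_sync(
    TransactionsSyncRequest(access_token=access_token)
)
# Locate the purchase by merchant/date (app helper), then compare paid vs. claimed
matches = [
    tx for tx in sync.added
    if purchase_matches(tx, claim.date, claim.merchant)
]
if matches and abs(matches[0].amount - claim.amount) > tolerance:
    flag("inflated_claim", actual=matches[0].amount, claimed=claim.amount)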
```
**Where it pays off:** auto theft, contents claims, repair invoices.
Catches the *amount* fraud that statistical scores miss.
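The snippet above leaves the matching logic and the tolerance to application code. Here is a minimal, self-contained sketch of one way to fill them in. It assumes the purchase is located by merchant name within a date window around the stated purchase date, and that a claim is flagged only when it exceeds the bank-recorded price by more than a relative tolerance. `Transaction`, `find_purchase`, `inflated_amount`, the 30-day window and the 5 % tolerance are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    # Minimal stand-in for a Plaid transaction; amount is positive for money out
    merchant_name: str
    amount: float
    posted: date


def find_purchase(txs, merchant, purchased_on, window_days=30):
    """First transaction at `merchant` within `window_days` of the stated purchase date."""
    for tx in txs:
        if (tx.merchant_name.lower() == merchant.lower()
                and abs((tx.posted - purchased_on).days) <= window_days):
            return tx
    return None


def inflated_amount(txs, claimed, merchant, purchased_on, tolerance=0.05):
    """Return (paid, claimed) when the claim exceeds the bank-recorded price by > tolerance."""
    tx = find_purchase(txs, merchant, purchased_on)
    if tx is not None and claimed > tx.amount * (1 + tolerance):
        return tx.amount, claimed
    return None
```

Using a relative tolerance means the same setting works for a $200 repair invoice and a $22,000 vehicle. Flagging only over-claims avoids penalising claimants who under-report.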
### Identity API: `/identity/get`
Verifies the claimant against bank-of-record data.
```python
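from plaid.model.identity_get_request import IdentityGetRequest

identity = plaid_client.identity_get(IdentityGetRequest(access_token=access_token))
owner = identity.accounts[0].owners[0]
# name_match / address_match are application helpers; phone entries carry `.data`
verified = (
    name_match(claim.name, owner.names)
    and address_match(claim.address, owner.addresses)
    and any(claim.phone == p.data for p in owner.phone_numbers)
)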
```
**Where it pays off:** identity-takeover fraud, claim-stuffing schemes.
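`name_match` is not defined above, and an exact phone comparison breaks on formatting differences. A minimal sketch of normalised matching follows, assuming names match when their token sets are equal after stripping accents, case and punctuation, and phones match on their last ten digits. Both rules, and the names `normalize`, `name_match` and `phone_match`, are hypothetical. Address matching is left out.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def name_match(claimed, bank_names):
    """True if the claimed name's tokens equal those of any bank-of-record name."""
    claimed_tokens = set(normalize(claimed).split())
    return any(claimed_tokens == set(normalize(n).split()) for n in bank_names)


def phone_match(claimed, bank_phones):
    """Compare phone numbers on their last 10 digits, ignoring formatting."""
    def digits(s):
        return re.sub(r"\D", "", s)[-10:]
    return any(digits(claimed) == digits(p) for p in bank_phones)
```

Comparing token sets tolerates "Last, First" ordering. A production matcher would likely also need fuzzy matching for nicknames and typos.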
### Income & Employment: `/credit/employment/get`
For disability and life claims, anchors the benefit calculation.
```python
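# Sketch only: plaid-python methods take a request model rather than a bare
# token, and the `pay` / `status` field paths are illustrative; map them from
# the actual /credit/employment/get response. compute_disability_benefit is
# application logic.
record = plaid_client.credit_employment_get(access_token).items[0]
benefit = compute_disability_benefit(
    annual_income=record.pay.annual,
    pay_frequency=record.pay.pay_frequency,
    employment_status=record.status,
    policy=policy,
)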
```
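`compute_disability_benefit` is not specified here. The sketch below shows one plausible shape under explicit assumptions. The policy replaces a fixed share of pre-disability income up to a monthly cap. No benefit is paid unless the claimant was employed at onset. Payments are spread over the claimant's own pay schedule. `DisabilityPolicy`, the pay-frequency labels and the `"ACTIVE"` status string are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pay-frequency labels and the number of pay periods per year
PERIODS_PER_YEAR = {"WEEKLY": 52, "BIWEEKLY": 26, "SEMI_MONTHLY": 24, "MONTHLY": 12}


@dataclass
class DisabilityPolicy:
    replacement_ratio: float  # share of pre-disability income replaced, e.g. 0.6
    monthly_cap: float        # maximum monthly benefit in USD


def compute_disability_benefit(annual_income, pay_frequency, employment_status, policy):
    """Benefit per pay period: min(ratio * monthly income, cap); 0.0 if not employed at onset."""
    if employment_status != "ACTIVE":
        return 0.0
    monthly = min(policy.replacement_ratio * annual_income / 12, policy.monthly_cap)
    periods = PERIODS_PER_YEAR[pay_frequency]
    return round(monthly * 12 / periods, 2)
```

Anchoring `annual_income` to payroll data rather than to the claimant's statement is what makes the benefit calculation hard to inflate.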
### Asset Report: `/asset_report/get`
Provides a financial-context check: claims that are large relative to the
claimant's total assets signal elevated risk.
```python
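from plaid.model.asset_report_get_request import AssetReportGetRequest

response = plaid_client.asset_report_get(
    AssetReportGetRequest(asset_report_token=asset_report_token)
)
# Sum reported balances (None counts as 0); credit/loan balances are not netted out
total_assets = sum(
    account.balances.current or 0
    for item in response.report.items
    for account in item.accounts
)
if total_assets > 0 and claim.amount > 0.5 * total_assets:
    flag("claim_to_assets_ratio_high", ratio=claim.amount / total_assets)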
```
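Summing every `balances.current` treats credit-card and loan balances as if they were assets. If the check is meant to reflect net worth, those liabilities have to be subtracted. Here is a minimal sketch over `(account_type, current_balance)` pairs. Treating Plaid's `credit` and `loan` account types as liabilities is a simplifying assumption, and `net_worth` / `claim_to_worth_ratio` are hypothetical names.

```python
# Assumption: these Plaid account types hold liabilities; everything else is an asset
LIABILITY_TYPES = {"credit", "loan"}


def net_worth(accounts):
    """Assets minus liabilities from (type, current_balance) pairs; None balances skipped."""
    total = 0.0
    for acct_type, balance in accounts:
        if balance is None:
            continue
        total += -balance if acct_type in LIABILITY_TYPES else balance
    return total


def claim_to_worth_ratio(claim_amount, accounts):
    """Claim as a fraction of net worth, or None when net worth is not positive."""
    worth = net_worth(accounts)
    return None if worth <= 0 else claim_amount / worth
```

Returning `None` for non-positive net worth keeps the ratio from dividing by zero or flipping sign. Those cases would need their own rule.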
### Recurring transactions: `/transactions/recurring/get`
Confirms that premium payments are actually flowing, i.e. that the policy is
genuinely active regardless of what the policy admin system reports.
```python
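from plaid.model.transactions_recurring_get_request import TransactionsRecurringGetRequest
recurring = plaid_client.transactions_recurring_get(
    TransactionsRecurringGetRequest(access_token=access_token)
)
# INSURANCE_MERCHANTS: application-maintained set of insurer names
premium_streams = [
    s for s in recurring.outflow_streams
    if "insurance" in (s.description or "").lower()
    or s.merchant_name in INSURANCE_MERCHANTS
]
if not any(s.is_active for s in premium_streams):
    flag("premium_not_flowing")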
```
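Finding a premium stream is only half the check. The policy looks active only if that stream was paid recently. The sketch below works on plain dicts loosely modelled on Plaid's recurring-stream fields (`description`, `merchant_name`, `is_active`, `last_date`). The 45-day gap, the insurer names, and the function names are hypothetical.

```python
from datetime import date

# Hypothetical insurer names an application would maintain
INSURANCE_MERCHANTS = {"Northwind Insurance", "Acme Mutual"}


def is_premium_stream(stream):
    """Insurance keyword in the description, or a known insurer as merchant."""
    description = (stream.get("description") or "").lower()
    return "insurance" in description or stream.get("merchant_name") in INSURANCE_MERCHANTS


def premiums_current(streams, as_of, max_gap_days=45):
    """True if an active premium stream was last paid within `max_gap_days` of `as_of`."""
    return any(
        is_premium_stream(s)
        and s.get("is_active", False)
        and (as_of - s["last_date"]).days <= max_gap_days
        for s in streams
    )
```

A gap of about 45 days tolerates one late monthly payment without treating the policy as lapsed.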
## Scaler AI Labs · RLHF loop
The platform's improvement engine. Three pieces:
### 1. Labelling pipeline
Every adjudicator decision becomes a Scaler task pre-loaded with the
LLM's reasoning, the claim, and the Plaid evidence. Adjusters mark
*correct / incorrect / partially correct* and add free-text rationale.
```python
scale_client.create_task(
    project="claimsense_review",
    task_type="comparison",
    data={
        "claim_id": claim.id,
        "ai_decision": output.decision,
        "ai_reasoning": output.reasoning,
        "ai_payout": output.payout,
        "claim_details": claim.dict(),
        "plaid_evidence": evidence.dict(),
    },
    instruction=(
        "Was the verdict correct? Was the payout right? Was fraud "
        "handled appropriately? Provide reasoning."
    ),
)
```
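The adjuster labels still have to become training data for the reward model. One hypothetical conversion is sketched below. Each label maps to a scalar reward. Wherever the adjuster supplied a corrected answer, the review becomes a preference pair that prefers the correction over the model's output. The `prompt` / `chosen` / `rejected` layout mirrors the preference-dataset convention used by TRL, which the stack lists. The label-to-reward values and all names here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from the three review labels to a scalar reward
LABEL_REWARD = {"correct": 1.0, "partially_correct": 0.5, "incorrect": 0.0}


@dataclass
class Review:
    prompt: str                              # claim + Plaid evidence shown to the model
    ai_response: str                         # adjudicator's verdict and reasoning
    label: str                               # "correct", "partially_correct" or "incorrect"
    adjuster_response: Optional[str] = None  # adjuster's corrected answer, if given


def scalar_reward(review):
    """Reward for the model's answer as judged by the adjuster."""
    return LABEL_REWARD[review.label]


def to_preference_pair(review):
    """Chosen/rejected pair for reward-model training, or None if there is nothing to compare."""
    if review.label == "correct" or not review.adjuster_response:
        return None
    return {
        "prompt": review.prompt,
        "chosen": review.adjuster_response,
        "rejected": review.ai_response,
    }
```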
### 2. Weekly cycle
```
Day 1-3 : collect labelled decisions
Day 4-5 : fit / refresh the reward model
Day 6   : GRPO fine-tune on the new reward
Day 7   : shadow-deploy and compare against the live model
          (promote if correctness improves and fraud capture stays ≥ live)
```
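The Day 7 promotion rule can be written as a small gate over shadow-deployment metrics. This is a hypothetical sketch. `shadow_metrics` assumes each shadow-scored claim carries an adjuster-confirmed true verdict and a fraud label, and it counts a fraudulent claim as captured when the model denies it.

```python
def shadow_metrics(records):
    """records: (predicted_verdict, true_verdict, is_fraud) per shadow-scored claim."""
    correct = sum(pred == truth for pred, truth, _ in records)
    fraud_preds = [pred for pred, _, is_fraud in records if is_fraud]
    caught = sum(pred == "deny" for pred in fraud_preds)
    return {
        "verdict_correctness": correct / len(records),
        # No fraud in the window means nothing was missed
        "fraud_capture": caught / len(fraud_preds) if fraud_preds else 1.0,
    }


def should_promote(candidate, live, min_gain=0.0):
    """Correctness must improve and fraud capture must not regress."""
    return (
        candidate["verdict_correctness"] > live["verdict_correctness"] + min_gain
        and candidate["fraud_capture"] >= live["fraud_capture"]
    )
```

A positive `min_gain` guards against promoting on noise when the shadow window is small.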
### 3. Quality dashboard
Tracked across iterations:
```python
metrics = {
    "verdict_correctness":   {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
    "fraud_capture":         {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
    "median_minutes":        {"baseline": 45,   "v1": 12,   "v2": 8,    "v3": 5},
    "savings_per_claim_usd": {"baseline": 0,    "v1": 45,   "v2": 72,   "v3": 95},
}
```
## Worked example: auto theft
```
Step 1  Claim submitted
        Claimant reports vehicle stolen. Claims $35,000.
Step 2  Plaid Link
        Bank account linked. Identity verified.
Step 3  Plaid Transactions sync
        Vehicle purchase located: $22,000, City Auto Sales, 2024-01-15.
        Discrepancy detected: claimed $35K, paid $22K.
Step 4  Plaid Asset Report
        Total assets $45,000. Claim is 78 % of total assets: flag raised.
Step 5  Adjudicator LLM
        risk_score = 0.85
        flags      = ["inflated_claim", "claim_to_assets_ratio_high"]
        verdict    = deny
        reason     = "Inflated claim: bank feed shows a $22K transaction"
Step 6  Scaler review
        Adjuster confirms verdict. Free-text:
        "Solid catch; the discrepancy alone is decisive."
Step 7  Weekly fine-tune
        Reward model up-weights the "transaction discrepancy → deny" path.
```
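The numbers in Steps 3 and 4 can be checked directly. This sketch reuses the 0.5 assets threshold from the Asset Report snippet. It uses the flag names from the Plaid snippets and assumes a 5 % relative tolerance for the amount check.

```python
claimed, paid, total_assets = 35_000, 22_000, 45_000

discrepancy = claimed - paid            # dollars claimed above the bank-recorded price
inflation = claimed / paid - 1          # claim relative to purchase price
assets_ratio = claimed / total_assets   # claim relative to total assets

flags = []
if inflation > 0.05:                    # assumed 5 % relative tolerance
    flags.append("inflated_claim")
if assets_ratio > 0.5:                  # threshold from the Asset Report check
    flags.append("claim_to_assets_ratio_high")
```

Here the claim sits about 59 % above the purchase price and equals about 78 % of total assets, so both flags fire.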
## Business case
Reference customer: a regional insurer running ~100,000 personal-line
claims a year, average ticket $5,000, fraud rate 5%.
| Metric | Today | With ClaimSense |
|---|---:|---:|
| Median cycle time | 14 days | 2 hours |
| Fraud capture | 23 % | 91 % |
| False positives | 12 % | 3 % |
| Cost per claim | $150 | $35 |
| CSAT | 3.2 / 5 | 4.6 / 5 |
```
Fraudulent claims             100,000 × 5 %         =    5,000
Missed before                 5,000 × (1 - 0.23)    =    3,850
Missed after                  5,000 × (1 - 0.91)    =      450

Fraud loss before             3,850 × $5,000        = $19.25 M
Fraud loss after                450 × $5,000        =  $2.25 M
Reduction in fraud loss                             = $17.00 M

Processing cost before        100,000 × $150        = $15.00 M
Processing cost after         100,000 × $35         =  $3.50 M
Reduction in processing cost                        = $11.50 M

Total annual savings                                = $28.50 M
```
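The same arithmetic as a small, hypothetical function lets the reference-customer assumptions be varied. Like the figures above, it assumes that every missed fraudulent claim pays out the average ticket and that claim volume stays the same.

```python
def annual_savings(claims, avg_ticket, fraud_rate, capture_before, capture_after,
                   cost_before, cost_after):
    """Return (fraud-loss reduction, processing-cost reduction, total) in USD per year."""
    fraudulent = claims * fraud_rate
    missed_before = fraudulent * (1 - capture_before)
    missed_after = fraudulent * (1 - capture_after)
    fraud_saving = (missed_before - missed_after) * avg_ticket
    processing_saving = claims * (cost_before - cost_after)
    return fraud_saving, processing_saving, fraud_saving + processing_saving


# Reference customer from the table above
fraud, processing, total = annual_savings(100_000, 5_000, 0.05, 0.23, 0.91, 150, 35)
```

Halving the fraud rate to 2.5 %, for example, halves the fraud-loss term and leaves the processing saving unchanged.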
## Roadmap
### Phase 1: Foundations · months 1-2
- Plaid Transactions + Identity in production
- Reward model v0 from supervised labels
- FastAPI scoring endpoint
- Scaler project bootstrap
### Phase 2: RLHF online · months 3-4
- Expert labelling UI
- GRPO/PPO weekly fine-tunes
- Shadow-deploy + A/B harness
### Phase 3: Coverage expansion · months 5-6
- Income + Asset Plaid products
- Adjuster cockpit (read-only first)
- Real-time fraud-scoring API
### Phase 4: Commercial scale · months 7-12
- Multi-tenant SaaS
- White-label option
- SOC 2 / HIPAA / NAIC compliance work
## Technical stack snapshot
```yaml
runtime:
  language: Python 3.11+
  web: FastAPI
  workers: Celery on Redis
  rl: OpenEnv (this gym), TRL/Unsloth for fine-tuning
  data: PostgreSQL, S3 for evidence
integrations:
  plaid: Transactions, Identity, Income, Assets, Recurring
  scaler: RLHF labelling + reward modelling
  cloud: AWS / GCP
deployment:
  preview: Hugging Face Spaces (this Space)
  production: Docker / Kubernetes (single-tenant first)
```
## Coordinates
| Resource | Where |
|---|---|
| Live Space | <https://huggingface.co/spaces/akhiilll/claims-env> |
| Repo | (this directory) |
| Statement | OpenEnv Hackathon · 3.1: Professional Tasks |
| Sub-theme | Scaler AI Labs: Enterprise Workflows |