	# ClaimSense — Product Vision

	> The hackathon submission ships an RL gym. This document describes
	> the product the gym is the training ground for: a closed-loop claims
	> intelligence platform that wires Plaid-style financial signals into
	> an LLM adjudicator and uses Scaler AI Labs' RLHF tooling to keep the
	> model honest week over week.

	## Why this product exists

	Insurers run claims through human adjusters because the workflow is
	unforgiving: the wrong call costs real money, regulators audit the
	reasoning, and fraudsters keep finding new angles. Naive LLM
	deployments fail on this surface for three reasons:

	1. No investigation reflex. They take the claim at face value
	instead of pulling the policy, history, and supporting transactions.
	2. No grounding. They hallucinate dollar amounts because nothing in
	the prompt forces them to compare the claim against bank data.
	3. No correction loop. A wrong call yesterday can be wrong again
	tomorrow because nothing trains on the adjuster override.

	ClaimSense solves all three.

	## Platform shape

	```
	┌──────────────────────────────────────────────────────────────────────┐
	│                        ClaimSense AI Platform                        │
	├──────────────────────────────────────────────────────────────────────┤
	│                                                                      │
	│    Customer journey                                                  │
	│    ──────────────────────────────────────────────────────────        │
	│    ┌─────────┐   ┌──────────────┐   ┌──────────────────────────┐     │
	│    │ Portal  │──▶│  Plaid Link  │──▶│  Identity / Income gate  │     │
	│    └─────────┘   └──────────────┘   └──────────────────────────┘     │
	│                         │                        │                   │
	│                         ▼                        ▼                   │
	│    Adjudication core                                                 │
	│    ──────────────────────────────────────────────────────────        │
	│    ┌────────────────────────────────────────────────────────────┐    │
	│    │ Plaid enrichment: transactions, identity, income, assets   │    │
	│    ├────────────────────────────────────────────────────────────┤    │
	│    │ ClaimSense gym (this repo): RL training surface            │    │
	│    ├────────────────────────────────────────────────────────────┤    │
	│    │ Adjudicator LLM: fraud signals + coverage + settlement     │    │
	│    └────────────────────────────────────────────────────────────┘    │
	│                                   │                                  │
	│                                   ▼                                  │
	│    Improvement loop                                                  │
	│    ──────────────────────────────────────────────────────────        │
	│    ┌────────────────────────────────────────────────────────────┐    │
	│    │ Scaler labelling → reward model → GRPO fine-tune (weekly)  │    │
	│    └────────────────────────────────────────────────────────────┘    │
	└──────────────────────────────────────────────────────────────────────┘
	```

	## Plaid touch-points

	The hackathon repo simulates the bank-feed interaction. In production,
	five Plaid product calls move the needle:

	### Transactions API — `/transactions/sync`

	The single most powerful signal. Cross-references the claim amount
	against actual purchases.

	```python
	sync = plaid_client.transactions_sync(access_token)
	matches = [
	    tx for tx in sync.added
	    if amount_matches(tx, claim.amount, claim.date, claim.merchant)
	]
	if matches and abs(matches[0].amount - claim.amount) > tolerance:
	    flag("inflated_claim", actual=matches[0].amount, claimed=claim.amount)
	```

	Where it pays off: auto theft, contents claims, repair invoices.
	Catches the amount fraud that statistical scores miss.
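
The snippet above leaves `amount_matches` undefined. One way it might look is sketched below. The `Tx` dataclass is a hypothetical stand-in for a bank-feed transaction, and `window_days` and `max_ratio` are illustrative assumptions rather than values from this repo. The amount test is deliberately loose so that an inflated claim still pairs with its underlying purchase and reaches the stricter `tolerance` check.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Tx:
    """Hypothetical stand-in for one bank-feed transaction."""
    amount: float
    date: date
    merchant_name: Optional[str]


def amount_matches(tx, claim_amount, claim_date, claim_merchant,
                   window_days=30, max_ratio=3.0):
    """Loose candidate match: same merchant, nearby date, similar magnitude.

    window_days and max_ratio are assumed thresholds for illustration.
    """
    merchant = (tx.merchant_name or "").lower()
    if claim_merchant.lower() not in merchant:
        return False
    if abs(tx.date - claim_date) > timedelta(days=window_days):
        return False
    if tx.amount <= 0 or claim_amount <= 0:
        return False
    # Compare magnitudes as a ratio so the check is symmetric.
    ratio = max(tx.amount, claim_amount) / min(tx.amount, claim_amount)
    return ratio <= max_ratio


tx = Tx(amount=22_000.0, date=date(2024, 1, 15), merchant_name="City Auto Sales")
is_candidate = amount_matches(tx, 35_000.0, date(2024, 1, 20), "city auto")
```

Matching on merchant and date first, and only then comparing amounts, keeps the pairing step separate from the fraud decision.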

	### Identity API — `/identity/get`

	Verifies the claimant against bank-of-record data.

	```python
	identity = plaid_client.identity_get(access_token)
	owner = identity.accounts[0].owners[0]
	verified = (
	    name_match(claim.name, owner.names)
	    and address_match(claim.address, owner.addresses)
	    and any(claim.phone == p.data for p in owner.phone_numbers)
	)
	```

	Where it pays off: identity-takeover fraud, claim-stuffing schemes.
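
The `name_match` and `address_match` helpers used above are not defined in this document. A minimal sketch follows, assuming names arrive as plain strings and addresses as plain dicts with `street` and `postal_code` keys (a simplification chosen for illustration, not the shape of the real Plaid response). The ASCII-only normalisation would need extending for accented names.

```python
import re


def _normalise(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def name_match(claim_name, owner_names):
    """True if the claimant's name tokens equal those of any bank-of-record name."""
    claim_tokens = set(_normalise(claim_name).split())
    return any(claim_tokens == set(_normalise(n).split()) for n in owner_names)


def address_match(claim_address, owner_addresses):
    """Compare street and postal code only; both sides are plain dicts here."""
    want = (_normalise(claim_address["street"]), claim_address["postal_code"].strip())
    return any(
        want == (_normalise(a["street"]), a["postal_code"].strip())
        for a in owner_addresses
    )


same_person = name_match("Jane A. Doe", ["DOE JANE A"])
```

Token-set comparison tolerates reordered names such as "DOE JANE A" but is intentionally strict about missing or extra tokens.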

	### Income & Employment — `/credit/employment/get`

	For disability and life claims, anchors the benefit calculation.

	```python
	record = plaid_client.credit_employment_get(access_token).items[0]
	benefit = compute_disability_benefit(
	    annual_income=record.pay.annual,
	    pay_frequency=record.pay.pay_frequency,
	    employment_status=record.status,
	    policy=policy,
	)
	```
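
The call above depends on a `compute_disability_benefit` function that this document does not show. A hedged sketch of one possible rule is below: replace a fixed share of verified monthly income, up to a cap. The `DisabilityPolicy` fields, the `"ACTIVE"` status string, and the rounding are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DisabilityPolicy:
    """Hypothetical policy terms; real policies carry many more conditions."""
    replacement_ratio: float  # share of income replaced, e.g. 0.6
    monthly_cap: float        # maximum monthly benefit in USD


def compute_disability_benefit(annual_income, pay_frequency, employment_status, policy):
    """Return a monthly benefit: a capped share of verified monthly income.

    pay_frequency is accepted to mirror the call above but is unused,
    because the income figure is already annualised.
    """
    if employment_status != "ACTIVE" or annual_income <= 0:
        return 0.0
    monthly_income = annual_income / 12
    return round(min(monthly_income * policy.replacement_ratio, policy.monthly_cap), 2)


policy = DisabilityPolicy(replacement_ratio=0.6, monthly_cap=5_000.0)
benefit = compute_disability_benefit(60_000, "MONTHLY", "ACTIVE", policy)
```

Anchoring the calculation on bank-verified income rather than the claimant's stated salary is the point of this touch-point.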

	### Asset Report — `/asset_report/get`

	Provides a financial-context check: a claim that is large relative to
	the claimant's linked assets signals elevated risk.

	```python
	report = plaid_client.asset_report_get(asset_report_token)
	total_assets = sum(
	    account.balances.current or 0
	    for item in report.report.items
	    for account in item.accounts
	)
	# Guard against an empty or zero-balance report before dividing.
	if total_assets > 0 and claim.amount > 0.5 * total_assets:
	    flag("claim_to_assets_ratio_high", ratio=claim.amount / total_assets)
	```

	### Recurring transactions — `/transactions/recurring/get`

	Confirms that premium payments are still flowing, which is independent
	evidence that the policy is active regardless of what the policy admin
	system says.

	```python
	recurring = plaid_client.transactions_recurring_get(access_token)
	premium_streams = [
	    s for s in recurring.outflow_streams
	    if "insurance" in (s.description or "").lower()
	    or s.merchant_name in INSURANCE_MERCHANTS
	]
	```
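
Finding premium streams is only half of the check; the streams also need to be recent. The sketch below shows one way to decide that, assuming a simplified `PremiumStream` record and a `grace_days` allowance, both hypothetical and not taken from the Plaid response schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PremiumStream:
    """Hypothetical, simplified view of one recurring outflow stream."""
    description: str
    last_date: date     # date of the most recent payment
    interval_days: int  # expected gap between payments, e.g. 30 for monthly


def premiums_current(streams, as_of, grace_days=15):
    """True if any stream's latest payment falls within interval + grace."""
    return any(
        as_of - s.last_date <= timedelta(days=s.interval_days + grace_days)
        for s in streams
    )


streams = [PremiumStream("ACME INSURANCE PREMIUM", date(2024, 3, 1), 30)]
active = premiums_current(streams, as_of=date(2024, 3, 20))
```

A lapsed stream does not prove the policy is inactive (the premium may be paid from another account), so a `False` result is better treated as a prompt for review than as a denial reason.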

	## Scaler AI Labs · RLHF loop

	The platform's improvement engine. Three pieces:

	### 1. Labelling pipeline

	Every adjudicator decision becomes a Scaler task pre-loaded with the
	LLM's reasoning, the claim, and the Plaid evidence. Adjusters mark
	correct / incorrect / partially correct and add free-text rationale.

	```python
	scale_client.create_task(
	    project="claimsense_review",
	    task_type="comparison",
	    data={
	        "claim_id": claim.id,
	        "ai_decision": output.decision,
	        "ai_reasoning": output.reasoning,
	        "ai_payout": output.payout,
	        "claim_details": claim.dict(),
	        "plaid_evidence": evidence.dict(),
	    },
	    instruction=(
	        "Was the verdict correct? Was the payout right? Was fraud "
	        "handled appropriately? Provide reasoning."
	    ),
	)
	```
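
Before the reward model can train on adjuster feedback, the three labels need a numeric form. A minimal sketch of one possible mapping follows. The label strings and the 1.0 / 0.5 / 0.0 values are assumptions for illustration, and averaging across reviewers is just one simple aggregation choice.

```python
from statistics import mean

# Assumed scalar values for the three adjuster labels.
LABEL_REWARD = {"correct": 1.0, "partially_correct": 0.5, "incorrect": 0.0}


def reward_from_labels(labels):
    """Average the scalar reward over every adjuster label for one decision."""
    if not labels:
        raise ValueError("at least one adjuster label is required")
    return mean(LABEL_REWARD[label] for label in labels)


reward = reward_from_labels(["correct", "correct", "incorrect"])
```

The free-text rationale is not used here; it would feed a separate path such as reward-model training text.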

	### 2. Weekly cycle

	```
	Day 1-3 : collect labelled decisions
	Day 4-5 : fit / refresh the reward model
	Day 6   : GRPO fine-tune on the new reward
	Day 7   : shadow-deploy and compare against the live model
	          (promote if correctness improves and fraud capture stays ≥ live)
	```
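
The Day 7 promotion rule can be written as a small gate. This is a sketch only: the metric key names are borrowed from the quality dashboard, and treating "improves" as a strict inequality is an assumption.

```python
def should_promote(live, candidate):
    """Promote only if correctness strictly improves and fraud capture does not drop."""
    return (
        candidate["verdict_correctness"] > live["verdict_correctness"]
        and candidate["fraud_capture"] >= live["fraud_capture"]
    )


live = {"verdict_correctness": 0.81, "fraud_capture": 0.78}
candidate = {"verdict_correctness": 0.87, "fraud_capture": 0.85}
promote = should_promote(live, candidate)
```

A production gate would likely also require the difference to be statistically meaningful on the shadow traffic rather than comparing point estimates.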

	### 3. Quality dashboard

	Tracked across iterations:

	```python
	metrics = {
	    "verdict_correctness": {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
	    "fraud_capture": {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
	    "median_minutes": {"baseline": 45, "v1": 12, "v2": 8, "v3": 5},
	    "savings_per_claim_usd": {"baseline": 0, "v1": 45, "v2": 72, "v3": 95},
	}
	```

	## Worked example — auto theft

	```
	Step 1  Claim submitted
	        Claimant reports vehicle stolen. Claims $35,000.

	Step 2  Plaid Link
	        Bank account linked. Identity verified.

	Step 3  Plaid Transactions sync
	        Vehicle purchase located: $22,000, City Auto Sales, 2024-01-15.
	        Discrepancy detected: claimed $35K, paid $22K.

	Step 4  Plaid Asset Report
	        Total assets $45,000. Claim is 78 % of total assets: flag raised.

	Step 5  Adjudicator LLM
	        risk_score = 0.85
	        flags      = ["amount_discrepancy", "claim_to_assets_ratio_high"]
	        verdict    = deny
	        reason     = "Inflated claim: bank-feed shows $22K transaction"

	Step 6  Scaler review
	        Adjuster confirms verdict. Free-text:
	        "Solid catch: discrepancy alone is decisive."

	Step 7  Weekly fine-tune
	        Reward model up-weights "transaction discrepancy → deny" path.
	```
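
The evidence checks in Steps 3 and 4 can be reproduced with a few lines. This sketch covers only the deterministic flags, not the LLM's `risk_score` or verdict. The 0.5 ratio threshold comes from the Asset Report snippet earlier in this document; the $500 `tolerance` is an assumed value.

```python
def evidence_flags(claimed, purchase_amount, total_assets,
                   tolerance=500.0, ratio_threshold=0.5):
    """Return the deterministic evidence flags for one claim."""
    flags = []
    # Step 3: claimed amount versus the purchase found in the bank feed.
    if abs(purchase_amount - claimed) > tolerance:
        flags.append("amount_discrepancy")
    # Step 4: claim size relative to total linked assets.
    if total_assets > 0 and claimed / total_assets > ratio_threshold:
        flags.append("claim_to_assets_ratio_high")
    return flags


flags = evidence_flags(claimed=35_000, purchase_amount=22_000, total_assets=45_000)
ratio_pct = round(100 * 35_000 / 45_000)  # 78, matching Step 4
```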

	## Business case

	Reference customer: a regional insurer running ~100,000 personal-line
	claims a year, average ticket $5,000, fraud rate 5%.

	| Metric | Today | With ClaimSense |
	|---|---:|---:|
	| Median cycle time | 14 days | 2 hours |
	| Fraud capture | 23 % | 91 % |
	| False positives | 12 % | 3 % |
	| Cost per claim | $150 | $35 |
	| CSAT | 3.2 / 5 | 4.6 / 5 |

	```
	Fraudulent claims: 100,000 × 5 %                  = 5,000
	Fraud loss before: 3,850 missed (77 %) × $5,000   = $19.25 M
	Fraud loss after : 450 missed (9 %) × $5,000      = $2.25 M
	Reduction in fraud loss ......................... = $17.00 M

	Processing cost before: 100,000 × $150            = $15.00 M
	Processing cost after : 100,000 × $35             = $3.50 M
	Reduction in processing cost .................... = $11.50 M

	Total annual savings ............................ = $28.50 M
	```
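
The savings figures follow directly from the reference-customer assumptions and the table above. The short script below re-derives them with integer arithmetic so the numbers can be re-run under different inputs. The variable names are illustrative.

```python
claims_per_year = 100_000
avg_ticket_usd = 5_000
fraud_rate_pct = 5

# Number of fraudulent claims per year: 5 % of 100,000.
fraud_claims = claims_per_year * fraud_rate_pct // 100


def missed(capture_pct):
    """Fraudulent claims that slip through at a given capture rate."""
    return fraud_claims * (100 - capture_pct) // 100


fraud_loss_before = missed(23) * avg_ticket_usd
fraud_loss_after = missed(91) * avg_ticket_usd
fraud_saving = fraud_loss_before - fraud_loss_after

# Processing cost falls from $150 to $35 per claim.
processing_saving = claims_per_year * (150 - 35)

total_saving = fraud_saving + processing_saving
print(f"Total annual savings: ${total_saving / 1e6:.2f} M")
```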

	## Roadmap

	### Phase 1 — Foundations · months 1-2
	- Plaid Transactions + Identity in production
	- Reward model v0 from supervised labels
	- FastAPI scoring endpoint
	- Scaler project bootstrap

	### Phase 2 — RLHF online · months 3-4
	- Expert labelling UI
	- GRPO/PPO weekly fine-tunes
	- Shadow-deploy + A/B harness

	### Phase 3 — Coverage expansion · months 5-6
	- Income + Asset Plaid products
	- Adjuster cockpit (read-only first)
	- Real-time fraud-scoring API

	### Phase 4 — Commercial scale · months 7-12
	- Multi-tenant SaaS
	- White-label option
	- SOC2 / HIPAA / NAIC compliance work

	## Technical stack snapshot

	```yaml
	runtime:
	  language: Python 3.11+
	  web: FastAPI
	  workers: Celery on Redis
	  rl: OpenEnv (this gym), TRL/Unsloth for fine-tuning
	  data: PostgreSQL, S3 for evidence
	integrations:
	  plaid: Transactions, Identity, Income, Assets, Recurring
	  scaler: RLHF labelling + reward modelling
	  cloud: AWS / GCP
	deployment:
	  preview: Hugging Face Spaces (this Space)
	  production: Docker / Kubernetes (single-tenant first)
	```

	## Coordinates

	| Resource | Where |
	|---|---|
	| Live Space | <https://huggingface.co/spaces/akhiilll/claims-env> |
	| Repo | (this directory) |
	| Statement | OpenEnv Hackathon · 3.1 — Professional Tasks |
	| Sub-theme | Scaler AI Labs — Enterprise Workflows |