# ClaimSense: Product Vision
> The hackathon submission ships an *RL gym*. This document describes
> the product the gym is the training ground for: a closed-loop claims
> intelligence platform that wires Plaid-style financial signals into
> an LLM adjudicator and uses Scaler AI Labs' RLHF tooling to keep the
> model honest week over week.
## Why this product exists
Insurers run claims through human adjusters because the workflow is
unforgiving: the wrong call costs real money, regulators audit the
reasoning, and fraudsters keep finding new angles. Naive LLM
deployments fail on this surface for three reasons:
1. **No investigation reflex.** They take the claim at face value
instead of pulling the policy, history, and supporting transactions.
2. **No grounding.** They hallucinate dollar amounts because nothing in
the prompt forces them to compare the claim against bank data.
3. **No correction loop.** A wrong call yesterday can be wrong again
tomorrow because nothing trains on the adjuster override.
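ClaimSense targets all three: Plaid calls supply the investigation and
grounding, and the weekly RLHF cycle supplies the correction loop.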
## Platform shape
```
┌──────────────────────────────────────────────────────────────────────┐
│                        ClaimSense AI Platform                        │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Customer journey                                                   │
│   ────────────────────────────────────────────────────────────────   │
│   ┌─────────┐   ┌──────────────┐   ┌─────────────────────────────┐   │
│   │ Portal  │──▶│ Plaid Link   │──▶│ Identity / Income gate      │   │
│   └─────────┘   └──────────────┘   └─────────────────────────────┘   │
│                        │                          │                  │
│                        ▼                          ▼                  │
│   Adjudication core                                                  │
│   ────────────────────────────────────────────────────────────────   │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │ Plaid enrichment: transactions, identity, income, assets     │   │
│   ├──────────────────────────────────────────────────────────────┤   │
│   │ ClaimSense gym (this repo): RL training surface              │   │
│   ├──────────────────────────────────────────────────────────────┤   │
│   │ Adjudicator LLM: fraud signals + coverage + settlement       │   │
│   └──────────────────────────────────────────────────────────────┘   │
│                                   │                                  │
│                                   ▼                                  │
│   Improvement loop                                                   │
│   ────────────────────────────────────────────────────────────────   │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │ Scaler labelling → reward model → GRPO fine-tune (weekly)    │   │
│   └──────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
```
## Plaid touch-points
The hackathon repo simulates the bank-feed interaction. In production,
five Plaid product calls move the needle:
### Transactions API: `/transactions/sync`
The single most powerful signal. Cross-references the claim amount
against actual purchases.
```python
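# Pull the latest transactions for the linked account.
sync = plaid_client.transactions_sync(access_token)
# Keep the transactions that plausibly correspond to the claimed purchase.
matches = [
    tx for tx in sync.added
    if amount_matches(tx, claim.amount, claim.date, claim.merchant)
]
# Flag the claim when the matched purchase differs from the claimed amount.
if matches and abs(matches[0].amount - claim.amount) > tolerance:
    flag("inflated_claim", actual=matches[0].amount, claimed=claim.amount)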
```
**Where it pays off:** auto theft, contents claims, repair invoices.
Catches the *amount* fraud that statistical scores miss.
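
The `amount_matches` helper called above is not defined in this document. A minimal sketch of what it might look like, assuming a transaction exposes `amount`, `date`, and `merchant_name`; the `Tx` type, the 30-day window, and the loose `max_ratio` band are all hypothetical choices. The amount band is kept wide on purpose so that an inflated claim still pairs with its real purchase and the later tolerance check can fire:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tx:
    """Hypothetical stand-in for a bank-feed transaction."""
    amount: float
    date: date
    merchant_name: str


def amount_matches(tx, claim_amount, claim_date, claim_merchant,
                   window_days=30, max_ratio=3.0):
    """Return True if tx is plausibly the purchase behind the claim.

    window_days and max_ratio are assumed defaults, not values from the source.
    """
    if tx.amount <= 0 or claim_amount <= 0:
        return False
    same_merchant = (
        tx.merchant_name.strip().lower() == claim_merchant.strip().lower()
    )
    in_window = abs(tx.date - claim_date) <= timedelta(days=window_days)
    # Ratio of the larger amount to the smaller one; 1.0 means identical.
    ratio = max(tx.amount, claim_amount) / min(tx.amount, claim_amount)
    return same_merchant and in_window and ratio <= max_ratio
```

With this definition, a $22,000 purchase still matches a $35,000 claim at the same merchant, while a $40 purchase at that merchant does not.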
### Identity API: `/identity/get`
Verifies the claimant against bank-of-record data.
```python
identity = plaid_client.identity_get(access_token)
# Compare against the first owner of the first linked account.
owner = identity.accounts[0].owners[0]
verified = (
    name_match(claim.name, owner.names)
    and address_match(claim.address, owner.addresses)
    and any(claim.phone == p.data for p in owner.phone_numbers)
)
```
**Where it pays off:** identity-takeover fraud, claim-stuffing schemes.
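
The `name_match` helper is left undefined above. One possible sketch, assuming exact equality after normalization is acceptable: it strips accents and punctuation, lowercases, and ignores token order, so "DOE, JANE A" matches "Jane A. Doe". The normalization rules are an assumption; a production matcher would likely need fuzzy matching and nickname handling.

```python
import unicodedata


def _normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(cleaned.lower().split())


def name_match(claim_name, owner_names):
    """True if the claim name equals any owner name, ignoring token order."""
    target = sorted(_normalize(claim_name).split())
    return any(sorted(_normalize(n).split()) == target for n in owner_names)
```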
### Income & Employment: `/credit/employment/get`
For disability and life claims, anchors the benefit calculation.
```python
# Use the first employment record returned for the claimant.
record = plaid_client.credit_employment_get(access_token).items[0]
benefit = compute_disability_benefit(
    annual_income=record.pay.annual,
    pay_frequency=record.pay.pay_frequency,
    employment_status=record.status,
    policy=policy,
)
```
### Asset Report: `/asset_report/get`
Provides a financial-context check: large claims relative to net worth
signal elevated risk.
```python
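report = plaid_client.asset_report_get(asset_report_token)
# Sum current balances across every account in every linked item.
total_assets = sum(
    account.balances.current
    for item in report.report.items
    for account in item.accounts
)
# Guard against a zero or negative total before dividing.
ratio = claim.amount / total_assets if total_assets > 0 else float("inf")
if ratio > 0.5:
    flag("claim_to_assets_ratio_high", ratio=ratio)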
```
### Recurring transactions: `/transactions/recurring/get`
Confirms that premium payments are still flowing, i.e. that the policy is
genuinely active regardless of what the policy admin system says.
```python
recurring = plaid_client.transactions_recurring_get(access_token)
# Keep outflow streams that look like insurance premium payments.
premium_streams = [
    s for s in recurring.outflow_streams
    if "insurance" in (s.description or "").lower()
    or s.merchant_name in INSURANCE_MERCHANTS
]
```
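
Filtering streams only finds candidate premium payments; deciding that the policy is "genuinely active" also needs a recency test. A rough sketch, assuming each stream can be reduced to the date of its last payment and a billing interval in days. The `Stream` type and the 10-day grace period are hypothetical and not part of any Plaid response shape:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Stream:
    """Hypothetical, simplified view of a recurring outflow stream."""
    description: str
    last_date: date       # date of the most recent payment in the stream
    frequency_days: int   # assumed billing interval, e.g. 30 for monthly


def premiums_current(streams, as_of, grace_days=10):
    """True if any stream was paid within one interval plus a grace period."""
    return any(
        as_of - s.last_date <= timedelta(days=s.frequency_days + grace_days)
        for s in streams
    )
```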
## Scaler AI Labs · RLHF loop
The platform's improvement engine. Three pieces:
### 1. Labelling pipeline
Every adjudicator decision becomes a Scaler task pre-loaded with the
LLM's reasoning, the claim, and the Plaid evidence. Adjusters mark
*correct / incorrect / partially correct* and add free-text rationale.
```python
scale_client.create_task(
    project="claimsense_review",
    task_type="comparison",
    data={
        "claim_id": claim.id,
        "ai_decision": output.decision,
        "ai_reasoning": output.reasoning,
        "ai_payout": output.payout,
        "claim_details": claim.dict(),
        "plaid_evidence": evidence.dict(),
    },
    instruction=(
        "Was the verdict correct? Was the payout right? Was fraud "
        "handled appropriately? Provide reasoning."
    ),
)
```
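
Before a reward model can be fit, the three-way adjuster label has to become a number. A minimal sketch, assuming a 1.0 / 0.5 / 0.0 mapping and simple averaging when several adjusters review the same claim; the document specifies neither, so both are illustrative assumptions:

```python
from collections import defaultdict
from statistics import mean

# Assumed scalar value for each adjuster label; the source does not define one.
LABEL_REWARD = {"correct": 1.0, "partially correct": 0.5, "incorrect": 0.0}


def rewards_by_claim(labels):
    """Average the label rewards per claim from (claim_id, label) pairs."""
    grouped = defaultdict(list)
    for claim_id, label in labels:
        grouped[claim_id].append(LABEL_REWARD[label])
    return {claim_id: mean(values) for claim_id, values in grouped.items()}
```

The free-text rationale is ignored here; it would need separate handling if it is meant to influence the reward.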
### 2. Weekly cycle
```
Day 1-3 : collect labelled decisions
Day 4-5 : fit / refresh the reward model
Day 6   : GRPO fine-tune on the new reward
Day 7   : shadow-deploy and compare against the live model
          (promote if correctness improves and fraud capture stays ≥ live)
```
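
The Day 7 promotion rule can be written as a small gate. This is a sketch that assumes both models are summarised by the two dashboard metrics named in this document; the dictionary shape is an assumption:

```python
def should_promote(candidate, live):
    """Promote only if correctness improves and fraud capture does not regress."""
    return (
        candidate["verdict_correctness"] > live["verdict_correctness"]
        and candidate["fraud_capture"] >= live["fraud_capture"]
    )
```

A candidate that improves correctness but captures less fraud than the live model stays in shadow mode under this rule.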
### 3. Quality dashboard
Tracked across iterations:
```python
metrics = {
    "verdict_correctness":   {"baseline": 0.72, "v1": 0.81, "v2": 0.87, "v3": 0.91},
    "fraud_capture":         {"baseline": 0.65, "v1": 0.78, "v2": 0.85, "v3": 0.92},
    "median_minutes":        {"baseline": 45, "v1": 12, "v2": 8, "v3": 5},
    "savings_per_claim_usd": {"baseline": 0, "v1": 45, "v2": 72, "v3": 95},
}
```
## Worked example: auto theft
```
Step 1  Claim submitted
        Claimant reports vehicle stolen. Claims $35,000.
Step 2  Plaid Link
        Bank account linked. Identity verified.
Step 3  Plaid Transactions sync
        Vehicle purchase located: $22,000, City Auto Sales, 2024-01-15.
        Discrepancy detected: claimed $35K, paid $22K.
Step 4  Plaid Asset Report
        Total assets $45,000. Claim is 78 % of net worth; flag raised.
Step 5  Adjudicator LLM
        risk_score = 0.85
        flags      = ["amount_discrepancy", "claim_to_assets_ratio_high"]
        verdict    = deny
        reason     = "Inflated claim: bank-feed shows $22K transaction"
Step 6  Scaler review
        Adjuster confirms verdict. Free-text:
        "Solid catch: discrepancy alone is decisive."
Step 7  Weekly fine-tune
        Reward model up-weights "transaction discrepancy → deny" path.
```
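
The two flags in Step 5 follow directly from the numbers in Steps 1 to 4. A small sketch that reproduces them, assuming a $500 amount tolerance (hypothetical) and the 0.5 claim-to-assets limit used in the Asset Report snippet:

```python
def plaid_flags(claimed, paid, total_assets, tolerance=500, ratio_limit=0.5):
    """Return the evidence flags raised by the bank-feed checks."""
    flags = []
    # Claimed amount differs from the purchase found in the bank feed.
    if abs(paid - claimed) > tolerance:
        flags.append("amount_discrepancy")
    # Claim is large relative to the claimant's total assets.
    if total_assets > 0 and claimed / total_assets > ratio_limit:
        flags.append("claim_to_assets_ratio_high")
    return flags


example_flags = plaid_flags(claimed=35_000, paid=22_000, total_assets=45_000)
ratio_pct = round(35_000 / 45_000 * 100)  # 78, as quoted in Step 4
```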
## Business case
Reference customer: a regional insurer running ~100,000 personal-line
claims a year, average ticket $5,000, fraud rate 5%.
| | Today | With ClaimSense |
|---|---:|---:|
| Median cycle time | 14 days | 2 hours |
| Fraud capture | 23 % | 91 % |
| False positives | 12 % | 3 % |
| Cost per claim | $150 | $35 |
| CSAT | 3.2 / 5 | 4.6 / 5 |
```
Fraud loss before:        3,850 missed × $5,000 = $19.25 M
Fraud loss after:           450 missed × $5,000 =  $2.25 M
Reduction in fraud loss ....................... = $17.00 M
Processing cost before: 100,000 × $150          = $15.00 M
Processing cost after:  100,000 × $35           =  $3.50 M
Reduction in processing cost .................. = $11.50 M
Total annual savings .......................... = $28.50 M
```
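
The figures above can be re-derived from the reference-customer inputs and the table. The sketch below uses integer percentages to avoid floating-point drift and keeps the same simplification as the calculation above: every missed fraudulent claim pays out the average ticket.

```python
CLAIMS_PER_YEAR = 100_000
AVG_TICKET_USD = 5_000
FRAUD_RATE_PCT = 5


def missed_fraud(capture_pct):
    """Number of fraudulent claims that slip through at a given capture rate."""
    fraud_claims = CLAIMS_PER_YEAR * FRAUD_RATE_PCT // 100
    return fraud_claims * (100 - capture_pct) // 100


fraud_loss_before = missed_fraud(23) * AVG_TICKET_USD
fraud_loss_after = missed_fraud(91) * AVG_TICKET_USD
processing_before = CLAIMS_PER_YEAR * 150
processing_after = CLAIMS_PER_YEAR * 35

total_savings = (
    (fraud_loss_before - fraud_loss_after)
    + (processing_before - processing_after)
)
print(f"${total_savings:,}")  # $28,500,000
```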
## Roadmap
### Phase 1: Foundations · months 1-2
- Plaid Transactions + Identity in production
- Reward model v0 from supervised labels
- FastAPI scoring endpoint
- Scaler project bootstrap
### Phase 2: RLHF online · months 3-4
- Expert labelling UI
- GRPO/PPO weekly fine-tunes
- Shadow-deploy + A/B harness
### Phase 3: Coverage expansion · months 5-6
- Income + Asset Plaid products
- Adjuster cockpit (read-only first)
- Real-time fraud-scoring API
### Phase 4: Commercial scale · months 7-12
- Multi-tenant SaaS
- White-label option
- SOC2 / HIPAA / NAIC compliance work
## Technical stack snapshot
```yaml
runtime:
  language: Python 3.11+
  web: FastAPI
  workers: Celery on Redis
  rl: OpenEnv (this gym), TRL/Unsloth for fine-tuning
  data: PostgreSQL, S3 for evidence
integrations:
  plaid: Transactions, Identity, Income, Assets, Recurring
  scaler: RLHF labelling + reward modelling
  cloud: AWS / GCP
deployment:
  preview: Hugging Face Spaces (this Space)
  production: Docker / Kubernetes (single-tenant first)
```
## Coordinates
| Resource | Where |
|---|---|
| Live Space | <https://huggingface.co/spaces/akhiilll/claims-env> |
| Repo | (this directory) |
| Statement | OpenEnv Hackathon · 3.1: Professional Tasks |
| Sub-theme | Scaler AI Labs: Enterprise Workflows |