@akhiilll on Hugging Face: "Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler…"


Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).

An OpenEnv RL environment for enterprise insurance claims adjudication: the monthly “tool-heavy” workflow real adjusters do. The agent pulls policy and claim history, runs fraud checks, verifies purchases and transactions, then approves, denies, or escalates, all under partial observability with long-horizon credit assignment.
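
To make "partial observability with long-horizon credit assignment" concrete, here is a toy sketch of that kind of episode. It is not the ClaimSense environment or the OpenEnv API: the class `ToyClaimEnv`, the tool names, and the reward values are all hypothetical, chosen only to show facts staying hidden until a tool reveals them and the reward arriving only with the final decision.

```python
from dataclasses import dataclass, field

# Hypothetical tool and decision names; the real action space may differ.
TOOLS = ("get_policy", "get_claim_history", "run_fraud_check", "verify_transaction")
DECISIONS = ("approve", "deny", "escalate")


@dataclass
class ToyClaimEnv:
    """Toy stand-in: facts stay hidden until a tool call reveals them."""

    hidden: dict            # ground truth the agent cannot see directly
    correct_decision: str   # used only to score the final decision
    revealed: dict = field(default_factory=dict)
    done: bool = False

    def step(self, action: str):
        """Return (observation, reward, done) for one tool call or decision."""
        if self.done:
            raise RuntimeError("episode already finished")
        if action in TOOLS:
            # A tool call reveals one hidden fact and gives no immediate reward.
            self.revealed[action] = self.hidden[action]
            return dict(self.revealed), 0.0, False
        if action in DECISIONS:
            # Reward arrives only here, so credit must flow back to earlier calls.
            self.done = True
            reward = 1.0 if action == self.correct_decision else -1.0
            return dict(self.revealed), reward, True
        raise ValueError(f"unknown action: {action}")


env = ToyClaimEnv(
    hidden={
        "get_policy": "active",
        "get_claim_history": "3 claims in 6 months",
        "run_fraud_check": "high risk",
        "verify_transaction": "no matching purchase",
    },
    correct_decision="deny",
)
obs, reward, done = env.step("run_fraud_check")
obs, reward, done = env.step("verify_transaction")
obs, reward, done = env.step("deny")
print(obs, reward, done)
```

The agent here never sees the policy or claim history because it never asked for them, which is the partial-observability part; the two tool calls earn nothing on their own, which is the credit-assignment part.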

Trained Qwen/Qwen2.5-1.5B-Instruct with:

- Rollout evaluation on HF Jobs (A10G) and a random baseline for comparison
- Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)
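
For readers wondering what "two independent reward functions (format + env replay)" could look like, here is a minimal sketch, not the code from the Space. It assumes plain-string completions, a hypothetical JSON action format with a `decision` field, and a hypothetical `expected` list standing in for the outcome of replaying the action in the environment. The `(completions, **kwargs) -> list of floats` shape follows the convention TRL documents for custom reward functions.

```python
import json

ALLOWED = {"approve", "deny", "escalate"}  # hypothetical decision labels


def parse_decision(text):
    """Return the decision from a JSON completion, or None if it is malformed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    decision = data.get("decision")
    if isinstance(decision, str) and decision in ALLOWED:
        return decision
    return None


def format_reward(completions, **kwargs):
    """1.0 for well-formed JSON with an allowed decision, else 0.0."""
    return [1.0 if parse_decision(c) is not None else 0.0 for c in completions]


def env_replay_reward(completions, expected, **kwargs):
    """Score each decision against a stored outcome (stand-in for an env replay)."""
    rewards = []
    for completion, target in zip(completions, expected):
        decision = parse_decision(completion)
        if decision is None:
            rewards.append(0.0)    # nothing valid to replay
        elif decision == target:
            rewards.append(1.0)
        else:
            rewards.append(-1.0)
    return rewards


completions = ['{"decision": "deny"}', '{"decision": "approve"}', "deny it"]
expected = ["deny", "deny", "deny"]
format_scores = format_reward(completions)
replay_scores = env_replay_reward(completions, expected=expected)
print(format_scores)   # [1.0, 1.0, 0.0]
print(replay_scores)   # [1.0, -1.0, 0.0]
```

Keeping the two signals separate means a well-formatted but wrong decision still earns the format reward while being penalized by the replay reward. Functions of this shape would typically be handed to the trainer as a list, e.g. `reward_funcs=[format_reward, env_replay_reward]`.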

Headline training evidence:

- GRPO run: 80 steps, 640 rollouts. KL divergence rises from ~0 to ~0.06 (evidence of real weight updates), and completion length shrinks from ~25 to ~10.
- Plots + logs are committed in the Space under runs/.

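
As a rough illustration of the numbers above: 640 rollouts over 80 steps works out to 8 rollouts per step. The sketch below shows the group-relative advantage idea behind GRPO, where each rollout is scored against the mean of its group. Treating those 8 rollouts as a single group is an assumption, the reward values are made up, and the exact normalization inside TRL may differ from this population-std version.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Center rewards on the group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a library may use another estimator
    return [(r - mu) / (sigma + eps) for r in rewards]


steps, rollouts = 80, 640            # figures from the post
per_step = rollouts // steps         # 8 rollouts per step

# Made-up rewards for one hypothetical group of 8 rollouts.
group = [1.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
adv = group_advantages(group)

print(per_step)
print([round(a, 2) for a in adv])
```

Rollouts that beat their group's mean get a positive advantage and the rest get a negative one; a group where every rollout scores the same yields all zeros and so contributes no gradient signal.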
Live demo + repo + writeup linked below.

🔗 Env (Space URL): akhiilll/claims-env
🧪 Notebook: akhiilll/claims-env
📝 Blog: docs/HF_MINI_BLOG.md in the Space
