Healthcare Fraud Detection with GRPO

⚠️ PRELIMINARY MODEL - 25 Training Steps Only

This model is a proof-of-concept for healthcare claims fraud detection using Group Relative Policy Optimization (GRPO). It has only been trained for 25 steps and requires additional training (200-500+ steps recommended) for production use.

Model Details

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Training Steps: 25 (preliminary)
  • Reward Improvement: +38% (mean reward from -40.74 to -25.35)
  • Initial Entropy: 2.27 (high uncertainty)
  • Hardware: NVIDIA RTX 4060 8GB
  • Quantization: 4-bit with LoRA adapters
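The "Group Relative" part of GRPO refers to scoring a group of sampled completions per prompt and using each completion's reward relative to the group mean (scaled by the group's standard deviation) as its advantage. A minimal sketch of that normalization, using illustrative reward values rather than anything logged from this run:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Advantage of each completion relative to its sampled group (GRPO-style)."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled completions for one claim, with their scalar rewards
advantages = group_relative_advantages([-40.0, -30.0, -20.0, -10.0])
```

Completions scored above their group's mean get positive advantages and are reinforced; those below get negative advantages, all without a learned value function.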

Environment: Healthcare Claims Fraud Detection

OpenEnv-compatible RL environment with:

  • LLM-Native Actions: Text generation (Decision + Rationale + Evidence + Recommendation)
  • Multi-Component Reward: Decision (40%) + Rationale (30%) + Evidence (20%) + Efficiency (10%)
  • Hybrid Data: 60% Synthea synthetic + 40% CMS SynPUF real Medicare claims
  • 8 Fraud Patterns: Upcoding, unbundling, phantom billing, etc.
  • A2A Protocol: Compatible with Green Agents / AgentBeats

Usage

from transformers import pipeline

# Load model
generator = pipeline(
    "text-generation", 
    model="shylane/healthcare-fraud-detection-grpo", 
    device=0
)

# Generate fraud assessment
prompt = '''Assess this healthcare claim for fraud:
Provider: Dr. Smith
Service: MRI Scan
Amount: $2,500
...'''

result = generator(prompt, max_new_tokens=300, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
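Since the model emits a structured Decision / Rationale / Evidence / Recommendation assessment, downstream code will usually want to split the text back into those fields. A sketch, assuming a simple "Label: value" line format (the exact output format is illustrative, not guaranteed by the model):

```python
import re

FIELDS = ("Decision", "Rationale", "Evidence", "Recommendation")

def parse_assessment(text: str) -> dict:
    """Pull labeled sections out of a generated fraud assessment."""
    out = {}
    for field in FIELDS:
        m = re.search(rf"{field}:\s*(.+?)(?=\n(?:{'|'.join(FIELDS)}):|\Z)", text, re.S)
        out[field.lower()] = m.group(1).strip() if m else None
    return out

sample = """Decision: FRAUD
Rationale: Billed amount far exceeds regional norms for this CPT code.
Evidence: $2,500 vs. $600 median for MRI scans.
Recommendation: Flag for manual review."""
parsed = parse_assessment(sample)
```

Missing sections come back as None, so callers can detect malformed generations instead of crashing on them.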

Training Results

Preliminary (25 steps):

  • Shows +38% improvement over random baseline
  • Demonstrates that the environment provides a meaningful learning signal
  • Entropy decreasing (model becoming more confident)
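The +38% figure is the fraction of the initial (negative) mean reward recovered over the 25 steps:

```python
# Sanity-check the reported reward improvement: mean reward moved
# from -40.74 to -25.35 over 25 training steps.
start, end = -40.74, -25.35
relative_gain = (end - start) / abs(start)  # fraction of the baseline recovered
print(f"{relative_gain:.0%}")  # about +38%
```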

For Production:

  • Requires 200-500+ additional training steps
  • Consider larger model (1B-3B parameters) for complex reasoning
  • Multi-GPU training recommended for faster convergence

Limitations

⚠️ This is NOT a production-ready model:

  • Only 25 training steps completed
  • High initial entropy (2.27)
  • Mean reward still negative (-25.35)
  • Small model size (0.5B parameters)
  • Limited to synthetic/hybrid data

Challenge Entry

This model is an entry for the OpenEnv Student Challenge 2026 sponsored by Meta PyTorch, Hugging Face, and Unsloth AI.

Innovation: Moving from discrete action spaces to LLM-native text generation with structured reasoning for healthcare fraud detection.

Code & Documentation

Framework Versions

  • PEFT: 0.18.1
  • TRL: 0.27.1
  • Transformers: 5.0.0
  • PyTorch: 2.6.0+cu124

Citation

@misc{healthcare-fraud-grpo,
  title={Healthcare Fraud Detection with GRPO},
  author={OpenEnv Challenge Entry},
  year={2026},
  howpublished={\url{https://huggingface.co/shylane/healthcare-fraud-detection-grpo}}
}