gdpr-auditor / README.md
Charan Sai Mamidala
Add .env.example for easy setup and update README
bc2ecc6
metadata
title: GDPR Auditor
emoji: πŸ“‹
colorFrom: purple
colorTo: red
sdk: docker
app_port: 7860

πŸ”’ GDPR Compliance Auditor β€” OpenEnv Environment

GDPR Auditor is an OpenEnv-compatible RL environment where AI agents act as autonomous compliance officers, auditing privacy policies for GDPR/CCPA violations, detecting dark patterns, and identifying policy contradictions.


The Problem It Solves

Every company needs compliance auditing to avoid massive fines:

  • GDPR fines up to €20 million or 4% of global revenue
  • CCPA fines up to $7,500 per violation
  • Average human compliance auditor cost: $100,000+/year

The Agent's Job

  1. Review privacy policy documents (single or multi-document)
  2. Map data practices to stated purposes
  3. Identify contradictions, missing clauses, and dark patterns
  4. Report compliance violations with severity levels

Tasks & Grading

Task Difficulty Description Hidden Issues
easy_clause_existence Easy Verify mandatory GDPR clauses are present 2
medium_purpose_mapping Medium Match practices to purposes, find mismatches 3
hard_dark_patterns Hard Find contradictions within a single document 5
elite_multi_doc_reasoning Elite Cross-document contradiction detection 6

Reward Function

R = base_score + severity_bonus + multi_doc_bonus + exploration_bonus
  • Base Score: issues_found / total_issues
  • Severity Bonus: +0.25 for critical findings, +0.15 for high
  • Multi-Document Bonus: +0.2 for elite task (cross-doc findings)
  • Exploration Bonus: +0.02 per step (max 0.1)

All rewards are clamped to [0.0, 1.0].


API Endpoints

Endpoint Method Description
/health GET Health check β†’ {"status": "ok"}
/reset?task=easy GET Reset environment for a task
/step POST Submit a finding β†’ {"message": "..."}
/state GET Get current episode state

Example Usage

# Reset environment
curl "http://localhost:7860/reset?task=easy"

# Submit a compliance finding
curl -X POST "http://localhost:7860/step" \
  -H "Content-Type: application/json" \
  -d '{"message": "Missing Right to be Forgotten clause"}'

# Get current state
curl "http://localhost:7860/state"

Action / Observation Spaces

Observation (returned by reset/step)

{
  "task_id": "easy_clause_existence",
  "task_name": "Clause Existence Check",
  "difficulty": "easy",
  "step": 0,
  "documents": [{"id": "...", "title": "...", "content": "...", "doc_type": "policy"}],
  "data_practices": [{"id": "...", "category": "...", "purpose": "...", "data_type": "...", "shared_with_third_parties": false}],
  "compliance_requirements": ["Right to be Forgotten", "Data Portability", "Contact Information"],
  "flagged_issues": [],
  "echoed_message": "Review the privacy policy..."
}

Action (sent to /step)

{"message": "Missing Right to be Forgotten clause"}

Reward (returned from /step)

{
  "value": 0.52,
  "reason": "Found 1/2 issues",
  "issues_found": 1,
  "total_issues": 2
}

Setup & Local Development

Prerequisites

  • Python 3.10+
  • uv or pip

Install & Run

# Install dependencies
pip install -e .

# Start the server
python main.py
# β†’ Server at http://localhost:7860

Run Inference

export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="your-token-here"
export SERVER_URL="http://localhost:7860"

python inference.py

Docker

docker build -t gdpr-auditor .
docker run -p 7860:7860 gdpr-auditor

Project Structure

β”œβ”€β”€ models.py          # Pydantic typed models (Observation, Action, Reward)
β”œβ”€β”€ env/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── core.py        # GDPRAuditorEnvironment with 4 tasks + graders
β”œβ”€β”€ main.py            # FastAPI server with all endpoints
β”œβ”€β”€ inference.py       # Baseline inference script (OpenAI client)
β”œβ”€β”€ openenv.yaml       # OpenEnv manifest with task definitions
β”œβ”€β”€ pyproject.toml     # Dependencies
β”œβ”€β”€ Dockerfile         # Container configuration
└── README.md          # This file

Environment Variables

Copy .env.example to .env and add your Hugging Face token:

cp .env.example .env
# Then edit .env with your HF_TOKEN
Variable Description Default
API_BASE_URL LLM API endpoint https://router.huggingface.co/v1
MODEL_NAME Model identifier Qwen/Qwen2.5-72B-Instruct
HF_TOKEN Hugging Face / API key (required)
SERVER_URL Environment server URL http://localhost:7860

License

MIT