Charan Sai Mamidala
---
title: GDPR Auditor
emoji: 📋
colorFrom: purple
colorTo: red
sdk: docker
app_port: 7860
---
# 🔒 GDPR Compliance Auditor: OpenEnv Environment
**GDPR Auditor** is an OpenEnv-compatible RL environment where AI agents act as autonomous compliance officers, auditing privacy policies for GDPR/CCPA violations, detecting dark patterns, and identifying policy contradictions.
---
## The Problem It Solves
Companies that process personal data need compliance auditing to avoid substantial fines:
- GDPR fines up to **€20 million** or **4% of global annual turnover**, whichever is higher
- CCPA fines up to **$7,500 per intentional violation**
- Average human compliance auditor cost: **$100,000+/year**
### The Agent's Job
1. Review privacy policy documents (single or multi-document)
2. Map data practices to stated purposes
3. Identify contradictions, missing clauses, and dark patterns
4. Report compliance violations with severity levels
---
## Tasks & Grading
| Task | Difficulty | Description | Hidden Issues |
|------|------------|-------------|---------------|
| `easy_clause_existence` | Easy | Verify mandatory GDPR clauses are present | 2 |
| `medium_purpose_mapping` | Medium | Match practices to purposes, find mismatches | 3 |
| `hard_dark_patterns` | Hard | Find contradictions within a single document | 5 |
| `elite_multi_doc_reasoning` | Elite | Cross-document contradiction detection | 6 |
### Reward Function
```
R = base_score + severity_bonus + multi_doc_bonus + exploration_bonus
```
- **Base Score**: `issues_found / total_issues`
- **Severity Bonus**: +0.25 for critical findings, +0.15 for high
- **Multi-Document Bonus**: +0.2 for elite task (cross-doc findings)
- **Exploration Bonus**: +0.02 per step (max 0.1)
All rewards are clamped to `[0.0, 1.0]`.
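
The formula above can be sketched in Python as follows. This is a minimal illustration, not the environment's actual grader: the function name `compute_reward` and its parameters are hypothetical. It also makes two assumptions the text does not spell out: that the severity bonus is granted once, for the highest severity among the findings, and that the multi-document bonus applies only on the elite task once at least one issue has been found.

```python
def compute_reward(issues_found, total_issues, severities=(), is_elite=False, steps=0):
    """Hypothetical sketch of R = base + severity + multi_doc + exploration."""
    # Base score: fraction of hidden issues found so far.
    base_score = issues_found / total_issues if total_issues else 0.0

    # Assumption: the bonus is applied once, for the highest severity present.
    if "critical" in severities:
        severity_bonus = 0.25
    elif "high" in severities:
        severity_bonus = 0.15
    else:
        severity_bonus = 0.0

    # Assumption: only the elite task earns this, and only with a finding.
    multi_doc_bonus = 0.2 if is_elite and issues_found > 0 else 0.0

    # +0.02 per step, capped at 0.1.
    exploration_bonus = min(0.02 * steps, 0.1)

    total = base_score + severity_bonus + multi_doc_bonus + exploration_bonus
    # Clamp to [0.0, 1.0].
    return max(0.0, min(1.0, total))
```

Under these assumptions, one of two issues found after a single step with no severity bonus gives roughly `0.5 + 0.02`, which is consistent with the `0.52` shown in the sample reward payload in this README.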
---
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check → `{"status": "ok"}` |
| `/reset?task=easy` | GET | Reset environment for a task |
| `/step` | POST | Submit a finding → `{"message": "..."}` |
| `/state` | GET | Get current episode state |
### Example Usage
```bash
# Reset environment
curl "http://localhost:7860/reset?task=easy"
# Submit a compliance finding
curl -X POST "http://localhost:7860/step" \
-H "Content-Type: application/json" \
-d '{"message": "Missing Right to be Forgotten clause"}'
# Get current state
curl "http://localhost:7860/state"
```
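
The same calls can be made from Python. The sketch below uses only the standard library and mirrors the `curl` commands above. The helper names (`build_reset_request`, `build_step_request`, `send`) are hypothetical and not part of this repository, and the response is assumed to be JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7860"  # default port used throughout this README


def build_reset_request(task, base_url=BASE_URL):
    """Build GET /reset?task=<task>."""
    query = urllib.parse.urlencode({"task": task})
    return urllib.request.Request(f"{base_url}/reset?{query}", method="GET")


def build_step_request(message, base_url=BASE_URL):
    """Build POST /step with a JSON body of the form {"message": "..."}."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send a request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `send(build_reset_request("easy"))` followed by `send(build_step_request("Missing Right to be Forgotten clause"))` would reproduce the first two `curl` calls. Building the request separately from sending it keeps the helpers easy to inspect without a live server.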
---
## Action / Observation Spaces
### Observation (returned by reset/step)
```json
{
"task_id": "easy_clause_existence",
"task_name": "Clause Existence Check",
"difficulty": "easy",
"step": 0,
"documents": [{"id": "...", "title": "...", "content": "...", "doc_type": "policy"}],
"data_practices": [{"id": "...", "category": "...", "purpose": "...", "data_type": "...", "shared_with_third_parties": false}],
"compliance_requirements": ["Right to be Forgotten", "Data Portability", "Contact Information"],
"flagged_issues": [],
"echoed_message": "Review the privacy policy..."
}
```
### Action (sent to /step)
```json
{"message": "Missing Right to be Forgotten clause"}
```
### Reward (returned from /step)
```json
{
"value": 0.52,
"reason": "Found 1/2 issues",
"issues_found": 1,
"total_issues": 2
}
```
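
To make the action and reward shapes concrete, here is a small typed sketch. The repository's `models.py` is described as using Pydantic. This example uses standard-library dataclasses instead, so that it runs without dependencies. The field names come from the sample payloads above; the range check on `value` is an added assumption based on the stated `[0.0, 1.0]` clamp.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Action:
    message: str

    def to_json(self):
        """Serialize to the body expected by /step."""
        return json.dumps(asdict(self))


@dataclass
class Reward:
    value: float
    reason: str
    issues_found: int
    total_issues: int

    @classmethod
    def from_json(cls, payload):
        """Parse a reward payload and check that value lies in [0.0, 1.0]."""
        data = json.loads(payload)
        reward = cls(
            value=float(data["value"]),
            reason=str(data["reason"]),
            issues_found=int(data["issues_found"]),
            total_issues=int(data["total_issues"]),
        )
        if not 0.0 <= reward.value <= 1.0:
            raise ValueError(f"reward value out of range: {reward.value}")
        return reward
```

Parsing the sample payload with `Reward.from_json(...)` yields an object whose `issues_found` and `total_issues` can be compared directly to track progress through an episode.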
---
## Setup & Local Development
### Prerequisites
- Python 3.10+
- `uv` or `pip`
### Install & Run
```bash
# Install dependencies
pip install -e .
# Start the server
python main.py
# → Server at http://localhost:7860
```
### Run Inference
```bash
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="your-token-here"
export SERVER_URL="http://localhost:7860"
python inference.py
```
### Docker
```bash
docker build -t gdpr-auditor .
docker run -p 7860:7860 gdpr-auditor
```
---
## Project Structure
```
├── models.py          # Pydantic typed models (Observation, Action, Reward)
├── env/
│   ├── __init__.py
│   └── core.py        # GDPRAuditorEnvironment with 4 tasks + graders
├── main.py            # FastAPI server with all endpoints
├── inference.py       # Baseline inference script (OpenAI client)
├── openenv.yaml       # OpenEnv manifest with task definitions
├── pyproject.toml     # Dependencies
├── Dockerfile         # Container configuration
└── README.md          # This file
```
---
## Environment Variables
Copy `.env.example` to `.env` and add your Hugging Face token:
```bash
cp .env.example .env
# Then edit .env with your HF_TOKEN
```
| Variable | Description | Default |
|----------|-------------|---------|
| `API_BASE_URL` | LLM API endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier | `Qwen/Qwen2.5-72B-Instruct` |
| `HF_TOKEN` | Hugging Face / API key | (required) |
| `SERVER_URL` | Environment server URL | `http://localhost:7860` |
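
A script could read these variables along the following lines. This is a sketch only: `Settings` and `load_settings` are hypothetical names, and how `inference.py` actually loads its configuration is not shown in this README. The sketch reads from the process environment and does not parse the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_base_url: str
    model_name: str
    hf_token: str
    server_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the variables from the table above, applying the listed defaults."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # HF_TOKEN has no default and is marked as required.
        raise RuntimeError("HF_TOKEN is required but not set")
    return Settings(
        api_base_url=env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        hf_token=token,
        server_url=env.get("SERVER_URL", "http://localhost:7860"),
    )
```

Passing a plain dictionary, for example `load_settings({"HF_TOKEN": "your-token-here"})`, makes the defaults easy to check without touching the real environment.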
---
## License
MIT