---
title: OPF PII Pastebin
emoji: 📚
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Paste PII, share redacted view using OAI Privacy Filter
---

# Paste-Proxy

A paste-to-share service with OpenAI Privacy Filter wired into the critical path. Authors paste sensitive text and get two URLs:

- a **public view link** that serves the OPF-redacted version (placeholders like `<PRIVATE_PERSON>` instead of the original PII);
- a **private reveal link** (guarded by an unguessable token) that serves the original to the author or anyone they deliberately share the reveal URL with.

## Why this exists

It's a demo of a pattern that doesn't fit `gr.Blocks` cleanly. `gr.Blocks` maps one event to one function call on a given session. A pastebin needs:

1. **Persistent server-side state keyed by a short URL**: the paste must outlive any single session and be reachable by anyone at `/view/{id}`.
2. **Two distinct GET routes for the same resource**: one public, one token-gated, both served as real HTML pages (not component updates).
3. **A background cleanup task**, independent of any request, that sweeps pastes whose TTL has passed.

**`gr.Server` gives you a FastAPI app you decorate with the usual `@server.get` / `@server.post` while still getting Gradio API endpoints for the `gradio_client` SDK. Perfect fit.**

## Routes

| Method | Path                        | Purpose                                                      |
|--------|-----------------------------|--------------------------------------------------------------|
| GET    | `/`                         | Compose page (paste editor)                                  |
| POST   | `/api/paste`                | Scan with OPF, mint `{id, reveal_token}`, store              |
| GET    | `/view/{id}`                | Public redacted HTML view                                    |
| GET    | `/view/{id}?token=...`      | Author's reveal HTML view (original with PII highlighted)    |
| GET    | `/api/paste/{id}`           | JSON: redacted + stats                                       |
| GET    | `/api/paste/{id}?token=...` | JSON: redacted + stats + original + spans (token-gated)      |
| n/a    | `analyze_paste` (gr API)    | Programmatic paste creation for `gradio_client`              |

Reveal tokens are 22 bytes from `secrets.token_urlsafe` and are compared with `secrets.compare_digest`, a constant-time comparison that avoids leaking the token through response timing.
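A minimal sketch of minting the `{id, reveal_token}` pair and gating the JSON route, assuming each paste is a plain dict. The payload fields (`redacted`, `stats`, `original`, `spans`) follow the routes table; the helper names, the 6-byte id, and reading "22 bytes" as `token_urlsafe(22)` are assumptions. Here a wrong or missing token simply gets the public payload; the real app may reject it instead.

```python
import secrets
from typing import Any, Optional


def mint_ids() -> tuple[str, str]:
    """Return (paste_id, reveal_token). Byte lengths are illustrative assumptions."""
    paste_id = secrets.token_urlsafe(6)       # short public id used in /view/{id}
    reveal_token = secrets.token_urlsafe(22)  # 22 random bytes, URL-safe base64, no padding
    return paste_id, reveal_token


def token_ok(paste: dict[str, Any], token: Optional[str]) -> bool:
    """Constant-time check of a supplied ?token= against the stored reveal token."""
    if token is None:
        return False
    # Compare bytes so a non-ASCII token from the query string cannot raise TypeError.
    return secrets.compare_digest(token.encode(), paste["reveal_token"].encode())


def paste_json(paste: dict[str, Any], token: Optional[str] = None) -> dict[str, Any]:
    """Payload for GET /api/paste/{id}: public fields, plus originals when the token matches."""
    body = {"redacted": paste["redacted"], "stats": paste["stats"]}
    if token_ok(paste, token):
        body["original"] = paste["original"]
        body["spans"] = paste["spans"]
    return body
```

The same `token_ok` check would decide whether `/view/{id}` renders the redacted page or the author's reveal page.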

## Auto-expiry

The compose form offers **never / 1h / 24h / 7d**. A background daemon thread wakes every 30s and evicts expired pastes. Expired links return a 404.
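The expiry loop can be sketched as a daemon `threading.Thread` that sleeps for the interval and then drops every entry whose deadline has passed. The TTL-to-seconds table, the `expires_at` field, and the function names are assumptions for illustration; `None` stands for "never".

```python
import threading
import time
from typing import Optional

# Compose-form choices mapped to seconds; None means the paste never expires.
TTL_SECONDS: dict[str, Optional[int]] = {
    "never": None, "1h": 3600, "24h": 86400, "7d": 7 * 86400,
}


def expires_at(ttl: str, now: Optional[float] = None) -> Optional[float]:
    """Absolute expiry timestamp for a TTL choice, or None for 'never'."""
    seconds = TTL_SECONDS[ttl]
    if seconds is None:
        return None
    return (time.time() if now is None else now) + seconds


def sweep(store: dict, lock: threading.Lock, now: Optional[float] = None) -> int:
    """Remove expired pastes under the lock; return how many were evicted."""
    now = time.time() if now is None else now
    with lock:
        dead = [pid for pid, p in store.items()
                if p["expires_at"] is not None and p["expires_at"] <= now]
        for pid in dead:
            del store[pid]
    return len(dead)


def start_sweeper(store: dict, lock: threading.Lock, interval: float = 30.0) -> threading.Thread:
    """Run sweep() every `interval` seconds on a daemon thread that dies with the process."""
    def loop() -> None:
        while True:
            time.sleep(interval)
            sweep(store, lock)

    t = threading.Thread(target=loop, daemon=True, name="paste-sweeper")
    t.start()
    return t
```

Because a sweep only happens every 30s, a paste can outlive its TTL by up to one interval unless the read path also checks `expires_at`.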

## Storage

Pastes live in a process-local dict (`PASTES: dict[str, Paste]`) guarded by a lock. That's deliberate for a public demo: it makes the point of the architecture without coupling to Redis or a DB. When the Space restarts, pastes are wiped; the UI surfaces that on the 404 page.

To make this production-grade, you'd swap `_store_put/_store_get` for a Redis client (both operations are single-key writes/reads) and replace the sweeper with a sorted set: `ZADD` each paste under its expiry time, then periodically range-delete everything whose score has passed.
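One way the Redis version could look, using redis-py method names (`set`, `get`, `zadd`, `zrangebyscore`, `zremrangebyscore`, `delete`). The key names (`paste:{id}`, `paste:expiry`) and the JSON encoding are assumptions; the client is passed in so the sketch does not depend on a particular connection setup.

```python
import json
import time
from typing import Any, Optional

EXPIRY_ZSET = "paste:expiry"  # hypothetical key: member = paste id, score = expiry timestamp


def _key(paste_id: str) -> str:
    return f"paste:{paste_id}"


def store_put(r: Any, paste_id: str, paste: dict, expires_at: Optional[float]) -> None:
    """Single-key write, plus a ZADD on expiry time when the paste has a TTL."""
    r.set(_key(paste_id), json.dumps(paste))
    if expires_at is not None:
        r.zadd(EXPIRY_ZSET, {paste_id: expires_at})


def store_get(r: Any, paste_id: str) -> Optional[dict]:
    """Single-key read; json.loads accepts the bytes redis-py returns."""
    raw = r.get(_key(paste_id))
    return None if raw is None else json.loads(raw)


def sweep_once(r: Any, now: Optional[float] = None) -> int:
    """Delete every paste whose expiry score is <= now, then trim that score range."""
    now = time.time() if now is None else now
    expired = r.zrangebyscore(EXPIRY_ZSET, "-inf", now)
    if expired:
        ids = [m.decode() if isinstance(m, bytes) else m for m in expired]
        r.delete(*[_key(i) for i in ids])
        r.zremrangebyscore(EXPIRY_ZSET, "-inf", now)
    return len(expired)
```

With a real server this would be driven by something like `r = redis.Redis()` and a loop calling `sweep_once(r)` on the existing 30s cadence. The read-then-trim pair is not atomic; a pipeline or Lua script would tighten it.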

## OPF model

Uses OpenAI's Privacy Filter model (1.5B params, 50M active, 128k context), loaded from safetensors with the exact model architecture and its Viterbi-decoder pipeline.
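The README only names the Viterbi decoder. As a rough illustration of what such a step does, here is constrained Viterbi decoding over per-token label scores, assuming a BIO tag scheme (`O`, `B-X`, `I-X`) in which `I-X` may only follow `B-X` or `I-X`. The model's real label set, transition scores, and span format may differ.

```python
from typing import Sequence

NEG_INF = float("-inf")


def allowed(prev: str, cur: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in (f"B-{cur[2:]}", cur)
    return True


def viterbi(emissions: Sequence[Sequence[float]], labels: Sequence[str]) -> list[str]:
    """Return the highest-scoring label sequence that respects the BIO constraint.

    emissions[t][j] is the log-score of labels[j] at token t.
    """
    n, k = len(emissions), len(labels)
    if n == 0:
        return []
    # The first token cannot start inside a span.
    score = [NEG_INF if labels[j].startswith("I-") else e for j, e in enumerate(emissions[0])]
    back: list[list[int]] = []
    for t in range(1, n):
        new_score, ptr = [], []
        for j in range(k):
            # Best allowed predecessor for label j at token t.
            best_i, best = 0, NEG_INF
            for i in range(k):
                if score[i] > best and allowed(labels[i], labels[j]):
                    best_i, best = i, score[i]
            new_score.append(best + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Trace back from the best final label.
    j = max(range(k), key=lambda idx: score[idx])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [labels[idx] for idx in reversed(path)]
```

With `labels = ["O", "B-PER", "I-PER"]` and emissions `[[-0.5, -1.0, -3.0], [-2.0, -3.0, -0.1]]`, picking the best label per token gives `O, I-PER`, which is an invalid span; `viterbi` returns `["B-PER", "I-PER"]`.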

PII categories handled: Person, Address, Email, Phone, URL, Date, Account, Secret. Redaction replaces each detected span with a `<CATEGORY>` placeholder (matching the format in `[Confidential, non-final draft] Redaction examples.pdf`).
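A minimal sketch of the placeholder substitution, assuming the decoder yields non-overlapping `(start, end, category)` character spans. Only `PRIVATE_PERSON` appears elsewhere in this README; any other category strings passed in are up to the caller. Spans are applied right-to-left so earlier offsets stay valid after each replacement.

```python
from typing import Sequence


def redact(text: str, spans: Sequence[tuple[int, int, str]]) -> str:
    """Replace each (start, end, category) character span with a <CATEGORY> placeholder."""
    out = text
    # Work from the end of the string backwards so untouched offsets do not shift.
    for start, end, category in sorted(spans, key=lambda s: s[0], reverse=True):
        out = out[:start] + f"<{category}>" + out[end:]
    return out
```

For example, `redact("Alice called", [(0, 5, "PRIVATE_PERSON")])` returns `"<PRIVATE_PERSON> called"`.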

Inference runs behind `@spaces.GPU` (**ZeroGPU**), so the model only holds a GPU slot during an actual paste scan.

## Running locally

```bash
export HF_TOKEN=...   # if the model repo is gated
pip install -r requirements.txt
python app.py         # serves on :7860
```

## Programmatic API

```python
import json
from gradio_client import Client

c = Client("YOUR_SPACE_ID")
resp = c.predict("Call me at 415-555-0123", ttl="1h", api_name="/analyze_paste")
# resp is a JSON string with id, reveal_token, view_path, reveal_path, stats
data = json.loads(resp)
print(data["view_path"], data["reveal_path"])
```
Open `https://ysharma-dummy-opf-3.hf.space/view/<VIEW_PATH>` for the filtered/redacted version and `https://ysharma-dummy-opf-3.hf.space/view/<REVEAL-PATH>` for the _author_ view.