Spaces:
Running on Zero
Update README.md
README.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
emoji: π
|
| 4 |
colorFrom: purple
|
| 5 |
colorTo: red
|
|
@@ -8,7 +8,72 @@ sdk_version: 6.13.0
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
-
short_description:
|
| 12 |
---
|
| 13 |
|
| 14 |
-
|
| 1 |
---
|
| 2 |
+
title: OPF PII Pastebin
|
| 3 |
emoji: π
|
| 4 |
colorFrom: purple
|
| 5 |
colorTo: red
|
|
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
+
short_description: Paste PII, share redacted view using OAI Privacy Filter
|
| 12 |
---
|
| 13 |
|
| 14 |
+
# Paste-Proxy
|
| 15 |
+
|
| 16 |
+
A paste-to-share service with OpenAI Privacy Filter wired into the critical path. Authors paste sensitive text and get two URLs:
|
| 17 |
+
|
| 18 |
+
- a **public view link** that serves the OPF-redacted version (placeholders like `<PRIVATE_PERSON>` instead of the original PII);
|
| 19 |
+
- a **private reveal link** (guarded by an unguessable token) that serves the original to the author or anyone they deliberately share the reveal URL with.
|
| 20 |
+
|
| 21 |
+
## Why this exists
|
| 22 |
+
|
| 23 |
+
It's a demo of a pattern that doesn't fit `gr.Blocks` cleanly. `gr.Blocks` maps one event to one function call on a given session. A pastebin needs:
|
| 24 |
+
|
| 25 |
+
1. **Persistent server-side state keyed by a short URL**: the paste must outlive any single session and be reachable by anyone at `/view/{id}`.
|
| 26 |
+
2. **Two distinct GET routes for the same resource** (one public, one token-gated), served as real HTML pages (not component updates).
|
| 27 |
+
3. **A background cleanup task**, independent of any request, that sweeps pastes past their TTL.
|
| 28 |
+
|
| 29 |
+
**`gr.Server` gives you a FastAPI app you decorate with the usual `@server.get` / `@server.post` while still getting Gradio API endpoints for the `gradio_client` SDK. Perfect fit.**
|
| 30 |
+
|
| 31 |
+
## Routes
|
| 32 |
+
|
| 33 |
+
| Method | Path | Purpose |
|
| 34 |
+
|--------|-----------------------------|--------------------------------------------------------------|
|
| 35 |
+
| GET | `/` | Compose page (paste editor) |
|
| 36 |
+
| POST | `/api/paste` | Scan with OPF, mint `{id, reveal_token}`, store |
|
| 37 |
+
| GET | `/view/{id}` | Public redacted HTML view |
|
| 38 |
+
| GET | `/view/{id}?token=...` | Author's reveal HTML view (original with PII highlighted) |
|
| 39 |
+
| GET | `/api/paste/{id}` | JSON: redacted + stats |
|
| 40 |
+
| GET | `/api/paste/{id}?token=...` | JSON: redacted + stats + original + spans (token-gated) |
|
| 41 |
+
| n/a | `analyze_paste` (gr API) | Programmatic paste creation for `gradio_client` |
|
| 42 |
+
|
| 43 |
+
Reveal tokens are 22 bytes from `secrets.token_urlsafe` and compared with `secrets.compare_digest`.
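
The token handling described above could look roughly like the sketch below. It is not the app's actual code: the helper names `mint_reveal_token` and `token_matches` are hypothetical, and `TOKEN_BYTES = 22` assumes the "22 bytes" is passed directly as the `nbytes` argument of `secrets.token_urlsafe`.

```python
import secrets
from typing import Optional

# Assumption: "22 bytes" is used as the nbytes argument.
TOKEN_BYTES = 22


def mint_reveal_token() -> str:
    """Return an unguessable, URL-safe token for the private reveal link."""
    return secrets.token_urlsafe(TOKEN_BYTES)


def token_matches(supplied: Optional[str], expected: str) -> bool:
    """Compare tokens in constant time; a missing token never matches."""
    if supplied is None:
        return False
    # Encode to bytes so non-ASCII input cannot raise TypeError.
    return secrets.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Using `secrets.compare_digest` rather than `==` avoids leaking, through response timing, how many leading characters of a guessed token were correct.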
|
| 44 |
+
|
| 45 |
+
## Auto-expiry
|
| 46 |
+
|
| 47 |
+
The compose form offers four expiry options: **never / 1h / 24h / 7d**. A background daemon thread wakes every 30s and evicts expired pastes. Expired links return a 404.
|
| 48 |
+
|
| 49 |
+
## Storage
|
| 50 |
+
|
| 51 |
+
Pastes live in a process-local dict (`PASTES: dict[str, Paste]`) guarded by a lock. That's deliberate for a public demo: it makes the point of the architecture without coupling to Redis or a DB. When the Space restarts, pastes are wiped; the UI surfaces that on the 404 page.
|
| 52 |
+
|
| 53 |
+
To make this production-grade you'd swap `_store_put` / `_store_get` for a Redis client (both operations are single-key writes/reads) and replace the sweeper with a sorted set: `ZADD` each paste ID with its expiry time as the score, then run a range-delete loop over entries whose score is in the past.
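
One possible shape for that Redis variant is sketched below. It is an assumption-heavy illustration, not the app's code: the key names (`paste:{id}`, `pastes:expiry`) and the function names are hypothetical, and `r` is expected to be a client exposing the redis-py methods `set`, `get`, `delete`, `zadd`, `zrangebyscore`, and `zrem` (for example `redis.Redis()`).

```python
import json
import time
from typing import Optional

EXPIRY_ZSET = "pastes:expiry"  # hypothetical sorted-set key: member = id, score = expiry


def store_put(r, paste_id: str, paste: dict, expires_at: Optional[float]) -> None:
    """Single-key write, plus a sorted-set entry if the paste can expire."""
    r.set(f"paste:{paste_id}", json.dumps(paste))
    if expires_at is not None:
        r.zadd(EXPIRY_ZSET, {paste_id: expires_at})


def store_get(r, paste_id: str) -> Optional[dict]:
    """Single-key read; returns None for unknown or already-swept pastes."""
    raw = r.get(f"paste:{paste_id}")
    return None if raw is None else json.loads(raw)


def sweep(r, now: Optional[float] = None) -> int:
    """Delete every paste whose expiry score is <= now; return how many."""
    now = time.time() if now is None else now
    expired = r.zrangebyscore(EXPIRY_ZSET, "-inf", now)
    for member in expired:
        # redis-py returns bytes unless decode_responses=True is set.
        pid = member.decode() if isinstance(member, bytes) else member
        r.delete(f"paste:{pid}")
        r.zrem(EXPIRY_ZSET, member)
    return len(expired)
```

With the sorted set, each sweep only touches entries that have actually expired instead of scanning every paste.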
|
| 54 |
+
|
| 55 |
+
## OPF model
|
| 56 |
+
|
| 57 |
+
Uses OpenAI's Privacy Filter model (1.5B params, 50M active, 128k context), loaded from safetensors with the exact architecture + Viterbi-decoder pipeline.
|
| 58 |
+
|
| 59 |
+
PII categories handled: Person, Address, Email, Phone, URL, Date, Account, Secret. Redaction replaces each detected span with a `<CATEGORY>` placeholder (matching the format in `[Confidential, non-final draft] Redaction examples.pdf`).
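
The span-replacement step can be sketched as follows. The `(start, end, label)` span format and the `redact` helper are assumptions for illustration, and of the labels used in the example only `PRIVATE_PERSON` appears in this README; `PRIVATE_PHONE` is a guess at the naming pattern.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label): an assumed span format


def redact(text: str, spans: List[Span]) -> str:
    """Replace each [start, end) span with a `<LABEL>` placeholder."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip spans overlapping one that was already redacted
        out.append(text[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Call Ada at 415-555-0123"
spans = [(12, 24, "PRIVATE_PHONE"), (5, 8, "PRIVATE_PERSON")]
print(redact(text, spans))  # Call <PRIVATE_PERSON> at <PRIVATE_PHONE>
```

Sorting the spans and walking a cursor left to right keeps the offsets valid even though each placeholder has a different length from the text it replaces.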
|
| 60 |
+
|
| 61 |
+
Inference runs behind `@spaces.GPU`, so the model only holds a GPU slot during an actual paste scan (**ZeroGPU**).
|
| 62 |
+
|
| 63 |
+
## Running locally
|
| 64 |
+
|
| 65 |
+
```bash
|
| 66 |
+
export HF_TOKEN=... # if the model repo is gated
|
| 67 |
+
pip install -r requirements.txt
|
| 68 |
+
python app.py # serves on :7860
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
## Programmatic API
|
| 72 |
+
|
| 73 |
+
```python
|
| 74 |
+
from gradio_client import Client
|
| 75 |
+
c = Client("YOUR_SPACE_ID")
|
| 76 |
+
resp = c.predict("Call me at 415-555-0123", ttl="1h", api_name="/analyze_paste")
|
| 77 |
+
# resp is a JSON string with id, reveal_token, view_path, reveal_path, stats
|
| 78 |
+
```
|
| 79 |
+
Go to `https://ysharma-dummy-opf-3.hf.space/view/<VIEW_PATH>` for the filtered/redacted version and to `https://ysharma-dummy-opf-3.hf.space/view/<REVEAL_PATH>` for the _author_ view.
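
As a rough illustration, the two links can also be assembled from the `id` and `reveal_token` fields of the JSON response, following the `/view/{id}` and `/view/{id}?token=...` routes in the table above. The `build_links` helper and the sample response values are hypothetical; the exact contents of `view_path` and `reveal_path` are not shown here, so this sketch does not rely on them.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://ysharma-dummy-opf-3.hf.space"


def build_links(resp: str, base_url: str = BASE_URL) -> dict:
    """Turn the JSON string returned by /analyze_paste into shareable URLs."""
    data = json.loads(resp)
    view = f"{base_url}/view/{data['id']}"
    reveal = f"{view}?{urlencode({'token': data['reveal_token']})}"
    return {"view": view, "reveal": reveal}


# Made-up sample response, for illustration only.
sample = '{"id": "abc123", "reveal_token": "tok-en_1"}'
links = build_links(sample)
print(links["view"])    # https://ysharma-dummy-opf-3.hf.space/view/abc123
print(links["reveal"])  # https://ysharma-dummy-opf-3.hf.space/view/abc123?token=tok-en_1
```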
|