| title: OPF PII Pastebin | |
| emoji: π | |
| colorFrom: purple | |
| colorTo: red | |
| sdk: gradio | |
| sdk_version: 6.13.0 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: Paste PII, share redacted view using OAI Privacy Filter | |
| # Paste-Proxy | |
| A paste-to-share service with OpenAI Privacy Filter wired into the critical path. Authors paste sensitive text and get two URLs: | |
| - a **public view link** that serves the OPF-redacted version (placeholders like `<PRIVATE_PERSON>` instead of the original PII); | |
| - a **private reveal link** (guarded by an unguessable token) that serves the original to the author or anyone they deliberately share the reveal URL with. | |
| ## Why this exists | |
| It's a demo of a pattern that doesn't fit `gr.Blocks` cleanly. `gr.Blocks` maps one event to one function call on a given session. A pastebin needs: | |
| 1. **Persistent server-side state keyed by a short URL**: the paste must outlive any single session and be reachable by anyone at `/view/{id}`. | |
| 2. **Two distinct GET routes for the same resource**, one public and one token-gated, served as real HTML pages (not component updates). | |
| 3. **A background cleanup task** independent of any request, that sweeps pastes that have passed their TTL. | |
| **`gr.Server` gives you a FastAPI app you decorate with the usual `@server.get` / `@server.post` while still getting Gradio API endpoints for the `gradio_client` SDK. Perfect fit.** | |
| ## Routes | |
| | Method | Path | Purpose | | |
| |--------|-----------------------------|--------------------------------------------------------------| | |
| | GET | `/` | Compose page (paste editor) | | |
| | POST | `/api/paste` | Scan with OPF, mint `{id, reveal_token}`, store | | |
| | GET | `/view/{id}` | Public redacted HTML view | | |
| | GET | `/view/{id}?token=...` | Author's reveal HTML view (original with PII highlighted) | | |
| | GET | `/api/paste/{id}` | JSON: redacted + stats | | |
| | GET | `/api/paste/{id}?token=...` | JSON: redacted + stats + original + spans (token-gated) | | |
| n/a | `analyze_paste` (gr API) | Programmatic paste creation for `gradio_client` | |
| Reveal tokens are 22 bytes from `secrets.token_urlsafe` and are checked in constant time with `secrets.compare_digest`. | |
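A minimal sketch of the token handling described above, using only the standard library. The helper names `mint_reveal_token` and `token_matches` are hypothetical, and passing `22` to `secrets.token_urlsafe` is an assumption based on the figure quoted here; the app's own code may differ.

```python
import secrets
from typing import Optional


def mint_reveal_token(nbytes: int = 22) -> str:
    """Return a URL-safe random token (nbytes of entropy; the size is an assumption)."""
    return secrets.token_urlsafe(nbytes)


def token_matches(supplied: Optional[str], expected: str) -> bool:
    """Compare tokens in constant time; a missing token never matches."""
    if supplied is None:
        return False
    # Encode to bytes so compare_digest also accepts non-ASCII input.
    return secrets.compare_digest(supplied.encode(), expected.encode())
```

Comparing with `secrets.compare_digest` rather than `==` avoids leaking how many leading characters of a guessed token were correct through response timing.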
| ## Auto-expiry | |
| The compose form offers **never / 1h / 24h / 7d**. A background daemon thread wakes every 30s and evicts expired pastes. Expired links return a 404. | |
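The expiry behaviour could be sketched roughly as follows. `TTL_SECONDS`, `expiry_for`, `sweep_once`, and `start_sweeper` are hypothetical names, and for brevity the store here maps a paste id straight to its expiry timestamp; the app's real structures may look different.

```python
import threading
import time
from typing import Dict, Optional

# Seconds per TTL choice offered by the compose form; None means "never expires".
TTL_SECONDS = {"never": None, "1h": 3600, "24h": 86400, "7d": 604800}


def expiry_for(ttl: str, now: float) -> Optional[float]:
    """Translate a TTL label into an absolute expiry timestamp."""
    seconds = TTL_SECONDS[ttl]
    return None if seconds is None else now + seconds


def sweep_once(store: Dict[str, Optional[float]], lock: threading.Lock, now: float) -> int:
    """Delete every entry whose expiry has passed; return how many were removed."""
    with lock:
        expired = [pid for pid, exp in store.items() if exp is not None and exp <= now]
        for pid in expired:
            del store[pid]
    return len(expired)


def start_sweeper(store, lock, interval: float = 30.0) -> threading.Thread:
    """Run sweep_once every `interval` seconds on a daemon thread."""
    def loop() -> None:
        while True:
            time.sleep(interval)
            sweep_once(store, lock, time.time())

    thread = threading.Thread(target=loop, daemon=True, name="paste-sweeper")
    thread.start()
    return thread
```

Because the thread is a daemon, it does not block interpreter shutdown, which suits a sweeper that has no state of its own to flush.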
| ## Storage | |
| Pastes live in a process-local dict (`PASTES: dict[str, Paste]`) guarded by a lock. That's deliberate for a public demo: it makes the point of the architecture without coupling to Redis or a DB. When the Space restarts, pastes are wiped; the UI surfaces that on the 404 page. | |
| To make this production-grade you'd swap `_store_put` / `_store_get` for a Redis client (both operations are single-key writes/reads), record each paste's expiry time in a sorted set with `ZADD`, and have the sweeper range-delete the entries whose expiry has passed. | |
| ## OPF model | |
| Uses OpenAI's Privacy Filter model (1.5B params, 50M active, 128k context) loaded from safetensors with the exact architecture + Viterbi-decoder pipeline. | |
| PII categories handled: Person, Address, Email, Phone, URL, Date, Account, Secret. Redaction replaces each detected span with a `<CATEGORY>` placeholder (matching the format in `[Confidential, non-final draft] Redaction examples.pdf`). | |
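The span-replacement step can be sketched in a few lines. This assumes the detector yields `(start, end, label)` character offsets; that tuple shape, the `redact` helper, and the `PRIVATE_PHONE` label are illustrative assumptions (only `<PRIVATE_PERSON>` appears in this README).

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, category label)


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a <LABEL> placeholder, skipping overlapping spans."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps a span that was already replaced
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])  # trailing text after the last span
    return "".join(pieces)


example = redact(
    "Call Alice at 415-555-0123",
    [(5, 10, "PRIVATE_PERSON"), (14, 26, "PRIVATE_PHONE")],
)
# example == "Call <PRIVATE_PERSON> at <PRIVATE_PHONE>"
```

Building the output left to right from sorted spans avoids the offset drift that in-place replacement would cause when placeholders differ in length from the text they replace.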
| Inference runs behind `@spaces.GPU`, so the model only holds a GPU slot during an actual paste scan (**ZeroGPU**). | |
| ## Running locally | |
| ```bash | |
| export HF_TOKEN=... # if the model repo is gated | |
| pip install -r requirements.txt | |
| python app.py # serves on :7860 | |
| ``` | |
| ## Programmatic API | |
| ```python | |
| from gradio_client import Client | |
| c = Client("YOUR_SPACE_ID") | |
| resp = c.predict("Call me at 415-555-0123", ttl="1h", api_name="/analyze_paste") | |
| # resp is a JSON string with id, reveal_token, view_path, reveal_path, stats | |
| ``` | |
| Go to `https://ysharma-dummy-opf-3.hf.space/view/<VIEW_PATH>` for the filtered/redacted version and to `https://ysharma-dummy-opf-3.hf.space/view/<REVEAL_PATH>` for the _author_ view. | |
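A small helper can turn the JSON string returned by `predict` into the two URLs. This is a sketch: `paste_links` is a hypothetical name, the sample payload is made up, and the URL template simply follows the `/view/<...>` form given above, so adjust it if `view_path` and `reveal_path` turn out to have a different shape.

```python
import json
from typing import Dict


def paste_links(resp: str, base_url: str) -> Dict[str, str]:
    """Parse the analyze_paste response and build the public and reveal URLs."""
    data = json.loads(resp)
    base = base_url.rstrip("/")
    return {
        "id": data["id"],
        "view_url": f"{base}/view/{data['view_path']}",
        "reveal_url": f"{base}/view/{data['reveal_path']}",
    }


# Made-up payload for illustration only; real values come from the Space.
sample = json.dumps({
    "id": "abc123",
    "reveal_token": "tok",
    "view_path": "abc123",
    "reveal_path": "abc123?token=tok",
    "stats": {},
})
links = paste_links(sample, "https://ysharma-dummy-opf-3.hf.space/")
```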