OPF-SmartRedact-Paste / README.md

ysharma HF Staff

Update README.md

ccb1f31 verified 15 days ago

title: OPF PII Pastebin
emoji: 📚
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Paste PII, share redacted view using OAI Privacy Filter

Paste-Proxy

A paste-to-share service with OpenAI Privacy Filter wired into the critical path. Authors paste sensitive text and get two URLs:

a public view link that serves the OPF-redacted version (placeholders like <PRIVATE_PERSON> instead of the original PII);
a private reveal link (guarded by an unguessable token) that serves the original to the author or anyone they deliberately share the reveal URL with.

Why this exists

It's a demo of a pattern that doesn't fit gr.Blocks cleanly. gr.Blocks maps one event to one function call on a given session. A pastebin needs:

Persistent server-side state keyed by a short URL — the paste must outlive any single session and be reachable by anyone at /view/{id}.
Two distinct GET routes for the same resource — one public, one token-gated — served as real HTML pages (not component updates).
A background cleanup task independent of any request, that sweeps pastes that have passed their TTL.

gr.Server gives you a FastAPI app you decorate with the usual @server.get / @server.post while still getting Gradio API endpoints for the gradio_client SDK. That makes it a good fit for this pattern.

Routes

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/` | Compose page (paste editor) |
| POST | `/api/paste` | Scan with OPF, mint `{id, reveal_token}`, store |
| GET | `/view/{id}` | Public redacted HTML view |
| GET | `/view/{id}?token=...` | Author's reveal HTML view (original with PII highlighted) |
| GET | `/api/paste/{id}` | JSON: redacted + stats |
| GET | `/api/paste/{id}?token=...` | JSON: redacted + stats + original + spans (token-gated) |
| — | `analyze_paste` (gr API) | Programmatic paste creation for `gradio_client` |

Reveal tokens are 22 bytes from secrets.token_urlsafe and are compared with secrets.compare_digest, which runs in constant time so the check does not leak the token through timing differences.
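A minimal sketch of the token handling and the token-gated JSON response, using only the standard library. The helper names (`mint_reveal_token`, `token_matches`, `paste_payload`) and the record's dict keys are hypothetical, and passing 22 as the `nbytes` argument is only one reading of "22 bytes"; the Space's actual code may differ.

```python
import secrets
from typing import Optional

TOKEN_BYTES = 22  # assumption: "22 bytes" read as the nbytes argument


def mint_reveal_token() -> str:
    """Return an unguessable, URL-safe token for the reveal link."""
    return secrets.token_urlsafe(TOKEN_BYTES)


def token_matches(stored: str, supplied: Optional[str]) -> bool:
    """Constant-time comparison; a missing or empty token never matches."""
    if not supplied:
        return False
    # Encode to bytes so non-ASCII input cannot raise inside compare_digest.
    return secrets.compare_digest(stored.encode(), supplied.encode())


def paste_payload(record: dict, supplied_token: Optional[str] = None) -> dict:
    """Build the JSON body: redacted + stats always,
    original + spans only when the reveal token is valid."""
    body = {"redacted": record["redacted"], "stats": record["stats"]}
    if token_matches(record["reveal_token"], supplied_token):
        body["original"] = record["original"]
        body["spans"] = record["spans"]
    return body
```

Keeping the gate in one function means the HTML and JSON routes can share the same check instead of each re-implementing the comparison.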

Auto-expiry

The compose form offers never / 1h / 24h / 7d. A background daemon thread wakes every 30s and evicts expired pastes. Expired links return a 404.
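The sweep described above could look roughly like the following. This is a sketch under assumptions: the names `EXPIRES_AT`, `expiry_for`, `sweep_once` and `start_sweeper` are hypothetical, and the store is reduced to a map of paste id to expiry timestamp so the eviction logic stands on its own.

```python
import threading
import time
from typing import Optional

# TTL choices from the compose form, in seconds (None means no expiry).
TTL_SECONDS = {"never": None, "1h": 3600, "24h": 24 * 3600, "7d": 7 * 24 * 3600}
SWEEP_INTERVAL = 30  # seconds between sweeps

EXPIRES_AT: dict[str, Optional[float]] = {}  # hypothetical: paste id -> expiry time
LOCK = threading.Lock()


def expiry_for(ttl: str, now: Optional[float] = None) -> Optional[float]:
    """Translate a TTL choice into an absolute expiry timestamp."""
    seconds = TTL_SECONDS[ttl]
    if seconds is None:
        return None
    return (time.time() if now is None else now) + seconds


def sweep_once(now: Optional[float] = None) -> list[str]:
    """Evict every paste whose expiry has passed; return the evicted ids."""
    now = time.time() if now is None else now
    with LOCK:
        expired = [pid for pid, exp in EXPIRES_AT.items()
                   if exp is not None and exp <= now]
        for pid in expired:
            del EXPIRES_AT[pid]
    return expired


def _sweeper_loop() -> None:
    while True:
        time.sleep(SWEEP_INTERVAL)
        sweep_once()


def start_sweeper() -> threading.Thread:
    """Start the background sweeper; daemon=True so it never blocks shutdown."""
    thread = threading.Thread(target=_sweeper_loop, daemon=True, name="paste-sweeper")
    thread.start()
    return thread
```

Separating `sweep_once` from the loop keeps the eviction rule testable with a fixed `now`, without waiting on the timer.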

Storage

Pastes live in a process-local dict (PASTES: dict[str, Paste]) guarded by a lock. That's deliberate for a public demo — it makes the point of the architecture without coupling to Redis or a DB. When the Space restarts, pastes are wiped; the UI surfaces that on the 404 page.

To make this production-grade you'd swap _store_put/_store_get for a Redis client (both operations are single-key writes/reads) and replace the sweeper with a sorted set scored by expiry time: ZADD on write, plus a loop that range-deletes entries whose score has passed.

OPF model

Uses OpenAI's Privacy Filter model (1.5B params, 50M active, 128k context), loaded from safetensors with the exact architecture + Viterbi-decoder pipeline.

PII categories handled: Person, Address, Email, Phone, URL, Date, Account, Secret. Redaction replaces each detected span with a <CATEGORY> placeholder (matching the format in [Confidential, non-final draft] Redaction examples.pdf).
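Placeholder substitution of this kind can be sketched as follows. The span format, `(start, end, category)` with character offsets, is an assumption for illustration, since the model's actual output format is not described here, and `redact` is a hypothetical helper name.

```python
def redact(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, category) span with a <CATEGORY> placeholder.

    Offsets are character indices into `text`, with `end` exclusive.
    """
    pieces = []
    cursor = 0
    for start, end, category in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(f"<{category}>")     # the placeholder itself
        cursor = end
    pieces.append(text[cursor:])           # trailing text after the last span
    return "".join(pieces)
```

For example, `redact("Alice emailed Bob.", [(0, 5, "PRIVATE_PERSON"), (14, 17, "PRIVATE_PERSON")])` yields `<PRIVATE_PERSON> emailed <PRIVATE_PERSON>.`. Walking the spans in sorted order and copying the gaps avoids the offset drift that in-place replacement would cause.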

Inference runs behind @spaces.GPU, so the model only occupies a GPU slot during an actual paste scan (ZeroGPU).

Running locally

```bash
export HF_TOKEN=...   # if the model repo is gated
pip install -r requirements.txt
python app.py         # serves on :7860
```

Programmatic API

```python
from gradio_client import Client

c = Client("YOUR_SPACE_ID")
resp = c.predict("Call me at 415-555-0123", ttl="1h", api_name="/analyze_paste")
# resp is a JSON string with id, reveal_token, view_path, reveal_path, stats
```

Go to https://ysharma-dummy-opf-3.hf.space/view/<VIEW_PATH> for the filtered/redacted version, and to https://ysharma-dummy-opf-3.hf.space/view/<REVEAL-PATH> for the author view.