OPF-SmartRedact-Paste / README.md

ysharma HF Staff

Update README.md

ccb1f31 verified 15 days ago


	---
	title: OPF PII Pastebin
	emoji: 📚
	colorFrom: purple
	colorTo: red
	sdk: gradio
	sdk_version: 6.13.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Paste PII, share redacted view using OAI Privacy Filter
	---

	# Paste-Proxy

	A paste-to-share service with OpenAI Privacy Filter (OPF) wired into the critical path. Authors paste sensitive text and get two URLs:

	- a public view link that serves the OPF-redacted version (placeholders like `<PRIVATE_PERSON>` instead of the original PII);
	- a private reveal link (guarded by an unguessable token) that serves the original to the author or anyone they deliberately share the reveal URL with.
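
The two-link scheme can be sketched as follows. This is a minimal illustration, not the Space's actual code: the function name `mint_links`, the 8-byte id size, and the 16-byte token size are assumptions; only the `/view/{id}` and `?token=...` path shapes come from this README.

```python
import secrets

def mint_links(base_url: str) -> dict:
    """Mint a paste id and a reveal token, then build both share URLs."""
    paste_id = secrets.token_urlsafe(8)       # short public id (assumed size)
    reveal_token = secrets.token_urlsafe(16)  # unguessable secret (assumed size)
    view_path = f"/view/{paste_id}"
    reveal_path = f"{view_path}?token={reveal_token}"
    base = base_url.rstrip("/")
    return {
        "id": paste_id,
        "reveal_token": reveal_token,
        "view_url": base + view_path,      # safe to post publicly
        "reveal_url": base + reveal_path,  # share only deliberately
    }

links = mint_links("https://example.invalid")
print(links["view_url"])
print(links["reveal_url"])
```

`secrets.token_urlsafe` only emits URL-safe characters, so the token can go into the query string without extra escaping.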

	## Why this exists

	It's a demo of a pattern that doesn't fit `gr.Blocks` cleanly. `gr.Blocks` maps one event to one function call on a given session. A pastebin needs:

	1. Persistent server-side state keyed by a short URL — the paste must outlive any single session and be reachable by anyone at `/view/{id}`.
	2. Two distinct GET routes for the same resource — one public, one token-gated — served as real HTML pages (not component updates).
	3. A background cleanup task independent of any request, that sweeps pastes that have passed their TTL.

	`gr.Server` gives you a FastAPI app that you decorate with the usual `@server.get` / `@server.post`, while still exposing Gradio API endpoints for the `gradio_client` SDK. That covers all three requirements.

	## Routes

	| Method | Path                        | Purpose                                                   |
	|--------|-----------------------------|-----------------------------------------------------------|
	| GET    | `/`                         | Compose page (paste editor)                               |
	| POST   | `/api/paste`                | Scan with OPF, mint `{id, reveal_token}`, store           |
	| GET    | `/view/{id}`                | Public redacted HTML view                                 |
	| GET    | `/view/{id}?token=...`      | Author's reveal HTML view (original with PII highlighted) |
	| GET    | `/api/paste/{id}`           | JSON: redacted + stats                                    |
	| GET    | `/api/paste/{id}?token=...` | JSON: redacted + stats + original + spans (token-gated)   |
	| -      | `analyze_paste` (gr API)    | Programmatic paste creation for `gradio_client`           |

	Reveal tokens are 22 bytes from `secrets.token_urlsafe` and compared with `secrets.compare_digest`.
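
A rough sketch of how a token-gated route could decide which body to serve. The helper names `token_matches` and `select_body` are hypothetical; only the use of `secrets.compare_digest` comes from this README.

```python
import secrets
from typing import Optional

def token_matches(expected: str, supplied: Optional[str]) -> bool:
    """Constant-time comparison; a missing token never matches."""
    if supplied is None:
        return False
    # Compare bytes: compare_digest raises TypeError on non-ASCII str input.
    return secrets.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))

def select_body(original: str, redacted: str,
                expected: str, supplied: Optional[str]) -> str:
    """Serve the original only when the reveal token checks out."""
    return original if token_matches(expected, supplied) else redacted
```

Falling back to the redacted body on any mismatch means a wrong or missing token behaves exactly like the public link.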

	## Auto-expiry

	The compose form offers four expiry options: never / 1h / 24h / 7d. A background daemon thread wakes every 30s and evicts expired pastes. Expired links return 404.
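
One way the TTL options and the sweeper could fit together is sketched below. All names here (`TTL_SECONDS`, `EXPIRES_AT`, `expiry_for`, `sweep`, `start_sweeper`) are assumptions for illustration; only the four TTL choices, the daemon thread, and the 30s interval come from this README.

```python
import threading
import time
from typing import Dict, Optional

TTL_SECONDS = {"never": None, "1h": 3600, "24h": 86400, "7d": 604800}

# paste id -> absolute expiry timestamp (None means the paste never expires)
EXPIRES_AT: Dict[str, Optional[float]] = {}
LOCK = threading.Lock()

def expiry_for(ttl: str, now: float) -> Optional[float]:
    """Translate a TTL choice into an absolute expiry timestamp."""
    seconds = TTL_SECONDS[ttl]
    return None if seconds is None else now + seconds

def sweep(now: float) -> int:
    """Evict every entry whose expiry has passed; return how many were removed."""
    with LOCK:
        expired = [pid for pid, exp in EXPIRES_AT.items()
                   if exp is not None and exp <= now]
        for pid in expired:
            del EXPIRES_AT[pid]
    return len(expired)

def start_sweeper(interval: float = 30.0) -> threading.Thread:
    """Run sweep() every `interval` seconds on a daemon thread."""
    def loop() -> None:
        while True:
            time.sleep(interval)
            sweep(time.time())
    thread = threading.Thread(target=loop, daemon=True, name="paste-sweeper")
    thread.start()
    return thread
```

Passing `now` into `sweep` keeps the eviction logic testable without waiting on the clock; the daemon flag lets the process exit without joining the thread.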

	## Storage

	Pastes live in a process-local dict (`PASTES: dict[str, Paste]`) guarded by a lock. That's deliberate for a public demo — it makes the point of the architecture without coupling to Redis or a DB. When the Space restarts, pastes are wiped; the UI surfaces that on the 404 page.

	To make this production-grade you'd swap `_store_put` / `_store_get` for a Redis client (both operations are single-key writes/reads) and have the sweeper track expiry times in a sorted set: `ZADD` each paste id scored by its expiry time, then range-query and delete the expired ids in a loop.
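
The Redis swap described above might look roughly like this. It is a sketch under assumptions: `r` is expected to be a redis-py `redis.Redis` client (only `set`, `get`, `delete`, `zadd`, `zrangebyscore`, and `zrem` are used), and the key names `paste:{id}` and `pastes:expiry` are invented for illustration.

```python
import json
from typing import Any, Optional

EXPIRY_INDEX = "pastes:expiry"  # hypothetical sorted-set key: member=id, score=expiry

def store_put(r: Any, paste_id: str, paste: dict, expires_at: Optional[float]) -> None:
    """Single-key write, plus an index entry when the paste can expire."""
    r.set(f"paste:{paste_id}", json.dumps(paste))
    if expires_at is not None:
        r.zadd(EXPIRY_INDEX, {paste_id: expires_at})

def store_get(r: Any, paste_id: str) -> Optional[dict]:
    """Single-key read; None when the paste is gone."""
    raw = r.get(f"paste:{paste_id}")
    return None if raw is None else json.loads(raw)

def sweep(r: Any, now: float) -> int:
    """Delete every paste whose score (expiry time) is <= now."""
    expired = r.zrangebyscore(EXPIRY_INDEX, "-inf", now)
    for member in expired:
        # redis-py returns bytes unless decode_responses=True was set.
        pid = member.decode() if isinstance(member, bytes) else member
        r.delete(f"paste:{pid}")
        r.zrem(EXPIRY_INDEX, member)
    return len(expired)
```

Pastes with no expiry never enter the index, so the sweeper only touches ids that can actually expire.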

	## OPF model

	Uses OpenAI's Privacy Filter model (1.5B params, 50M active, 128k context), loaded from safetensors with the exact architecture and Viterbi-decoder pipeline.

	PII categories handled: Person, Address, Email, Phone, URL, Date, Account, Secret. Redaction replaces each detected span with a `<CATEGORY>` placeholder (matching the format in `[Confidential, non-final draft] Redaction examples.pdf`).
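
The placeholder substitution can be illustrated with a small sketch. It assumes the model's output has already been reduced to `(start, end, category)` character spans with an exclusive end; that span format, the `redact` helper, and the `PRIVATE_PHONE` label are assumptions (only `<PRIVATE_PERSON>` appears in this README).

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, category), end exclusive

def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a <CATEGORY> placeholder."""
    out = []
    cursor = 0
    for start, end, category in sorted(spans):
        if start < cursor:
            continue  # skip a span that overlaps one already replaced
        out.append(text[cursor:start])  # untouched text before the span
        out.append(f"<{category}>")
        cursor = end
    out.append(text[cursor:])  # untouched tail
    return "".join(out)

text = "Call Ana at 415-555-0123"
spans = [(5, 8, "PRIVATE_PERSON"), (12, 24, "PRIVATE_PHONE")]
print(redact(text, spans))  # Call <PRIVATE_PERSON> at <PRIVATE_PHONE>
```

Building the output left to right keeps the original offsets valid even though placeholders differ in length from the text they replace.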

	Inference runs behind `@spaces.GPU`, so the model only occupies a GPU slot during an actual paste scan (ZeroGPU).

	## Running locally

	```bash
	export HF_TOKEN=... # if the model repo is gated
	pip install -r requirements.txt
	python app.py # serves on :7860
	```

	## Programmatic API

	```python
	import json
	from gradio_client import Client

	c = Client("YOUR_SPACE_ID")
	resp = c.predict("Call me at 415-555-0123", ttl="1h", api_name="/analyze_paste")
	# resp is a JSON string with id, reveal_token, view_path, reveal_path, stats
	data = json.loads(resp)
	print(data["view_path"], data["reveal_path"])
	```

	Go to `https://ysharma-dummy-opf-3.hf.space/view/<VIEW_PATH>` for the filtered/redacted version and to `https://ysharma-dummy-opf-3.hf.space/view/<REVEAL_PATH>` for the _author_ view.