ysharma HF Staff committed on
Commit
ccb1f31
·
verified ·
1 Parent(s): 754de8e

Update README.md

Files changed (1)
  1. README.md +68 -3
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: Dummy Opf 3
3
  emoji: 📚
4
  colorFrom: purple
5
  colorTo: red
@@ -8,7 +8,72 @@ sdk_version: 6.13.0
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
- short_description: dummpy opf demo
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: OPF PII Pastebin
3
  emoji: 📚
4
  colorFrom: purple
5
  colorTo: red
 
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
+ short_description: Paste PII, share redacted view using OAI Privacy Filter
12
  ---
13
 
14
+ # Paste-Proxy
15
+
16
+ A paste-to-share service with OpenAI Privacy Filter wired into the critical path. Authors paste sensitive text and get two URLs:
17
+
18
+ - a **public view link** that serves the OPF-redacted version (placeholders like `<PRIVATE_PERSON>` instead of the original PII);
19
+ - a **private reveal link** (guarded by an unguessable token) that serves the original to the author or anyone they deliberately share the reveal URL with.
20
+
21
+ ## Why this exists
22
+
23
+ It's a demo of a pattern that doesn't fit `gr.Blocks` cleanly. `gr.Blocks` maps one event to one function call on a given session. A pastebin needs:
24
+
25
+ 1. **Persistent server-side state keyed by a short URL**: the paste must outlive any single session and be reachable by anyone at `/view/{id}`.
26
+ 2. **Two distinct GET routes for the same resource**, one public and one token-gated, served as real HTML pages (not component updates).
27
+ 3. **A background cleanup task**, independent of any request, that sweeps pastes past their TTL.
28
+
29
+ **`gr.Server` gives you a FastAPI app you decorate with the usual `@server.get` / `@server.post` while still getting Gradio API endpoints for the `gradio_client` SDK. Perfect fit.**
30
+
31
+ ## Routes
32
+
33
+ | Method | Path | Purpose |
34
+ |--------|-----------------------------|--------------------------------------------------------------|
35
+ | GET | `/` | Compose page (paste editor) |
36
+ | POST | `/api/paste` | Scan with OPF, mint `{id, reveal_token}`, store |
37
+ | GET | `/view/{id}` | Public redacted HTML view |
38
+ | GET | `/view/{id}?token=...` | Author's reveal HTML view (original with PII highlighted) |
39
+ | GET | `/api/paste/{id}` | JSON: redacted + stats |
40
+ | GET | `/api/paste/{id}?token=...` | JSON: redacted + stats + original + spans (token-gated) |
41
+ | n/a | `analyze_paste` (gr API) | Programmatic paste creation for `gradio_client` |
42
+
43
+ Reveal tokens are 22 bytes from `secrets.token_urlsafe` and compared with `secrets.compare_digest`.
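
As a rough illustration of the token handling described above, here is a minimal sketch. It assumes "22 bytes" refers to the 22-character string that `secrets.token_urlsafe(16)` returns; the helper names `mint_ids` and `token_matches` and the length of the paste id are hypothetical and not taken from the app.

```python
import secrets
from typing import Optional, Tuple


def mint_ids() -> Tuple[str, str]:
    """Return a (paste_id, reveal_token) pair.

    Assumption: 16 random bytes, which token_urlsafe encodes as 22 characters.
    The 6-byte id length is also an assumption made for this sketch.
    """
    paste_id = secrets.token_urlsafe(6)
    reveal_token = secrets.token_urlsafe(16)
    return paste_id, reveal_token


def token_matches(stored: str, supplied: Optional[str]) -> bool:
    """Constant-time comparison of the stored token with a user-supplied one."""
    if not supplied:
        return False
    # Compare as bytes so a non-ASCII query parameter cannot raise TypeError.
    return secrets.compare_digest(stored.encode(), supplied.encode())
```

A route handler could call `token_matches(paste.reveal_token, token)` and fall back to the redacted view whenever it returns `False`.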
44
+
45
+ ## Auto-expiry
46
+
47
+ The compose form offers **never / 1h / 24h / 7d**. A background daemon thread wakes every 30s and evicts expired pastes. Expired links return a 404.
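
One way the four TTL choices might map to absolute expiry times is sketched below. The names `TTL_SECONDS` and `expires_at` are hypothetical, and representing "never" as `None` is an assumption of this sketch.

```python
import time
from typing import Optional

# Seconds per TTL choice; None stands for "never expires" (an assumption).
TTL_SECONDS = {"never": None, "1h": 3600, "24h": 24 * 3600, "7d": 7 * 24 * 3600}


def expires_at(ttl: str, now: Optional[float] = None) -> Optional[float]:
    """Return the epoch time at which a paste expires, or None for 'never'."""
    if ttl not in TTL_SECONDS:
        raise ValueError(f"unknown ttl: {ttl!r}")
    seconds = TTL_SECONDS[ttl]
    if seconds is None:
        return None
    base = time.time() if now is None else now
    return base + seconds
```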
48
+
49
+ ## Storage
50
+
51
+ Pastes live in a process-local dict (`PASTES: dict[str, Paste]`) guarded by a lock. That's deliberate for a public demo: it makes the point of the architecture without coupling to Redis or a DB. When the Space restarts, pastes are wiped; the UI surfaces that on the 404 page.
52
+
53
+ To make this production-grade, you'd swap `_store_put` / `_store_get` for a Redis client (both operations are single-key writes/reads) and replace the sweeper with a sorted set: `ZADD` each paste under its expiry time, then range-delete expired entries in a loop.
54
+
55
+ ## OPF model
56
+
57
+ Uses OpenAI's Privacy Filter model (1.5B params, 50M active, 128k context), loaded from safetensors with the exact architecture + Viterbi-decoder pipeline.
58
+
59
+ PII categories handled: Person, Address, Email, Phone, URL, Date, Account, Secret. Redaction replaces each detected span with a `<CATEGORY>` placeholder (matching the format in `[Confidential, non-final draft] Redaction examples.pdf`).
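
The span-to-placeholder step can be sketched as below, assuming the model's output has already been decoded into `(start, end, label)` character spans. The `redact` helper, the span format, and the `PRIVATE_PHONE` label are assumptions of this sketch; only `<PRIVATE_PERSON>` appears in this README.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a <LABEL> placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already replaced.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Call Ana at 415-555-0123"
spans = [(12, 24, "PRIVATE_PHONE"), (5, 8, "PRIVATE_PERSON")]
print(redact(text, spans))  # Call <PRIVATE_PERSON> at <PRIVATE_PHONE>
```

Sorting the spans and walking the text left to right means the character offsets never need to be adjusted after a replacement.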
60
+
61
+ Inference runs behind `@spaces.GPU`, so the model only occupies a GPU slot during an actual paste scan (**ZeroGPU**).
62
+
63
+ ## Running locally
64
+
65
+ ```bash
+ export HF_TOKEN=... # if the model repo is gated
+ pip install -r requirements.txt
+ python app.py # serves on :7860
+ ```
70
+
71
+ ## Programmatic API
72
+
73
+ ```python
+ from gradio_client import Client
+
+ c = Client("YOUR_SPACE_ID")
+ resp = c.predict("Call me at 415-555-0123", ttl="1h", api_name="/analyze_paste")
+ # resp is a JSON string with id, reveal_token, view_path, reveal_path, stats
+ ```
79
+ Go to `https://ysharma-dummy-opf-3.hf.space/view/<VIEW_PATH>` for the filtered/redacted version, and to `https://ysharma-dummy-opf-3.hf.space/view/<REVEAL-PATH>` for the _author_ view.