Commit 5b17aa6 by gni · 1 parent: 0b4a37d

feat: update rate limit to 1/2s, custom error message, and SEO PNG placeholder

Files changed (3)
  1. README.md +65 -73
  2. api/main.py +16 -3
  3. ui/index.html +4 -4
README.md CHANGED
@@ -19,15 +19,6 @@ tags:

  A lightweight PII (Personally Identifiable Information) moderation MVP designed to sanitize sensitive data before it reaches LLM APIs.

- ## 🚀 Key Features
-
- - **Multi-Language Support**: High-accuracy detection for **English** and **French** using `spaCy` Large models.
- - **Double-Pass Protection**: Combines NLP-based detection with expert Regex patterns for PII coverage.
- - **Expert French Recognizers**: Built-in support for French-specific data: **SIRET**, **NIR** (Social Security), **IBAN**, and addresses.
- - **Balanced Anonymization**: Preserves job titles and document structure to keep texts readable.
- - **Minimal Dashboard**: React-based UI with Risk Assessment visualization.
- - **Custom Theme UI**: Switch between **Premium**, **Minimal Light**, and **Deep Midnight** modes.
-
  ---

  ## 🎥 Demo
@@ -43,85 +34,86 @@ Check out Redac in action:

  ---

- ## 🛠️ Architecture
-
- 1. **Core API (`/api`)**: FastAPI server powered by **Microsoft Presidio**.
- 2. **Web Dashboard (`/ui`)**: React + Vite + Tailwind CSS.

  ---

- ## 📦 Getting Started (Docker)
-
- ### Quick Start with Makefile
- If you have `make` installed, you can use these shortcuts:
- ```bash
- make build # Build containers
- make up # Start Redac
- make logs # View logs
- make test # Run tests
- ```
-
- ### Manual Docker commands
- ```bash
- docker compose up --build
- ```

- - **API**: `http://localhost:8000/api`
- - **UI Dashboard**: `http://localhost:5173`

  ---

- ## 🔌 API Usage
-
- You can interact directly with the Redac API using `curl`.
-
- ### Redact Text
- **Request:**
- ```bash
- curl -X POST http://localhost:8000/api/redact \
-   -H "Content-Type: application/json" \
-   -d '{"text": "My name is John Doe, call me at 06 12 34 56 78."}'
- ```

- **Response (JSON):**
- ```json
- {
-   "original_text": "My name is John Doe, call me at 06 12 34 56 78.",
-   "redacted_text": "My name is <PERSON>, call me at <PHONE_NUMBER>.",
-   "detected_language": "en",
-   "entities": [
-     {
-       "type": "PERSON",
-       "text": "John Doe",
-       "score": 85,
-       "start": 11,
-       "end": 19
-     },
-     {
-       "type": "PHONE_NUMBER",
-       "text": "06 12 34 56 78",
-       "score": 95,
-       "start": 32,
-       "end": 46
-     }
-   ]
- }
- ```

  ---

- ## 🧪 Quality Assurance

- ### Using Makefile
  ```bash
- make test
  ```

- ### Manual Docker
- ```bash
- # Run the test suite
- docker compose run api python tests/verify_all.py
- ```

  ---
 
  A lightweight PII (Personally Identifiable Information) moderation MVP designed to sanitize sensitive data before it reaches LLM APIs.

  ---

  ## 🎥 Demo


  ---

+ ## 📖 API Documentation
+
+ The Redac API is open and can be integrated into your own workflows.
+
+ ### 🏠 Base URL
+ `https://lbl-redaction.hf.space/api`
+
+ ### ⚡ Rate Limiting
+ - **1 request every 2 seconds** per IP address to ensure stability.
+ - Exceeding this limit will return a `429 Too Many Requests` status code with a helpful message.
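
To make the "1 request every 2 seconds per IP" rule concrete, here is a minimal standard-library sketch of a per-client minimum-interval check. It is not the algorithm slowapi uses internally (that is not shown in this commit); the class name `MinIntervalLimiter` and the injectable `clock` parameter are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict


class MinIntervalLimiter:
    """Allow at most one request per `interval` seconds for each client key."""

    def __init__(self, interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_allowed: Dict[str, float] = {}

    def allow(self, client_ip: str) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        now = self.clock()
        last = self._last_allowed.get(client_ip)
        if last is not None and now - last < self.interval:
            return False  # too soon: the caller would answer 429 Too Many Requests
        self._last_allowed[client_ip] = now
        return True
```

Each IP address is tracked separately, so one busy client does not block the others; a rejected request does not reset that client's timer.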
+
+ ### 🔍 Endpoints
+
+ #### 1. Redact Text
+ Processes a text and returns the anonymized version along with metadata about detected entities.
+
+ - **URL**: `/redact`
+ - **Method**: `POST`
+ - **Headers**: `Content-Type: application/json`
+ - **Body**:
+ ```json
+ {
+   "text": "Your sensitive text here",
+   "language": "auto"
+ }
+ ```
+ *(Options for language: `auto`, `en`, `fr`)*
+
+ - **Success Response (200 OK)**:
+ ```json
+ {
+   "original_text": "...",
+   "redacted_text": "My name is <PERSON>",
+   "detected_language": "en",
+   "entities": [
+     {
+       "type": "PERSON",
+       "text": "John Doe",
+       "score": 95,
+       "start": 11,
+       "end": 19
+     }
+   ]
+ }
+ ```
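
The `start` and `end` fields in the response can be read as character offsets into `original_text`. The sketch below assumes they are zero-based with an exclusive `end` (which matches the sample values 11 and 19 for "John Doe") and that entities do not overlap; the helper name `apply_entities` is hypothetical and is not part of the Redac API.

```python
from typing import Dict, List


def apply_entities(original_text: str, entities: List[Dict]) -> str:
    """Replace each [start, end) span with a <TYPE> placeholder.

    Assumes zero-based, end-exclusive offsets and non-overlapping entities.
    """
    redacted = original_text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"<{entity['type']}>"
        redacted = redacted[: entity["start"]] + placeholder + redacted[entity["end"]:]
    return redacted


# Sample input modelled on the response shown in the README.
sample_entities = [
    {"type": "PERSON", "text": "John Doe", "score": 95, "start": 11, "end": 19},
]
print(apply_entities("My name is John Doe", sample_entities))
```

A client can use the same offsets to highlight detected entities in the original text instead of replacing them.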
+
+ #### 2. System Status
+ Checks if the API and NLP engines are online.
+
+ - **URL**: `/status`
+ - **Method**: `GET`
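
A client for the `/redact` endpoint described above might look like the following sketch, which uses only the Python standard library. The base URL, path, headers, and body fields come from the README; the function names, the injectable `send` and `sleep` parameters, and the retry count are assumptions made for illustration, not part of the project.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://lbl-redaction.hf.space/api"


def build_redact_request(text: str, language: str = "auto") -> urllib.request.Request:
    """Build the POST /redact request with a JSON body."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/redact",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(req: urllib.request.Request) -> Dict[str, Any]:
    """Send the request over the network and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def redact(text: str, language: str = "auto",
           send: Callable[[urllib.request.Request], Dict[str, Any]] = send_json,
           sleep: Callable[[float], None] = time.sleep,
           retries: int = 3) -> Dict[str, Any]:
    """Call /redact, waiting 2 seconds and retrying when the API answers 429."""
    attempts = max(1, retries)
    for attempt in range(attempts):
        try:
            return send(build_redact_request(text, language))
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == attempts - 1:
                raise
            sleep(2.0)  # matches the documented limit of 1 request every 2 seconds
    raise RuntimeError("unreachable")
```

Calling `redact("My name is John Doe")` would perform a real HTTP request; passing a custom `send` function makes it possible to exercise the retry logic offline.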

  ---

+ ## 🚀 Key Features

+ - **Multi-Language Support**: High-accuracy detection for **English** and **French** using `spaCy` Large models.
+ - **Double-Pass Protection**: Combines NLP-based detection with expert Regex patterns for PII coverage.
+ - **Expert French Recognizers**: Built-in support for French-specific data: **SIRET**, **NIR**, **IBAN**, and addresses.
+ - **Balanced Anonymization**: Preserves job titles and document structure to keep texts readable.
+ - **Minimal Dashboard**: React-based UI with Risk Assessment visualization.

  ---

+ ## 🛠️ Architecture

+ 1. **Core API (`/api`)**: FastAPI server powered by **Microsoft Presidio**.
+ 2. **Web Dashboard (`/ui`)**: React + Vite + Tailwind CSS.

  ---

+ ## 📦 Local Development

+ ### Manual Docker commands
  ```bash
+ docker compose up --build
  ```

+ - **API**: `http://localhost:8000/api`
+ - **UI Dashboard**: `http://localhost:5173`

  ---

api/main.py CHANGED
@@ -1,7 +1,7 @@
- from fastapi import FastAPI, HTTPException
  from fastapi.middleware.cors import CORSMiddleware
  from fastapi.staticfiles import StaticFiles
- from fastapi.responses import FileResponse
  from pydantic import BaseModel
  from typing import List, Dict, Optional
  import logging
@@ -14,6 +14,9 @@ from presidio_analyzer.nlp_engine import NlpEngineProvider
  from presidio_anonymizer import AnonymizerEngine
  from langdetect import detect, DetectorFactory
  import uvicorn

  # Setup logging
  logging.basicConfig(level=logging.INFO)
@@ -21,8 +24,17 @@ logger = logging.getLogger(__name__)

  DetectorFactory.seed = 0

  app = FastAPI(title="Redac API")


  app.add_middleware(
      CORSMiddleware,
@@ -81,7 +93,8 @@ async def api_status():
      return {"status": "online", "mode": "pro-visual"}

  @app.post("/api/redact")
- async def redact_text(request: RedactRequest):
      # ... (rest of the logic)
      try:
          try:
 
+ from fastapi import FastAPI, HTTPException, Request
  from fastapi.middleware.cors import CORSMiddleware
  from fastapi.staticfiles import StaticFiles
+ from fastapi.responses import FileResponse, JSONResponse
  from pydantic import BaseModel
  from typing import List, Dict, Optional
  import logging

  from presidio_anonymizer import AnonymizerEngine
  from langdetect import detect, DetectorFactory
  import uvicorn
+ from slowapi import Limiter, _rate_limit_exceeded_handler
+ from slowapi.util import get_remote_address
+ from slowapi.errors import RateLimitExceeded

  # Setup logging
  logging.basicConfig(level=logging.INFO)


  DetectorFactory.seed = 0

+ # Setup rate limiting
+ limiter = Limiter(key_func=get_remote_address)
  app = FastAPI(title="Redac API")
+ app.state.limiter = limiter

+ @app.exception_handler(RateLimitExceeded)
+ async def custom_rate_limit_exceeded_handler(request: Request, exc: RateLimitExceeded):
+     return JSONResponse(
+         status_code=429,
+         content={"detail": "Too many requests. Please wait 2 seconds between each analysis to avoid saturating the server."}
+     )

  app.add_middleware(
      CORSMiddleware,

      return {"status": "online", "mode": "pro-visual"}

  @app.post("/api/redact")
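+ @limiter.limit("1/2seconds")
+ # slowapi only hooks in when the Starlette Request parameter is named `request`,
+ # so the JSON body is received as `payload` (the elided logic must read from it).
+ async def redact_text(request: Request, payload: RedactRequest):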
      # ... (rest of the logic)
      try:
          try:
ui/index.html CHANGED
@@ -19,14 +19,14 @@
      <meta property="og:url" content="https://lbl-redaction.hf.space" />
      <meta property="og:title" content="Redac | AI-Powered PII Data Redaction" />
      <meta property="og:description" content="Securely sanitize your documents. Automated PII detection and redaction for French and English texts." />
-     <meta property="og:image" content="/src/assets/hero.png" />

      <!-- Twitter -->
-     <meta property="twitter:card" content="summary_large_image" />
      <meta property="twitter:url" content="https://lbl-redaction.hf.space" />
-     <meta property="twitter:title" content="Redac | AI-Powered PII Data Redaction" />
      <meta property="twitter:description" content="Protect sensitive data before sending to LLMs. Fast, secure, and automated PII redaction." />
-     <meta property="twitter:image" content="/src/assets/hero.png" />
    </head>
    <body>
      <div id="root"></div>
 
      <meta property="og:url" content="https://lbl-redaction.hf.space" />
      <meta property="og:title" content="Redac | AI-Powered PII Data Redaction" />
      <meta property="og:description" content="Securely sanitize your documents. Automated PII detection and redaction for French and English texts." />
+     <meta property="og:image" content="/logo.png" />

      <!-- Twitter -->
+     <meta property="twitter:card" content="summary" />
      <meta property="twitter:url" content="https://lbl-redaction.hf.space" />
+     <meta property="twitter:title" content="Redac - AI PII Redaction Tool" />
      <meta property="twitter:description" content="Protect sensitive data before sending to LLMs. Fast, secure, and automated PII redaction." />
+     <meta property="twitter:image" content="/logo.png" />
    </head>
    <body>
      <div id="root"></div>