vn6295337 Claude Opus 4.5 committed on
Commit
2f094f6
·
1 Parent(s): a55afbf

Skip Lakera check for educational content to avoid false positives

Browse files

Educational questions like "What causes people to hate those who
are different?" were being blocked by Lakera. Now:
- If regex layer marks content as educational, skip Lakera check
- This prevents false positives on legitimate questions about
prejudice, discrimination, civil rights, etc.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (1) hide show
  1. src/api/routes.py +12 -8
src/api/routes.py CHANGED
@@ -90,14 +90,18 @@ async def query_llm(request: Request, query: QueryRequest, api_key: str = Depend
90
 
91
  # ========== LAYER 2: AI-based safety check (Lakera/Gemini) ==========
92
  # Only runs if regex layer passes
93
- toxicity_result = detect_toxicity(query.prompt)
94
- if toxicity_result["is_toxic"]:
95
- categories = ", ".join(toxicity_result["blocked_categories"]) or "harmful content"
96
- metrics.record_request(blocked=True)
97
- raise HTTPException(
98
- status_code=status.HTTP_400_BAD_REQUEST,
99
- detail=f"Security Alert: Content flagged by AI safety ({categories})"
100
- )
 
 
 
 
101
 
102
  # ========== LAYER 3: LLM Execution ==========
103
  response_content, provider_used, latency_ms, error_message, cascade_path = await llm_client.query_llm_cascade(
 
90
 
91
  # ========== LAYER 2: AI-based safety check (Lakera/Gemini) ==========
92
  # Only runs if regex layer passes
93
+ # Skip for educational content (to avoid false positives on questions about hate/prejudice)
94
+ is_educational = hate_result.get("is_educational", False)
95
+
96
+ if not is_educational:
97
+ toxicity_result = detect_toxicity(query.prompt)
98
+ if toxicity_result["is_toxic"]:
99
+ categories = ", ".join(toxicity_result["blocked_categories"]) or "harmful content"
100
+ metrics.record_request(blocked=True)
101
+ raise HTTPException(
102
+ status_code=status.HTTP_400_BAD_REQUEST,
103
+ detail=f"Security Alert: Content flagged by AI safety ({categories})"
104
+ )
105
 
106
  # ========== LAYER 3: LLM Execution ==========
107
  response_content, provider_used, latency_ms, error_message, cascade_path = await llm_client.query_llm_cascade(