setu / docs /HITL_QUICK_REFERENCE.md
khagu's picture
chore: finally untrack large database files
3998131

HITL Bias Detection - Quick Reference

API Endpoints Summary

Endpoint Method Purpose Auth Required
/api/v1/bias-detection-hitl/start-review POST Upload PDF & start review βœ“
/api/v1/bias-detection-hitl/approve-suggestion POST Approve/reject suggestion βœ“
/api/v1/bias-detection-hitl/regenerate-suggestion POST Get new suggestion βœ“
/api/v1/bias-detection-hitl/session/{session_id} GET Get session status βœ“
/api/v1/bias-detection-hitl/generate-pdf POST Generate final PDF βœ“
/api/v1/bias-detection-hitl/health GET Health check βœ—

Sentence Status Flow

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ pending  β”‚ ◄─── New biased sentence with suggestion
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
     β”‚
     β”œβ”€β”€β–Ί approve ────┐
     β”‚                β”‚
     └──► reject      β”‚
            ↓         β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚needs_regenerationβ”‚
     β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
          β”œβ”€β”€β–Ί regenerate ──► pending (with new suggestion)
          β”‚
          └──► give up ──► manual intervention needed
                            β”‚
                            ↓
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚ approved β”‚ ◄─── All approved? β†’ Generate PDF
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Quick Start (cURL)

# 1. Start review
SESSION_ID=$(curl -X POST "http://localhost:8000/api/v1/bias-detection-hitl/start-review" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@document.pdf" | jq -r '.session_id')

# 2. Get session details
curl -X GET "http://localhost:8000/api/v1/bias-detection-hitl/session/$SESSION_ID" \
  -H "Authorization: Bearer $TOKEN" | jq .

# 3. Approve a suggestion
curl -X POST "http://localhost:8000/api/v1/bias-detection-hitl/approve-suggestion" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"session_id\":\"$SESSION_ID\",\"sentence_id\":\"SENTENCE_ID\",\"action\":\"approve\",\"approved_suggestion\":\"approved text here\"}"

# 4. Generate PDF
curl -X POST "http://localhost:8000/api/v1/bias-detection-hitl/generate-pdf" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"session_id\":\"$SESSION_ID\"}" \
  --output debiased.pdf

Key Response Fields

StartReviewResponse

{
  session_id: "uuid",          // Save this!
  total_sentences: 25,
  biased_count: 5,             // Needs review
  neutral_count: 20,           // Auto-approved
  sentences: [...]             // Array of BiasReviewItem
}

BiasReviewItem

{
  sentence_id: "uuid",         // Use for approval/rejection
  original_sentence: "...",    // Original text
  is_biased: true,            // true or false
  category: "gender",          // Bias type
  confidence: 0.92,            // 0.0-1.0
  suggestion: "...",           // LLM suggestion
  approved_suggestion: null,   // Filled after approval
  status: "pending"            // pending/approved/needs_regeneration
}

SessionStatusResponse

{
  status: "in_progress",       // pending_review/in_progress/completed
  pending_count: 2,            // Still needs review
  approved_count: 22,          // Ready
  needs_regeneration_count: 1  // Rejected, needs new suggestion
}

Common Patterns

Pattern 1: Simple Approval Loop

# Get session
session = start_review(pdf_file)

# Approve all suggestions
for sentence in session['sentences']:
    if sentence['is_biased']:
        approve_suggestion(
            session_id=session['session_id'],
            sentence_id=sentence['sentence_id'],
            approved_text=sentence['suggestion']
        )

# Generate PDF
generate_pdf(session['session_id'])

Pattern 2: Reject & Regenerate

# Reject
approve_suggestion(
    session_id=SESSION_ID,
    sentence_id=SENTENCE_ID,
    action='reject'
)

# Regenerate
new_suggestion = regenerate_suggestion(
    session_id=SESSION_ID,
    sentence_id=SENTENCE_ID
)

# Review new suggestion, then approve
approve_suggestion(
    session_id=SESSION_ID,
    sentence_id=SENTENCE_ID,
    action='approve',
    approved_text=new_suggestion
)

Pattern 3: Check Progress Before PDF

status = get_session_status(SESSION_ID)

if status['pending_count'] == 0 and status['needs_regeneration_count'] == 0:
    pdf = generate_pdf(SESSION_ID)
else:
    print(f"Still need to review {status['pending_count']} sentences")

Error Codes

Code Error Solution
400 Not all sentences reviewed Review all pending/rejected sentences
404 Session not found Check session_id or start new session
404 Sentence not found Verify sentence_id from session data
500 PDF generation failed Check logs, may use fallback method
500 LLM unavailable Verify MISTRAL_API_KEY is set
503 Model not loaded Check bias detection model status

Testing Checklist

  • Upload PDF successfully
  • Session ID returned
  • Biased sentences have suggestions
  • Neutral sentences auto-approved
  • Can approve suggestion
  • Can reject suggestion
  • Can regenerate after rejection
  • Can get session status
  • PDF generation blocked if not all approved
  • PDF generation succeeds when all approved
  • Downloaded PDF contains changes

Files Structure

api/
β”œβ”€β”€ routes/
β”‚   β”œβ”€β”€ bias_detection.py          # Original bias detection
β”‚   └── bias_detection_hitl.py     # New HITL endpoints
β”œβ”€β”€ schemas.py                      # Data models (updated)
└── main.py                         # Router registration (updated)

utility/
β”œβ”€β”€ pdf_processor.py                # PDF extraction (existing)
β”œβ”€β”€ hitl_session_manager.py         # Session management (new)
└── pdf_regenerator.py              # PDF regeneration (new)

docs/
β”œβ”€β”€ HITL_BIAS_DETECTION_GUIDE.md   # Full documentation
└── HITL_QUICK_REFERENCE.md        # This file

Environment Variables

# Required
MISTRAL_API_KEY=your_mistral_api_key_here

# Optional (already configured)
SUPABASE_URL=your_supabase_url
SUPABASE_KEY=your_supabase_key

Deployment Notes

Before Deploying:

  1. Set MISTRAL_API_KEY environment variable
  2. Test with sample PDFs
  3. Consider adding rate limiting
  4. Monitor memory usage
  5. Plan session cleanup strategy

Production Considerations:

  • Add session TTL (time-to-live)
  • Implement session cleanup job
  • Add file size limits
  • Rate limit LLM API calls
  • Migrate to persistent storage (Supabase/Redis)
  • Add logging and monitoring
  • Implement proper error tracking

Support Resources

  • Full Guide: docs/HITL_BIAS_DETECTION_GUIDE.md
  • Health Check: GET /api/v1/bias-detection-hitl/health
  • Original Bias Detection: GET /api/v1/health
  • API Docs: http://localhost:8000/docs (when server running)