metadata
title: Unified Document Extraction API
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: app.py
pinned: false
Unified Document Extraction API
One API, Two Engines: Docling + DocStrange
Extract structured data from any document using AI-powered engines.
Features
- Docling - Advanced document parsing with structure preservation
- DocStrange - GPU-accelerated intelligent document processing
- Multiple formats - PDF, DOCX, XLSX, PPTX, images, and more
- Structured output - Markdown, JSON, tables
API Endpoints
- GET / - Health check
- GET /engines - List available engines
- POST /convert - Full document conversion
- POST /convert/markdown - Markdown only
- POST /convert/tables - Tables only
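The two GET endpoints can be called from Python with only the standard library. This is a minimal sketch, not the project's own client: `BASE_URL` reuses the `YOUR_SPACE` placeholder from the usage examples, `endpoint_url` and `get_text` are hypothetical helper names, and the response body is returned as raw text because the README does not document its format.

```python
from urllib.request import urlopen

# Placeholder host; replace YOUR_SPACE with the name of your Space.
BASE_URL = "https://YOUR_SPACE.hf.space"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_text(path, timeout=30):
    """Send a GET request to the endpoint and return the body as text.

    The response format is not documented here, so no parsing is attempted.
    """
    with urlopen(endpoint_url(path), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Against a running Space, `get_text("/")` would perform the health check and `get_text("/engines")` would list the available engines.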
Usage
# Convert with Docling
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docling" \
-F "file=@document.pdf"
# Convert with DocStrange
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docstrange" \
-F "file=@document.pdf"
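The same upload can be sketched in Python. This assumes the third-party `requests` package is installed; `build_convert_url` and `convert` are hypothetical helper names. It is also an assumption that the `engine` query parameter applies to `/convert/markdown` and `/convert/tables` as it does to `/convert`, since the examples above only show it on `/convert`. The raw response is returned because the response schema is not documented here.

```python
from urllib.parse import urlencode

# Placeholder host copied from the curl examples; replace with your Space URL.
BASE_URL = "https://YOUR_SPACE.hf.space"

# Endpoint paths as listed under "API Endpoints".
CONVERT_PATHS = {
    "full": "/convert",
    "markdown": "/convert/markdown",
    "tables": "/convert/tables",
}

ENGINES = ("docling", "docstrange")


def build_convert_url(engine, output="full", base_url=BASE_URL):
    """Return the conversion URL for an engine and an output kind."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    query = urlencode({"engine": engine})
    return f"{base_url.rstrip('/')}{CONVERT_PATHS[output]}?{query}"


def convert(path, engine="docling", output="full", timeout=120):
    """Upload a document as the multipart form field named file, as curl -F does."""
    import requests  # third-party; imported here so URL building works without it

    with open(path, "rb") as handle:
        response = requests.post(
            build_convert_url(engine, output),
            files={"file": handle},
            timeout=timeout,
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error body
    return response
```

For example, `convert("document.pdf", engine="docstrange")` mirrors the second curl command above.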
Integration
Works with the DataSync application for ERPNext integration.
License
MIT