metadata
title: Unified Document Extraction API
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
app_file: app.py
pinned: false

🚀 Unified Document Extraction API

One API, Two Engines: Docling + DocStrange

Extract structured data from PDFs, Office files, and images using either of two AI-powered engines.

Features

  • ✅ Docling - Advanced document parsing with structure preservation
  • ✅ DocStrange - GPU-accelerated intelligent document processing
  • ✅ Multiple formats - PDF, DOCX, XLSX, PPTX, Images, and more
  • ✅ Structured output - Markdown, JSON, Tables

API Endpoints

  • GET / - Health check
  • GET /engines - List available engines
  • POST /convert - Full document conversion
  • POST /convert/markdown - Markdown only
  • POST /convert/tables - Tables only
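
As a rough illustration of how the endpoints above map to request URLs, here is a minimal Python sketch. The helper `endpoint_url` and the `ENDPOINTS` table are hypothetical names introduced for this example. The `engine` query parameter is shown in this README only for `/convert`; passing it to the other POST endpoints is an assumption.

```python
from urllib.parse import urlencode

# Methods and paths copied from the endpoint list above.
ENDPOINTS = {
    "health": ("GET", "/"),
    "engines": ("GET", "/engines"),
    "convert": ("POST", "/convert"),
    "markdown": ("POST", "/convert/markdown"),
    "tables": ("POST", "/convert/tables"),
}


def endpoint_url(base_url, name, engine=None):
    """Return (method, url) for a named endpoint (hypothetical helper)."""
    method, path = ENDPOINTS[name]
    url = base_url.rstrip("/") + path
    if engine is not None:
        # Assumption: the server reads the engine from the query string,
        # as in the curl examples for /convert.
        url += "?" + urlencode({"engine": engine})
    return method, url


print(endpoint_url("https://YOUR_SPACE.hf.space", "convert", engine="docling"))
```

Replace `YOUR_SPACE` with the name of your own Space before sending requests.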

Usage

# Convert with Docling
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docling" \
  -F "file=@document.pdf"

# Convert with DocStrange
curl -X POST "https://YOUR_SPACE.hf.space/convert?engine=docstrange" \
  -F "file=@document.pdf"
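
The curl calls above can also be made from Python. The sketch below uses only the standard library and mirrors `-F "file=@document.pdf"` by building a `multipart/form-data` body by hand. The function names `encode_multipart` and `convert` are hypothetical, and because the response schema is not documented here, the body is returned as raw text rather than parsed.

```python
import os
import uuid
import urllib.request
from urllib.parse import urlencode


def encode_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert(base_url, path, engine="docling", timeout=300):
    """POST a local file to /convert and return the response body as text."""
    with open(path, "rb") as fh:
        body, ctype = encode_multipart("file", os.path.basename(path), fh.read())
    url = f"{base_url.rstrip('/')}/convert?{urlencode({'engine': engine})}"
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (requires a running Space and a local file):
# print(convert("https://YOUR_SPACE.hf.space", "document.pdf", engine="docstrange"))
```

The 300-second timeout is an arbitrary choice for this sketch; large documents or a cold-starting Space may need a different value.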

Integration

Works with the DataSync application for ERPNext integration.

License

MIT