
⚑ Quick Start Guide - Hugging Face Deployment

🎯 5-Minute Setup

Step 1: Create HF Spaces (2 min)

  1. Go to https://huggingface.co/spaces
  2. Create TWO spaces:
    • docling-api
    • docstrange-api
  3. Use Docker SDK for both
  4. Set to Public (free) or Private
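The two Spaces can also be created from a script instead of the web UI. The sketch below assumes the `huggingface_hub` package is installed and that you are logged in with a write token. `create_docker_spaces` is a hypothetical helper name, and its `api` argument is expected to be a `huggingface_hub.HfApi` instance.

```python
SPACE_NAMES = ("docling-api", "docstrange-api")


def create_docker_spaces(api, username, names=SPACE_NAMES, private=False):
    """Create one Docker-SDK Space per name and return the repo ids."""
    repo_ids = []
    for name in names:
        repo_id = f"{username}/{name}"
        api.create_repo(
            repo_id=repo_id,
            repo_type="space",
            space_sdk="docker",
            private=private,
            exist_ok=True,  # do not fail if the Space already exists
        )
        repo_ids.append(repo_id)
    return repo_ids
```

Typical use would be `from huggingface_hub import HfApi` followed by `create_docker_spaces(HfApi(), "YOUR_USERNAME")`.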

Step 2: Upload Files (1 min)

For EACH space:

  1. Upload app.py from the corresponding folder
  2. Upload requirements.txt from the corresponding folder
  3. Wait for the Space to build and deploy (2-3 min)
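The uploads can be scripted as well. This is a minimal sketch, assuming `api` is a `huggingface_hub.HfApi` instance and that each Space's files live in a local folder you pass in. The helper name `upload_space_files` and the folder layout are illustrative assumptions, not part of this project.

```python
from pathlib import Path

SPACE_FILES = ("app.py", "requirements.txt")


def upload_space_files(api, repo_id, folder, filenames=SPACE_FILES):
    """Upload each listed file from `folder` to the root of the Space."""
    uploaded = []
    for filename in filenames:
        local_path = Path(folder) / filename
        if not local_path.is_file():
            # Fail early instead of deploying a Space with a missing file.
            raise FileNotFoundError(f"Missing {local_path}")
        api.upload_file(
            path_or_fileobj=str(local_path),
            path_in_repo=filename,
            repo_id=repo_id,
            repo_type="space",
        )
        uploaded.append(filename)
    return uploaded
```

Call it once per Space, for example `upload_space_files(HfApi(), "YOUR_USERNAME/docling-api", "path/to/docling/folder")`, where the folder path is a placeholder.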

Step 3: Get Your URLs

After deployment:

  • Docling: https://YOUR_USERNAME-docling-api.hf.space
  • DocStrange: https://YOUR_USERNAME-docstrange-api.hf.space
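The URL pattern above can be expressed as a small helper. This sketch simply joins the username and Space name as shown in this guide; `space_url` is a hypothetical name. If either name contains characters such as underscores or dots, the real subdomain may differ, so copy the URL from the Space page in that case.

```python
def space_url(username, space_name):
    """Build a Space URL following the USERNAME-SPACENAME.hf.space pattern."""
    return f"https://{username}-{space_name}.hf.space"


for name in ("docling-api", "docstrange-api"):
    print(space_url("YOUR_USERNAME", name))
```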

Step 4: Connect to DataSync (1 min)

  1. Open http://localhost:5000
  2. Go to Import Data → DocStrange tab
  3. Select engine:
    • 🔬 Docling Hugging Face OR
    • 🧪 DocStrange Hugging Face
  4. Paste your HF URL
  5. Upload PDF and extract!

🧪 Test Your APIs

# Test both APIs (Windows path shown; use huggingface_deploy/test-scripts on macOS/Linux)
cd huggingface_deploy\test-scripts

python test_docling.py https://YOUR_USERNAME-docling-api.hf.space
python test_docstrange.py https://YOUR_USERNAME-docstrange-api.hf.space
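The test scripts themselves are not reproduced in this guide. As a rough stand-in, a reachability check using only the Python standard library could look like the sketch below. It only confirms that a Space answers HTTP at its root URL and says nothing about the extraction endpoints, whose paths are not given here. The function name `is_reachable` and the `opener` parameter are illustrative assumptions.

```python
import urllib.error
import urllib.request


def is_reachable(base_url, timeout=30, opener=urllib.request.urlopen):
    """Return True if a GET on the Space root URL gets a non-5xx answer."""
    try:
        with opener(base_url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered with an error status (e.g. 404 or 500).
        return exc.code < 500
    except OSError:
        # DNS failure, refused connection, or timeout: the Space is not up.
        return False
```

For example, `is_reachable("https://YOUR_USERNAME-docling-api.hf.space")` should return False while the Space is still failing to start or cannot be reached.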

✅ You're Done!

Both APIs are now integrated with DataSync and ready to extract documents!


🆘 Troubleshooting

| Problem | Solution |
|---|---|
| Space not deploying | Check Docker logs in HF Space settings |
| API returns 500 | Verify requirements.txt uploaded |
| Timeout errors | PDF too large - try smaller file |
| Not working in DataSync | Check URL format (no trailing slash) |
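The URL format check in the last row can be automated before pasting the URL into DataSync. This is a minimal sketch: `normalize_space_url` is a hypothetical helper, and the rules it enforces (https scheme, `*.hf.space` host, no path) are assumptions drawn from the URL examples in this guide rather than documented DataSync requirements.

```python
from urllib.parse import urlsplit


def normalize_space_url(raw):
    """Return the URL without trailing slashes, or raise ValueError."""
    url = raw.strip().rstrip("/")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"expected an https:// URL, got {raw!r}")
    if not parts.hostname or not parts.hostname.endswith(".hf.space"):
        raise ValueError(f"expected a *.hf.space host, got {raw!r}")
    if parts.path or parts.query or parts.fragment:
        raise ValueError(f"expected a bare base URL, got {raw!r}")
    return url
```

Passing `"https://YOUR_USERNAME-docling-api.hf.space/"` returns the same URL without the trailing slash, which is the form this guide asks for.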

📚 Next Steps

  • Try different engines for comparison
  • Map extracted columns to ERPNext
  • Download CSV/JSON of extracted data

Happy extracting! 🚀