Backend ↔ Frontend Analysis
A full walk of backend/ with every route and its inputs/outputs, mapped
against what the React frontend actually calls. Covers what works today,
what's partially wired, and what to fix before hosting the backend.
1. Backend surface
1.1 Entry points
| File | Purpose |
|---|---|
| backend/app.py | Flask app factory: registers blueprints, serves built SPA, SPA-fallback router. |
| backend/start.py | Dev startup script: checks deps, config, Playwright, then runs app.run(debug=True, port=5000). |
| backend/config/config.example.py | Template config. Must be copied to config.py with real values before running. |
1.2 Runtime deps (requirements.txt)
- Flask 3 + flask-cors (loaded lazily; app still runs without it)
- Playwright 1.40 + Chromium (required: playwright install chromium)
- Pillow, PyMuPDF (PDF page rasterization), requests
- openai>=1.35 (used as a generic OpenAI-compatible client for chat + vision)
- python-pptx + pywin32: Windows-only, for the PowerPoint video path
1.3 Required config (config/config.py)
The backend will not boot without these:
- API_KEY, API_URL: any OpenAI-compatible chat/completions endpoint (Groq, OpenAI, NVIDIA NIM, local llama.cpp, …)
- MODELS_CONFIG["default" | "fast" | "quality"]: each entry needs model, temperature, top_p, max_tokens, api_key
- MODEL_VISION, MODEL_VISION_FALLBACK: vision model for the Image/PDF tool
Everything else (OUTPUT_FOLDER, HTML_FOLDER, DEFAULT_*, PowerPoint paths) has sensible defaults.
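A startup check along the following lines could report every missing setting at once instead of failing on the first one. This is only a sketch: the function name missing_config_keys and the idea of passing the config as a plain dict are assumptions for illustration, and only the key names come from the list above.

```python
from typing import Any

# Key names are taken from the list above; the checker itself is hypothetical.
REQUIRED_TOP_LEVEL = (
    "API_KEY",
    "API_URL",
    "MODELS_CONFIG",
    "MODEL_VISION",
    "MODEL_VISION_FALLBACK",
)
REQUIRED_PROFILES = ("default", "fast", "quality")
REQUIRED_PROFILE_FIELDS = ("model", "temperature", "top_p", "max_tokens", "api_key")


def missing_config_keys(cfg: dict[str, Any]) -> list[str]:
    """Return dotted names of every required setting that is absent or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if cfg.get(key) in (None, "", {})]
    profiles = cfg.get("MODELS_CONFIG")
    if not isinstance(profiles, dict) or not profiles:
        return missing  # already reported as a missing top-level key
    for profile in REQUIRED_PROFILES:
        entry = profiles.get(profile)
        if not isinstance(entry, dict):
            missing.append(f"MODELS_CONFIG.{profile}")
            continue
        for field in REQUIRED_PROFILE_FIELDS:
            # A numeric 0 (e.g. temperature 0.0) is a valid value, so only
            # None and the empty string count as missing.
            if entry.get(field) in (None, ""):
                missing.append(f"MODELS_CONFIG.{profile}.{field}")
    return missing
```

An empty return value means all required settings are present; anything else can be printed before the app refuses to start.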
2. Routes — inputs and outputs
2.1 generate.py (text → screenshots)
| Method | Path | Body | Response |
|---|---|---|---|
| POST | /generate | { text, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots?, use_cache?, beautify_html?, enable_verification?, model_choice?, screenshot_folder?, html_folder? } | { success, html_filename, html_content, screenshot_files[], screenshot_count, screenshot_folder, estimated_total_seconds, metrics, performance } |
| POST | /generate-sse | same body | SSE stream: started → progress (ai / ai_verify / ai_revision / html_saved / screenshots_done) → complete / error / cancelled |
| POST | /cancel/<operation_id> | empty | { success, message } or 404 |
| POST | /preview | { text, use_cache?, beautify?, model_choice? } | { success, html_content } (no screenshots) |
Notes:
- text limit: ~100k tokens (rejected otherwise).
- Settings: zoom default 2.1, overlap 15 px, viewport 1920×1080, max_screenshots 50, use_cache true, beautify_html false, enable_verification true.
- operation_id comes from the started SSE event; the frontend uses it for /cancel/<id>.
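Because cancellation depends on reading operation_id out of the stream, a script or test client needs to parse the SSE frames itself. The sketch below assumes each frame is a "data: <json>" line followed by a blank line, and that every event object carries a type field plus, for started, an operation_id field. The wire format is not spelled out in this document, so treat both as assumptions; the function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded JSON object per SSE frame.

    Assumes frames look like 'data: {...}' terminated by a blank line.
    Comment lines (starting with ':') and other SSE fields are ignored.
    A trailing frame with no closing blank line is dropped.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips a single leading space
            data_parts.append(value)
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []


def find_operation_id(events: Iterable[dict[str, Any]]) -> Optional[str]:
    """Return operation_id from the first 'started' event, or None."""
    for event in events:
        if event.get("type") == "started":
            return event.get("operation_id")
    return None
```

With an HTTP client that exposes the response body as an iterator of text lines, that iterator can be passed straight to iter_sse_events, and the id returned by find_operation_id is what goes into POST /cancel/<operation_id>.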
2.2 html_routes.py (HTML → screenshots)
| Method | Path | Body | Response |
|---|---|---|---|
| POST | /generate-html | { html, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots? } | { success, html_filename, screenshot_files[], screenshot_count, screenshot_folder } |
| POST | /beautify | { html } | { success, html, validation } |
| POST | /minify | { html } | { success, html, original_size, minified_size, reduction_percent } |
Notes:
- No SSE variant — this path is synchronous.
- No cancel, no cache, no verification loop — it just renders the HTML as-is.
2.3 image_routes.py (image/PDF → screenshots)
| Method | Path | Body | Response |
|---|---|---|---|
| POST | /extract-from-image | multipart: image (file), instructions (text) | { success, raw_text, metadata: {image_count, character_count, word_count}, message } |
| POST | /image-to-screenshots-sse | multipart: image (file), instructions, zoom, overlap, viewport_width, viewport_height, max_screenshots, system_prompt | SSE: started → progress (vision → ai → html_saved → screenshots) → complete / error / cancelled |
Notes:
- PDFs: first 10 pages only (fitz.open(...).load_page(i) capped at 10).
- complete event wraps the payload in result: {...}; see gap §4.1.
- Temp files (upload_*, page_*) are cleaned up after extraction.
2.4 resources.py (files, history, cache, metrics)
| Method | Path | Body / query | Response |
|---|---|---|---|
| GET | /screenshots/<path> | path param | PNG bytes, 403 on traversal, 404 if missing |
| GET | /html/<path> | path param | HTML text, 403 on traversal, 404 if missing |
| GET | /download/<path> | path param | File as attachment (searches output/screenshots, output/videos, output/presentations) |
| GET | /list | none | { screenshots[], html_files[] } |
| DELETE | /delete/<file_type>/<filename> | path params: file_type ∈ screenshot / html | { success, message } |
| POST | /regenerate | { html_filename, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots? } | { success, screenshot_files[], screenshot_count, screenshot_folder } |
| POST | /download-zip | { files[], name? } | ZIP attachment |
| GET | /history | none | HistoryEntry[] (tool, input_preview, html_file, screenshot_folder, screenshot_count, settings, timestamp) |
| GET | /cache/stats | none | { size, hits, misses, hit_rate_percent } |
| POST | /cache/clear | none | { success, message } |
| GET | /metrics/<operation_id> | none | { operation_id, duration, duration_seconds, duration_ms, status, start_time, end_time, metadata } |
Notes:
- Path-traversal hardened via _safe_child() on /screenshots, /html, and /regenerate.
- /list flattens batch N/foo.png subfolders into the screenshots[] array.
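The body of _safe_child() is not shown in this analysis. A containment check of this kind is commonly written by resolving the requested path and confirming it still sits under the base folder. The sketch below illustrates that general pattern and is not the backend's actual helper; the name safe_child and its signature are hypothetical.

```python
from pathlib import Path
from typing import Optional


def safe_child(base_dir: str, user_path: str) -> Optional[Path]:
    """Resolve user_path under base_dir; return None if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, and resolve()
    # collapses any ".." segments, so both cases fail the check below.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        return None  # a route would answer 403 here
    return candidate
```

A route would return 403 when the helper yields None, and 404 when the returned path does not exist on disk, matching the responses listed in the table.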
3. What the frontend actually uses
From frontend/src/api/client.ts + hooks/useGenerate.ts:
| Endpoint | Used by | Notes |
|---|---|---|
| /generate-sse | Text→Video page | Fully wired (progress, cancel, ETA, completion). |
| /generate-html | HTML→Video page | Sync only, no SSE. |
| /image-to-screenshots-sse | Image→Video page | Bug §4.1: complete event's payload doesn't populate. |
| /cancel/<id> | All three pages | Works. |
| /beautify, /minify | HTML page helpers | Works. |
| /history, /list, /cache/stats, /cache/clear | Processes page | Works. |
| /screenshots/<path> | <img> tags in galleries | See §4.2: URL-encoding gap for batch N/... names. |
| /html/<name> | "Open HTML" links | Works. |
Defined in client.ts but not called anywhere in the UI today:
- api.generate: the non-SSE text path is never used.
- api.regenerate: no re-run-with-new-settings button.
- api.deleteFile: no delete button.
- api.downloadZip: no "download all as zip" button.
Backend endpoints the frontend has no client for at all:
- /preview (HTML dry-run).
- /download/<path> (single-file download as attachment).
- /metrics/<operation_id> (live perf inspection).
4. Gaps / fixes needed
4.1 Image→Video SSE complete event mismatch (blocker for that page)
image_routes.py sends:
{ "type": "complete", ..., "result": { "html_filename": "...", "screenshot_files": ["batch 3/foo.png"], "screenshot_folder": "batch 3" } }
But useGenerate.ts reads ev.html_filename, ev.screenshot_files, ev.screenshot_folder as flat fields. The end result: Image→Video completes successfully server-side, but the gallery and result panel render empty.
Fix options:
- Backend: flatten the payload ({"type":"complete","html_filename":...,"screenshot_files":...,...}) to match generate-sse.
- Or frontend: unwrap ev.result when present.
Flattening the backend is the safer fix — generate-sse already does it this way.
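The backend-side flatten could be a small helper applied to the event just before it is serialized. This is a sketch under assumptions: the helper name flatten_complete_event is hypothetical, and how image_routes.py actually builds and emits its events is not shown here.

```python
from typing import Any


def flatten_complete_event(event: dict[str, Any]) -> dict[str, Any]:
    """Lift the keys of a nested "result" dict to the top level of a complete event.

    Events of other types, or without a dict-valued "result", are returned unchanged.
    The input dict is never mutated.
    """
    result = event.get("result")
    if event.get("type") != "complete" or not isinstance(result, dict):
        return event
    flat = {key: value for key, value in event.items() if key != "result"}
    # Existing top-level keys win, so the payload can never overwrite "type".
    for key, value in result.items():
        flat.setdefault(key, value)
    return flat
```

After this, ev.html_filename, ev.screenshot_files, and ev.screenshot_folder are present as flat fields, which is what useGenerate.ts already reads.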
4.2 Screenshot URL encoding
api.screenshotUrl('batch 3/foo.png') produces /screenshots/batch 3/foo.png. Browsers percent-encode the space, but some servers and proxies won't, and our Flask route reads <path:filename> raw. In practice it works for Playwright-rendered names, but the client should encode each path segment itself:
screenshotUrl: (filename: string) =>
buildUrl('/screenshots/' + filename.split('/').map(encodeURIComponent).join('/'))
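For scripts or tests that request the same route from Python, the equivalent per-segment encoding might look like the sketch below. The function name screenshot_url_path is hypothetical. Note that urllib.parse.quote escapes a few characters that encodeURIComponent leaves alone (such as ! and *); both forms decode to the same path.

```python
from urllib.parse import quote


def screenshot_url_path(filename: str) -> str:
    """Percent-encode each path segment but keep "/" as the separator."""
    encoded = "/".join(quote(part, safe="") for part in filename.split("/"))
    return "/screenshots/" + encoded


print(screenshot_url_path("batch 3/foo.png"))  # /screenshots/batch%203/foo.png
```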
4.3 History "tool" label mismatch
Backend writes "tool": "text-to-image" / "html-to-image" / "image-to-screenshots" / "regenerate".
Client-side runs store uses "text-to-video" / "html-to-video" / "image-to-video".
Deduplication in Processes.tsx is by html_filename, so it mostly works — but the filter chips won't match backend history. Normalize both sides.
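One way to normalize is a single mapping applied where history entries are read. The pairing below is inferred from the order in which the two label sets are listed above and should be confirmed against the code; "regenerate" has no client-side counterpart in that list, so it passes through unchanged. The names BACKEND_TO_CLIENT_TOOL and normalize_tool_label are hypothetical.

```python
# Backend label -> client label, using only the names listed above.
# The pairing is an assumption based on list order.
BACKEND_TO_CLIENT_TOOL = {
    "text-to-image": "text-to-video",
    "html-to-image": "html-to-video",
    "image-to-screenshots": "image-to-video",
}


def normalize_tool_label(label: str) -> str:
    """Map a backend history label to the client's name; unknown labels pass through."""
    return BACKEND_TO_CLIENT_TOOL.get(label, label)
```

Whether the mapping lives in the backend's /history response or in the frontend's history loader is a design choice; keeping it in one place avoids the two sides drifting again.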
4.4 Missing UI hooks for existing backend features
All implemented server-side, no button yet:
- Regenerate: "render again with new zoom / viewport" from a history row.
- Download ZIP: batch-download all screenshots from a run.
- Delete: remove a screenshot / HTML from disk.
- Preview: see AI-generated HTML without rendering screenshots (big cost saver).
- Live metrics: /metrics/<operation_id> while a run is in flight.
These would slot naturally into each row on the Processes page.
4.5 CORS + hosting
- flask-cors is loaded with try/except ImportError. Fine for local, but for any remote host you should pin origins to the actual frontend domain instead of "*".
- No auth. Exposing port 5000 beyond localhost should at least require a shared token header; the backend currently has zero access controls.
- No rate limiting. The AI endpoints are expensive; add Flask-Limiter or front with a reverse proxy that enforces quotas.
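The shared-token check itself can be kept independent of Flask so it is easy to test. The sketch below is a minimal illustration: the header name X-API-Token and the function is_authorized are hypothetical choices, not something the backend defines today.

```python
import hmac
from typing import Mapping

TOKEN_HEADER = "X-API-Token"  # hypothetical header name


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the expected shared token."""
    supplied = headers.get(TOKEN_HEADER, "")
    if not expected_token or not supplied:
        return False  # fail closed when the token is unset on either side
    # compare_digest runs in constant time, so the comparison does not
    # leak how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))
```

In Flask this could be called from a before_request hook with request.headers (which is case-insensitive, unlike a plain dict), returning 401 when the check fails. The expected token would come from config or an environment variable rather than source code.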
4.6 Hosting-readiness summary
What works out of the box (local):
- python start.py from backend/ boots the whole app: serves built React at / and API everywhere else.
- Dev mode (npm run dev + Vite proxy) works without CORS.
- Single-port single-process deploy: gunicorn -b 0.0.0.0:5000 app:app on Linux, native Flask on Windows.
Must fix before remote hosting:
- Set DEBUG=False in config.py (currently True).
- Install flask-cors and pin origins.
- Don't use app.run() in prod; use gunicorn (already in requirements, Linux-only).
- Add at least a shared API-key header or basic-auth wrapper on every route.
- Persist output/ on a mounted volume if the container is stateless.
- Playwright needs Chromium installed in the image; use mcr.microsoft.com/playwright/python:v1.40.0 as the base.
- Windows-only pywin32 / python-pptx should be gated; they already are in requirements.txt, good.
Nice to have before shipping:
- Pin token budgets and request size limits per endpoint.
- Structured logging (right now every route print()s to stdout with emojis).
- /healthz endpoint for container probes.
- Gunicorn config: --workers 2 --threads 4 --timeout 600 (screenshot runs are long).
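The Gunicorn flags above can also live in a config file, which Gunicorn reads as plain Python. The values below are the ones suggested in this section; the file name and its location next to app.py are assumptions.

```python
# gunicorn.conf.py (assumed to sit next to app.py)
bind = "0.0.0.0:5000"  # same address as the -b flag used for the single-port deploy
workers = 2            # worker processes
threads = 4            # threads per worker
timeout = 600          # seconds before a silent worker is killed; screenshot runs are long
```

It would then be started with gunicorn -c gunicorn.conf.py app:app.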
5. One-page TL;DR
- ✅ Text→Video: backend + frontend fully integrated.
- ✅ HTML→Video: backend + frontend fully integrated.
- ⚠ Image/PDF→Video: works server-side, but the complete event wraps its payload, so the UI shows no screenshots on success. Fix by flattening on the backend or unwrapping on the frontend.
- ✅ Processes page: reads /history, /list, /cache/stats; unified client + backend view.
- ❌ Regenerate, Delete, Download-ZIP, Preview, Metrics endpoints are all implemented server-side but unused in UI.
- ⚠ Screenshot path encoding + history tool-name normalization are low-severity polish.
- ❌ Backend is local-only today; to host remotely you need auth, CORS pinning, prod WSGI, and log sanitization.