Spaces:
Running
Running
File size: 11,635 Bytes
5f3e9f5 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 | # Backend ↔ Frontend Analysis
A full walk of `backend/` with every route and its inputs/outputs, mapped
against what the React frontend actually calls. Covers what works today,
what's partially wired, and what to fix before hosting the backend.
---
## 1. Backend surface
### 1.1 Entry points
| File | Purpose |
|------|---------|
| `backend/app.py` | Flask app factory — registers blueprints, serves built SPA, SPA-fallback router. |
| `backend/start.py` | Dev startup script — checks deps, config, Playwright, then runs `app.run(debug=True, port=5000)`. |
| `backend/config/config.example.py` | Template config. Must be copied to `config.py` with real values before running. |
### 1.2 Runtime deps (`requirements.txt`)
- Flask 3 + `flask-cors` (loaded lazily; app still runs without it)
- Playwright 1.40 + Chromium (required — `playwright install chromium`)
- Pillow, PyMuPDF (PDF page rasterization), requests
- `openai>=1.35` (used as a generic OpenAI-compatible client for chat + vision)
- `python-pptx` + `pywin32` — Windows-only, for the PowerPoint video path
### 1.3 Required config (`config/config.py`)
The backend **will not boot** without these:
- `API_KEY`, `API_URL` — any OpenAI-compatible chat/completions endpoint (Groq, OpenAI, NVIDIA NIM, local llama.cpp, …)
- `MODELS_CONFIG["default" | "fast" | "quality"]` — each entry needs `model`, `temperature`, `top_p`, `max_tokens`, `api_key`
- `MODEL_VISION`, `MODEL_VISION_FALLBACK` — vision model for the Image/PDF tool
Everything else (`OUTPUT_FOLDER`, `HTML_FOLDER`, `DEFAULT_*`, PowerPoint paths) has sensible defaults.
---
## 2. Routes — inputs and outputs
### 2.1 `generate.py` (text → screenshots)
| Method | Path | Body | Response |
|--------|------|------|----------|
| POST | `/generate` | `{ text, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots?, use_cache?, beautify_html?, enable_verification?, model_choice?, screenshot_folder?, html_folder? }` | `{ success, html_filename, html_content, screenshot_files[], screenshot_count, screenshot_folder, estimated_total_seconds, metrics, performance }` |
| POST | `/generate-sse` | same body | SSE stream: `started` → `progress` (ai / ai_verify / ai_revision / html_saved / screenshots_done) → `complete` / `error` / `cancelled` |
| POST | `/cancel/<operation_id>` | empty | `{ success, message }` or 404 |
| POST | `/preview` | `{ text, use_cache?, beautify?, model_choice? }` | `{ success, html_content }` (no screenshots) |
Notes:
- `text` limit: ~100k tokens (rejected otherwise).
- Settings: `zoom` default 2.1, `overlap` 15 px, viewport 1920×1080, `max_screenshots` 50, `use_cache` true, `beautify_html` false, `enable_verification` true.
- `operation_id` comes from the `started` SSE event — frontend uses it for `/cancel/<id>`.
### 2.2 `html_routes.py` (HTML → screenshots)
| Method | Path | Body | Response |
|--------|------|------|----------|
| POST | `/generate-html` | `{ html, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots? }` | `{ success, html_filename, screenshot_files[], screenshot_count, screenshot_folder }` |
| POST | `/beautify` | `{ html }` | `{ success, html, validation }` |
| POST | `/minify` | `{ html }` | `{ success, html, original_size, minified_size, reduction_percent }` |
Notes:
- No SSE variant — this path is synchronous.
- No cancel, no cache, no verification loop — it just renders the HTML as-is.
### 2.3 `image_routes.py` (image/PDF → screenshots)
| Method | Path | Body | Response |
|--------|------|------|----------|
| POST | `/extract-from-image` | `multipart`: `image` (file), `instructions` (text) | `{ success, raw_text, metadata: {image_count, character_count, word_count}, message }` |
| POST | `/image-to-screenshots-sse` | `multipart`: `image` (file), `instructions`, `zoom`, `overlap`, `viewport_width`, `viewport_height`, `max_screenshots`, `system_prompt` | SSE: `started` → `progress` (vision → ai → html_saved → screenshots) → `complete` / `error` / `cancelled` |
Notes:
- PDFs: first 10 pages only (`fitz.open(...).load_page(i)` capped at 10).
- `complete` event wraps the payload in `result: {...}` — **see gap §4.1**.
- Temp files (`upload_*`, `page_*`) are cleaned up after extraction.
### 2.4 `resources.py` (files, history, cache, metrics)
| Method | Path | Body / query | Response |
|--------|------|--------------|----------|
| GET | `/screenshots/<path>` | path param | PNG bytes, 403 on traversal, 404 if missing |
| GET | `/html/<path>` | path param | HTML text, 403 on traversal, 404 if missing |
| GET | `/download/<path>` | path param | File as attachment (searches `output/screenshots`, `output/videos`, `output/presentations`) |
| GET | `/list` | — | `{ screenshots[], html_files[] }` |
| DELETE | `/delete/<file_type>/<filename>` | path params — `file_type` ∈ `screenshot` / `html` | `{ success, message }` |
| POST | `/regenerate` | `{ html_filename, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots? }` | `{ success, screenshot_files[], screenshot_count, screenshot_folder }` |
| POST | `/download-zip` | `{ files[], name? }` | ZIP attachment |
| GET | `/history` | — | `HistoryEntry[]` (tool, input_preview, html_file, screenshot_folder, screenshot_count, settings, timestamp) |
| GET | `/cache/stats` | — | `{ size, hits, misses, hit_rate_percent }` |
| POST | `/cache/clear` | — | `{ success, message }` |
| GET | `/metrics/<operation_id>` | — | `{ operation_id, duration, duration_seconds, duration_ms, status, start_time, end_time, metadata }` |
Notes:
- Path-traversal hardened via `_safe_child()` on `/screenshots`, `/html`, and `/regenerate`.
- `/list` flattens `batch N/foo.png` subfolders into the `screenshots[]` array.
---
## 3. What the frontend actually uses
From `frontend/src/api/client.ts` + `hooks/useGenerate.ts`:
| Endpoint | Used by | Notes |
|----------|---------|-------|
| `/generate-sse` | Text→Video page | Fully wired (progress, cancel, ETA, completion). |
| `/generate-html` | HTML→Video page | Sync only — no SSE. |
| `/image-to-screenshots-sse` | Image→Video page | **Bug §4.1** — `complete` event's payload doesn't populate. |
| `/cancel/<id>` | All three pages | Works. |
| `/beautify`, `/minify` | HTML page helpers | Works. |
| `/history`, `/list`, `/cache/stats`, `/cache/clear` | Processes page | Works. |
| `/screenshots/<path>` | `<img>` tags in galleries | **See §4.2** — URL-encoding gap for `batch N/...` names. |
| `/html/<name>` | "Open HTML" links | Works. |
**Defined in client.ts but not called anywhere in the UI today:**
- `api.generate` — the non-SSE text path is never used.
- `api.regenerate` — no re-run-with-new-settings button.
- `api.deleteFile` — no delete button.
- `api.downloadZip` — no "download all as zip" button.
**Backend endpoints the frontend has no client for at all:**
- `/preview` (HTML dry-run).
- `/download/<path>` (single-file download as attachment).
- `/metrics/<operation_id>` (live perf inspection).
---
## 4. Gaps / fixes needed
### 4.1 Image→Video SSE `complete` event mismatch *(blocker for that page)*
`image_routes.py` sends:
```json
{ "type": "complete", ..., "result": { "html_filename": "...", "screenshot_files": ["batch 3/foo.png"], "screenshot_folder": "batch 3" } }
```
But `useGenerate.ts` reads `ev.html_filename`, `ev.screenshot_files`, `ev.screenshot_folder` as flat fields. The end result: Image→Video completes successfully server-side, but the gallery and result panel render empty.
**Fix options:**
- Backend: flatten the payload (`{"type":"complete","html_filename":...,"screenshot_files":...,...}`) to match `generate-sse`.
- Or frontend: unwrap `ev.result` when present.
Flattening the backend is the safer fix — `generate-sse` already does it this way.
### 4.2 Screenshot URL encoding
`api.screenshotUrl('batch 3/foo.png')` produces `/screenshots/batch 3/foo.png`. Browsers percent-encode the space but some servers / proxies won't, and our Flask route reads `<path:filename>` raw. In practice it works for Playwright-rendered names but should be:
```ts
screenshotUrl: (filename: string) =>
buildUrl('/screenshots/' + filename.split('/').map(encodeURIComponent).join('/'))
```
### 4.3 History "tool" label mismatch
Backend writes `"tool": "text-to-image"` / `"html-to-image"` / `"image-to-screenshots"` / `"regenerate"`.
Client-side runs store uses `"text-to-video"` / `"html-to-video"` / `"image-to-video"`.
Deduplication in `Processes.tsx` is by `html_filename`, so it mostly works — but the filter chips won't match backend history. Normalize both sides.
### 4.4 Missing UI hooks for existing backend features
All implemented server-side, no button yet:
- **Regenerate** — "render again with new zoom / viewport" from a history row.
- **Download ZIP** — batch-download all screenshots from a run.
- **Delete** — remove a screenshot / HTML from disk.
- **Preview** — see AI-generated HTML without rendering screenshots (big cost saver).
- **Live metrics** — `/metrics/<operation_id>` while a run is in flight.
These would slot naturally into each row on the Processes page.
### 4.5 CORS + hosting
- `flask-cors` is loaded with `try/except ImportError`. Fine for local, but for any remote host you should pin `origins` to the actual frontend domain instead of `"*"`.
- No auth. Exposing port 5000 beyond `localhost` should at least require a shared token header — the backend currently has zero access controls.
- No rate limiting. The AI endpoints are expensive; add `Flask-Limiter` or front with a reverse proxy that enforces quotas.
### 4.6 Hosting-readiness summary
**What works out of the box (local):**
- `python start.py` from `backend/` boots the whole app — serves built React at `/` and API everywhere else.
- Dev mode (`npm run dev` + Vite proxy) works without CORS.
- Single-port single-process deploy: `gunicorn -b 0.0.0.0:5000 app:app` on Linux, native Flask on Windows.
**Must fix before remote hosting:**
- Set `DEBUG=False` in `config.py` (currently `True`).
- Install `flask-cors` and pin `origins`.
- Don't use `app.run()` in prod — use `gunicorn` (already in requirements, Linux-only).
- Add at least a shared API-key header or basic-auth wrapper on every route.
- Persist `output/` on a mounted volume if the container is stateless.
- Playwright needs Chromium installed in the image — use `mcr.microsoft.com/playwright/python:v1.40.0` as the base.
- Windows-only `pywin32`/`python-pptx` should be gated — they already are in `requirements.txt`, good.
**Nice to have before shipping:**
- Pin token budgets and request size limits per endpoint.
- Structured logging (right now every route `print()`s to stdout with emojis).
- `/healthz` endpoint for container probes.
- Gunicorn config: `--workers 2 --threads 4 --timeout 600` (screenshot runs are long).
---
## 5. One-page TL;DR
- ✅ Text→Video: backend + frontend fully integrated.
- ✅ HTML→Video: backend + frontend fully integrated.
- ⚠ Image/PDF→Video: works server-side but the `complete` event wraps its payload, so the UI shows no screenshots on success — **fix backend flatten OR frontend unwrap**.
- ✅ Processes page: reads `/history`, `/list`, `/cache/stats`; unified client + backend view.
- ❌ Regenerate, Delete, Download-ZIP, Preview, Metrics endpoints are all implemented server-side but unused in UI.
- ⚠ Screenshot path encoding + history tool-name normalization are low-severity polish.
- ❌ Backend is local-only today; to host remotely you need auth, CORS pinning, prod WSGI, and log sanitization.
|