	# Backend ↔ Frontend Analysis

	A full walkthrough of `backend/`: every route with its inputs and outputs, mapped
	against what the React frontend actually calls. It covers what works today,
	what is only partially wired, and what to fix before hosting the backend.

	---

	## 1. Backend surface

	### 1.1 Entry points

	| File | Purpose |
	|------|---------|
	| `backend/app.py` | Flask app factory: registers blueprints, serves built SPA, SPA-fallback router. |
	| `backend/start.py` | Dev startup script: checks deps, config, Playwright, then runs `app.run(debug=True, port=5000)`. |
	| `backend/config/config.example.py` | Template config. Must be copied to `config.py` with real values before running. |

	### 1.2 Runtime deps (`requirements.txt`)

	- Flask 3 + `flask-cors` (loaded lazily; app still runs without it)
	- Playwright 1.40 + Chromium (required — `playwright install chromium`)
	- Pillow, PyMuPDF (PDF page rasterization), requests
	- `openai>=1.35` (used as a generic OpenAI-compatible client for chat + vision)
	- `python-pptx` + `pywin32`: needed only for the PowerPoint video path, which runs on Windows only

	### 1.3 Required config (`config/config.py`)

	The backend will not boot without these:

	- `API_KEY`, `API_URL` — any OpenAI-compatible chat/completions endpoint (Groq, OpenAI, NVIDIA NIM, local llama.cpp, …)
	- `MODELS_CONFIG["default" | "fast" | "quality"]`: each entry needs `model`, `temperature`, `top_p`, `max_tokens`, `api_key`
	- `MODEL_VISION`, `MODEL_VISION_FALLBACK` — vision model for the Image/PDF tool

	Everything else (`OUTPUT_FOLDER`, `HTML_FOLDER`, `DEFAULT_*`, PowerPoint paths) has sensible defaults.
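
As a rough illustration of the per-entry requirement above, a startup check could look like the sketch below. `REQUIRED_MODEL_KEYS`, `REQUIRED_PROFILES`, `missing_model_keys`, and all sample values are hypothetical; they do not come from the backend code.

```python
# Hypothetical helper, not part of the backend: it only illustrates the
# keys that each MODELS_CONFIG entry is documented to need.
REQUIRED_MODEL_KEYS = ("model", "temperature", "top_p", "max_tokens", "api_key")
REQUIRED_PROFILES = ("default", "fast", "quality")


def missing_model_keys(models_config: dict) -> dict:
    """Return {profile: [missing keys]} for every incomplete profile."""
    problems = {}
    for profile in REQUIRED_PROFILES:
        entry = models_config.get(profile)
        if entry is None:
            # A profile that is absent is missing every required key.
            problems[profile] = list(REQUIRED_MODEL_KEYS)
            continue
        missing = [key for key in REQUIRED_MODEL_KEYS if key not in entry]
        if missing:
            problems[profile] = missing
    return problems


# Placeholder values only; real model names and keys belong in config.py.
example = {
    "default": {"model": "m", "temperature": 0.7, "top_p": 0.9,
                "max_tokens": 4096, "api_key": "placeholder"},
    "fast": {"model": "m-small", "temperature": 0.7},
}
print(missing_model_keys(example))
```

Running a check like this before the app starts would turn a missing key into a readable message instead of a failure deep inside a request.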

	---

	## 2. Routes — inputs and outputs

	### 2.1 `generate.py` (text → screenshots)

	| Method | Path | Body | Response |
	|--------|------|------|----------|
	| POST | `/generate` | `{ text, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots?, use_cache?, beautify_html?, enable_verification?, model_choice?, screenshot_folder?, html_folder? }` | `{ success, html_filename, html_content, screenshot_files[], screenshot_count, screenshot_folder, estimated_total_seconds, metrics, performance }` |
	| POST | `/generate-sse` | same body | SSE stream: `started` → `progress` (ai / ai_verify / ai_revision / html_saved / screenshots_done) → `complete` / `error` / `cancelled` |
	| POST | `/cancel/<operation_id>` | empty | `{ success, message }` or 404 |
	| POST | `/preview` | `{ text, use_cache?, beautify?, model_choice? }` | `{ success, html_content }` (no screenshots) |

	Notes:
	- `text` is capped at roughly 100k tokens; longer inputs are rejected.
	- Defaults: `zoom` 2.1, `overlap` 15 px, viewport 1920×1080, `max_screenshots` 50, `use_cache` true, `beautify_html` false, `enable_verification` true.
	- `operation_id` arrives in the `started` SSE event; the frontend uses it for `/cancel/<id>`.
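
The event flow above can be sketched as a small stream reader. This is a minimal sketch under two assumptions that the routes table does not confirm: each event is framed as standard SSE `data:` lines ending in a blank line, and the `started` event carries the id in a field named `operation_id`. The `type` field follows the sample payload in §4.1; the `stage` field in the sample stream is made up.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE message.

    Assumes each message is one or more `data:` lines followed by a
    blank line, and that the data is a JSON object.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE drops one space after the colon
            buffer.append(value)
        elif line == "" and buffer:
            yield json.loads("\n".join(buffer))
            buffer = []
    if buffer:  # stream ended without a trailing blank line
        yield json.loads("\n".join(buffer))


def find_operation_id(events: Iterable[dict]) -> Optional[str]:
    """Return the id from the first `started` event, or None."""
    for event in events:
        if event.get("type") == "started":
            return event.get("operation_id")  # assumed field name
    return None


# Hand-written sample stream; the `stage` key is hypothetical.
sample_stream = [
    'data: {"type": "started", "operation_id": "op-123"}\n',
    "\n",
    'data: {"type": "progress", "stage": "ai"}\n',
    "\n",
    'data: {"type": "complete"}\n',
    "\n",
]
events = list(iter_sse_events(sample_stream))
print([e["type"] for e in events], find_operation_id(events))
```

A client would keep the returned id around so that a later `POST /cancel/<operation_id>` can target the right run.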

	### 2.2 `html_routes.py` (HTML → screenshots)

	| Method | Path | Body | Response |
	|--------|------|------|----------|
	| POST | `/generate-html` | `{ html, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots? }` | `{ success, html_filename, screenshot_files[], screenshot_count, screenshot_folder }` |
	| POST | `/beautify` | `{ html }` | `{ success, html, validation }` |
	| POST | `/minify` | `{ html }` | `{ success, html, original_size, minified_size, reduction_percent }` |

	Notes:
	- No SSE variant — this path is synchronous.
	- No cancel, no cache, no verification loop — it just renders the HTML as-is.

	### 2.3 `image_routes.py` (image/PDF → screenshots)

	| Method | Path | Body | Response |
	|--------|------|------|----------|
	| POST | `/extract-from-image` | `multipart`: `image` (file), `instructions` (text) | `{ success, raw_text, metadata: {image_count, character_count, word_count}, message }` |
	| POST | `/image-to-screenshots-sse` | `multipart`: `image` (file), `instructions`, `zoom`, `overlap`, `viewport_width`, `viewport_height`, `max_screenshots`, `system_prompt` | SSE: `started` → `progress` (vision → ai → html_saved → screenshots) → `complete` / `error` / `cancelled` |

	Notes:
	- PDFs: first 10 pages only (`fitz.open(...).load_page(i)` capped at 10).
	- `complete` event wraps the payload in `result: {...}` — see gap §4.1.
	- Temp files (`upload_`, `page_`) are cleaned up after extraction.

	### 2.4 `resources.py` (files, history, cache, metrics)

	| Method | Path | Body / query | Response |
	|--------|------|--------------|----------|
	| GET | `/screenshots/<path>` | path param | PNG bytes, 403 on traversal, 404 if missing |
	| GET | `/html/<path>` | path param | HTML text, 403 on traversal, 404 if missing |
	| GET | `/download/<path>` | path param | File as attachment (searches `output/screenshots`, `output/videos`, `output/presentations`) |
	| GET | `/list` | none | `{ screenshots[], html_files[] }` |
	| DELETE | `/delete/<file_type>/<filename>` | path params; `file_type` ∈ `screenshot` / `html` | `{ success, message }` |
	| POST | `/regenerate` | `{ html_filename, zoom?, overlap?, viewport_width?, viewport_height?, max_screenshots? }` | `{ success, screenshot_files[], screenshot_count, screenshot_folder }` |
	| POST | `/download-zip` | `{ files[], name? }` | ZIP attachment |
	| GET | `/history` | none | `HistoryEntry[]` (tool, input_preview, html_file, screenshot_folder, screenshot_count, settings, timestamp) |
	| GET | `/cache/stats` | none | `{ size, hits, misses, hit_rate_percent }` |
	| POST | `/cache/clear` | none | `{ success, message }` |
	| GET | `/metrics/<operation_id>` | none | `{ operation_id, duration, duration_seconds, duration_ms, status, start_time, end_time, metadata }` |

	Notes:
	- Path-traversal hardened via `_safe_child()` on `/screenshots`, `/html`, and `/regenerate`.
	- `/list` flattens `batch N/foo.png` subfolders into the `screenshots[]` array.
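
The body of `_safe_child()` is not shown in this document, so the following is only a guess at how such a containment check commonly works, not the backend's actual code. The name `safe_child` and the temporary `root` directory are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def safe_child(base: Path, user_path: str) -> Optional[Path]:
    """Return the resolved path if it stays inside `base`, else None."""
    base = base.resolve()
    # Resolving collapses `..` segments; an absolute `user_path`
    # replaces `base` entirely and is therefore rejected below.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        return None
    return candidate


root = Path(tempfile.mkdtemp())  # stand-in for the screenshots folder
print(safe_child(root, "batch 3/foo.png"))  # path inside root
print(safe_child(root, "../secret.txt"))    # None
```

A route built on a helper like this would answer 403 when the helper returns `None` and 404 when the returned path does not exist, which matches the status codes listed in the table.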

	---

	## 3. What the frontend actually uses

	From `frontend/src/api/client.ts` + `hooks/useGenerate.ts`:

	| Endpoint | Used by | Notes |
	|----------|---------|-------|
	| `/generate-sse` | Text→Video page | Fully wired (progress, cancel, ETA, completion). |
	| `/generate-html` | HTML→Video page | Sync only, no SSE. |
	| `/image-to-screenshots-sse` | Image→Video page | Bug §4.1: the UI does not pick up the fields nested in the `complete` event's payload. |
	| `/cancel/<id>` | All three pages | Works. |
	| `/beautify`, `/minify` | HTML page helpers | Works. |
	| `/history`, `/list`, `/cache/stats`, `/cache/clear` | Processes page | Works. |
	| `/screenshots/<path>` | `<img>` tags in galleries | See §4.2: URL-encoding gap for `batch N/...` names. |
	| `/html/<name>` | "Open HTML" links | Works. |

	Defined in `client.ts` but not called anywhere in the UI today:

	- `api.generate` — the non-SSE text path is never used.
	- `api.regenerate` — no re-run-with-new-settings button.
	- `api.deleteFile` — no delete button.
	- `api.downloadZip` — no "download all as zip" button.

	Backend endpoints the frontend has no client for at all:

	- `/preview` (HTML dry-run).
	- `/download/<path>` (single-file download as attachment).
	- `/metrics/<operation_id>` (live perf inspection).

	---

	## 4. Gaps / fixes needed

	### 4.1 Image→Video SSE `complete` event mismatch (blocker for that page)

	`image_routes.py` sends:
	```json
	{ "type": "complete", ..., "result": { "html_filename": "...", "screenshot_files": ["batch 3/foo.png"], "screenshot_folder": "batch 3" } }
	```
	But `useGenerate.ts` reads `ev.html_filename`, `ev.screenshot_files`, and `ev.screenshot_folder` as flat top-level fields. As a result, Image→Video completes successfully server-side, but the gallery and result panel render empty.

	Fix options:
	- Backend: flatten the payload (`{"type":"complete","html_filename":...,"screenshot_files":...,...}`) to match `generate-sse`.
	- Or frontend: unwrap `ev.result` when present.

	Flattening the backend is the safer fix — `generate-sse` already does it this way.
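
A minimal sketch of the backend-side flattening, assuming the event is built as a plain dict before it is serialized. `flatten_complete_event` is a hypothetical helper, not an existing function in `image_routes.py`, and `page.html` is a placeholder file name.

```python
def flatten_complete_event(event: dict) -> dict:
    """Lift the keys of a nested `result` dict to the top level.

    The input is not modified. Events without a dict-valued `result`
    come back as an unchanged copy.
    """
    flat = dict(event)
    result = flat.pop("result", None)
    if isinstance(result, dict):
        for key, value in result.items():
            # setdefault keeps existing top-level keys such as `type`.
            flat.setdefault(key, value)
    elif result is not None:
        flat["result"] = result
    return flat


nested = {
    "type": "complete",
    "result": {
        "html_filename": "page.html",  # placeholder
        "screenshot_files": ["batch 3/foo.png"],
        "screenshot_folder": "batch 3",
    },
}
print(flatten_complete_event(nested))
```

After this step the payload exposes `html_filename`, `screenshot_files`, and `screenshot_folder` at the top level, which is where `useGenerate.ts` looks for them.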

	### 4.2 Screenshot URL encoding

	`api.screenshotUrl('batch 3/foo.png')` produces `/screenshots/batch 3/foo.png`. Browsers percent-encode the space, but some servers and proxies won't, and our Flask route reads `<path:filename>` raw. In practice this works for Playwright-rendered names, but each path segment should be encoded explicitly:
	```ts
	// Encode each segment separately so the '/' separators are preserved.
	screenshotUrl: (filename: string) =>
	  buildUrl('/screenshots/' + filename.split('/').map(encodeURIComponent).join('/'))
	```
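
For scripts or tests that hit the same route from Python, a roughly equivalent helper can be sketched with the standard library. `screenshot_url` is a hypothetical name. `quote(..., safe="")` escapes a few more characters than `encodeURIComponent` does, but the two agree on the space in this example.

```python
from urllib.parse import quote, unquote


def screenshot_url(filename: str) -> str:
    """Percent-encode each path segment but keep the '/' separators."""
    encoded = "/".join(quote(segment, safe="") for segment in filename.split("/"))
    return "/screenshots/" + encoded


url = screenshot_url("batch 3/foo.png")
print(url)           # /screenshots/batch%203/foo.png
print(unquote(url))  # /screenshots/batch 3/foo.png
```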

	### 4.3 History "tool" label mismatch

	Backend writes `"tool": "text-to-image"` / `"html-to-image"` / `"image-to-screenshots"` / `"regenerate"`.
	The client-side runs store uses `"text-to-video"` / `"html-to-video"` / `"image-to-video"`.
	Deduplication in `Processes.tsx` is by `html_filename`, so it mostly works, but the filter chips won't match entries that come from backend history. Normalize the labels on both sides.
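
One way to normalize is a single lookup table applied wherever history entries are produced or read. The sketch below pairs the labels in the order they are listed above, which is an assumption, and the names `BACKEND_TO_CLIENT_TOOL` and `normalize_tool` are hypothetical. `"regenerate"` has no client counterpart in this document, so it passes through unchanged.

```python
# Backend label -> client label, paired in the order listed in this section.
BACKEND_TO_CLIENT_TOOL = {
    "text-to-image": "text-to-video",
    "html-to-image": "html-to-video",
    "image-to-screenshots": "image-to-video",
}


def normalize_tool(label: str) -> str:
    """Map a backend history label to the client label; pass others through."""
    return BACKEND_TO_CLIENT_TOOL.get(label, label)


history = [
    {"tool": "text-to-image"},
    {"tool": "regenerate"},
    {"tool": "image-to-video"},
]
print([normalize_tool(entry["tool"]) for entry in history])
# ['text-to-video', 'regenerate', 'image-to-video']
```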

	### 4.4 Missing UI hooks for existing backend features

	All implemented server-side, no button yet:
	- Regenerate — "render again with new zoom / viewport" from a history row.
	- Download ZIP — batch-download all screenshots from a run.
	- Delete — remove a screenshot / HTML from disk.
	- Preview — see AI-generated HTML without rendering screenshots (big cost saver).
	- Live metrics — `/metrics/<operation_id>` while a run is in flight.

	These would slot naturally into each row on the Processes page.

	### 4.5 CORS + hosting

	- `flask-cors` is loaded with `try/except ImportError`. Fine for local, but for any remote host you should pin `origins` to the actual frontend domain instead of `"*"`.
	- No auth. Exposing port 5000 beyond `localhost` should at least require a shared token header — the backend currently has zero access controls.
	- No rate limiting. The AI endpoints are expensive; add `Flask-Limiter` or front with a reverse proxy that enforces quotas.
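
The shared-token idea can be sketched as a small WSGI wrapper that needs nothing beyond the standard library. This is an illustration under assumptions, not the backend's code: the class name, the `X-API-Token` header, and the JSON error body are all hypothetical.

```python
import hmac


class SharedTokenMiddleware:
    """WSGI wrapper that rejects requests lacking the expected token header."""

    def __init__(self, app, token: str, header: str = "X-API-Token"):
        self.app = app
        self.token = token.encode("utf-8")
        # WSGI exposes request headers as HTTP_<NAME> with '-' turned into '_'.
        self.environ_key = "HTTP_" + header.upper().replace("-", "_")

    def __call__(self, environ, start_response):
        supplied = environ.get(self.environ_key, "").encode("utf-8")
        # compare_digest avoids leaking the token through timing differences.
        if not hmac.compare_digest(supplied, self.token):
            body = b'{"success": false, "message": "unauthorized"}'
            start_response("401 Unauthorized", [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def call(app, environ):
    """Invoke a WSGI app with a bare environ and return (status, body)."""
    status = []
    body = b"".join(app(environ, lambda s, h: status.append(s)))
    return status[0], body


guarded = SharedTokenMiddleware(demo_app, token="s3cret")
print(call(guarded, {}))
print(call(guarded, {"HTTP_X_API_TOKEN": "s3cret"}))
```

With Flask, a wrapper like this is usually attached with `app.wsgi_app = SharedTokenMiddleware(app.wsgi_app, token)`. Note that it guards every request, including the built SPA and CORS preflight requests, so a real deployment would likely exempt those paths and methods.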

	### 4.6 Hosting-readiness summary

	What works out of the box (local):
	- `python start.py` from `backend/` boots the whole app — serves built React at `/` and API everywhere else.
	- Dev mode (`npm run dev` + Vite proxy) works without CORS.
	- Single-port single-process deploy: `gunicorn -b 0.0.0.0:5000 app:app` on Linux, native Flask on Windows.

	Must fix before remote hosting:
	- Set `DEBUG=False` in `config.py` (currently `True`).
	- Install `flask-cors` and pin `origins`.
	- Don't use `app.run()` in prod — use `gunicorn` (already in requirements, Linux-only).
	- Add at least a shared API-key header or basic-auth wrapper on every route.
	- Persist `output/` on a mounted volume if the container is stateless.
	- Playwright needs Chromium installed in the image — use `mcr.microsoft.com/playwright/python:v1.40.0` as the base.
	- Windows-only `pywin32`/`python-pptx` must be gated on non-Windows hosts; `requirements.txt` already does this.

	Nice to have before shipping:
	- Pin token budgets and request size limits per endpoint.
	- Structured logging (right now every route `print()`s to stdout with emojis).
	- `/healthz` endpoint for container probes.
	- Gunicorn config: `--workers 2 --threads 4 --timeout 600` (screenshot runs are long).
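
The Gunicorn flags above can also live in a Python config file, which keeps the start command short. The file name and location below are a suggestion; the values mirror the flags and bind address already given in this section.

```python
# gunicorn.conf.py (suggested file name, placed next to app.py)
bind = "0.0.0.0:5000"  # same address as the gunicorn command in this section
workers = 2            # --workers 2
threads = 4            # --threads 4
timeout = 600          # --timeout 600, in seconds; screenshot runs are long
```

The server would then be started with `gunicorn -c gunicorn.conf.py app:app`.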

	---

	## 5. One-page TL;DR

	- ✅ Text→Video: backend + frontend fully integrated.
	- ✅ HTML→Video: backend + frontend fully integrated.
	- ⚠ Image/PDF→Video: works server-side, but the `complete` event wraps its payload in `result`, so the UI shows no screenshots on success. Fix by flattening on the backend or unwrapping on the frontend (§4.1).
	- ✅ Processes page: reads `/history`, `/list`, `/cache/stats`; unified client + backend view.
	- ❌ Regenerate, Delete, Download-ZIP, Preview, Metrics endpoints are all implemented server-side but unused in UI.
	- ⚠ Screenshot path encoding + history tool-name normalization are low-severity polish.
	- ❌ Backend is local-only today; hosting it remotely requires auth, CORS pinning, a production WSGI server, `DEBUG=False`, and structured logging (§4.5, §4.6).