YT-AI-Automation / README.md
github-actions
Sync Docker Space
5f3e9f5
---
title: YT AI Automation
emoji: πŸŽ₯
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---
# TextBro β€” Text β†’ Video Studio
Turn text, raw HTML, images, or PDFs into video-ready screenshots using AI.
- **Backend**: Flask + Playwright (Python) β€” originally
[Screenshot Studio](https://github.com/shiv12345678901/yt-project).
- **Frontend**: React + Vite + TypeScript + Tailwind CSS.
- **Features**: live SSE progress, cancel, screenshot gallery, ZIP download,
history, cache inspection. On Windows, the backend can also stitch
screenshots into a PowerPoint-driven video.
```
Devin_project/
β”œβ”€β”€ backend/ # Flask app, routes, Playwright screenshot engine
β”‚ β”œβ”€β”€ app.py
β”‚ β”œβ”€β”€ start.py
β”‚ β”œβ”€β”€ requirements.txt
β”‚ β”œβ”€β”€ config/
β”‚ β”œβ”€β”€ routes/
β”‚ └── src/
└── frontend/ # React SPA
β”œβ”€β”€ src/
β”œβ”€β”€ package.json
└── vite.config.ts
```
## Requirements
- **Python** 3.10+ (3.11 recommended)
- **Node.js** 20.19+ or 22.13+
- **Playwright's Chromium** (installed via `playwright install chromium`)
- An API key for an OpenAI-compatible LLM endpoint (Groq, Together, OpenAI,
a local `llama.cpp` server, etc.) β€” the backend uses chat completions.
- **Optional (Windows only)** Microsoft PowerPoint, for the
screenshot β†’ video pipeline.
## First-time setup
```bash
# 1) Clone
git clone https://github.com/shiv12345678901/Devin_project.git
cd Devin_project
```
### Backend
```bash
cd backend
# (Optional but recommended) create a virtualenv
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
playwright install chromium
# Fill in your API credentials
cp config/config.example.py config/config.py
# Edit config/config.py:
# API_KEY = "sk-..." # your LLM API key
# API_URL = "https://api.groq.com/openai/v1" # or wherever
# MODEL = "llama-3.1-70b-versatile"
```
### Frontend
```bash
cd ../frontend
npm install
```
## Running it
You have two options.
### Option A β€” dev mode (two terminals, hot reload everywhere)
```bash
# Terminal 1
cd backend && python start.py # http://localhost:5000
# Terminal 2
cd frontend && npm run dev # http://localhost:5173
```
Open http://localhost:5173 β€” the Vite dev server proxies every API path to
the Flask backend so CORS isn't an issue. Changes to React are hot-reloaded.
### Option B β€” single server (Flask serves the built React app)
```bash
cd frontend && npm run build # produces frontend/dist/
cd ../backend && python start.py # http://localhost:5000
```
Now Flask serves the UI and the API from one port, so this is also the
setup you'd use when pointing a tunnel (ngrok, Cloudflare Tunnel) at it.
## What's wired to what
| Frontend page | Backend endpoint | Notes |
| ----------------- | ----------------------------------- | ----------------------------------- |
| Text β†’ Video | `POST /generate-sse` | SSE progress, cancel via `/cancel/<op>` |
| HTML β†’ Video | `POST /generate-html`, `/beautify`, `/minify` | Synchronous |
| Image/PDF β†’ Video | `POST /image-to-screenshots-sse` | SSE progress, OCR + AI + screenshots |
| Resources | `GET /list`, `/history`, `/cache/stats`, `DELETE /delete/<type>/<name>`, `POST /cache/clear` | β€” |
| Gallery | `GET /screenshots/<path>` | Served by Flask |
| ZIP download | `POST /download-zip` | Streams a ZIP of selected files |
The full API client is in
[`frontend/src/api/client.ts`](frontend/src/api/client.ts) and the SSE
state machine in
[`frontend/src/hooks/useGenerate.ts`](frontend/src/hooks/useGenerate.ts).
## Configuration reference
Key values in `backend/config/config.py` (see
`backend/config/config.example.py` for the full list):
| Setting | What it controls |
| ------------------------------ | -------------------------------------------- |
| `API_KEY`, `API_URL`, `MODEL` | Which LLM the backend talks to (chat completions) |
| `PORT`, `HOST` | Flask listen address |
| `DEFAULT_VIEWPORT_WIDTH/HEIGHT`| Screenshot viewport |
| `DEFAULT_ZOOM`, `DEFAULT_OVERLAP` | Capture scaling and slide overlap |
| `MAX_SCREENSHOTS_LIMIT` | Hard cap on screenshots per run |
| `POWERPOINT_*` | Windows-only PowerPoint/video export |
| `VIDEO_*` | Resolution / FPS / quality for PPT β†’ video |
## Scripts
**Frontend** (inside `frontend/`)
| Command | Description |
| ----------------- | ---------------------------------------- |
| `npm run dev` | Start Vite dev server with API proxy |
| `npm run build` | TypeScript + production build to `dist/` |
| `npm run preview` | Preview the production build locally |
| `npm run lint` | Run ESLint |
**Backend** (inside `backend/`)
| Command | Description |
| --------------------- | ---------------------------------------------------- |
| `python start.py` | Launch the Flask app with env checks |
| `python app.py` | Launch the Flask app directly (skip env checks) |
## Troubleshooting
- **`Configuration file not found`** when starting the backend β€” you didn't
copy `config/config.example.py` to `config/config.py`.
- **Generation returns 500 / `Failed to get AI response`** β€” the API key or
base URL in `config.py` is wrong, or the model isn't available from that
endpoint.
- **Screenshots are blank** β€” run `playwright install chromium` again.
- **`/assets/...` 404 on Option B** β€” rebuild the frontend after code
changes (`cd frontend && npm run build`).
- **Video export fails on macOS/Linux** β€” the PowerPoint exporter is
Windows-only. Screenshots still work on all platforms.
## Credits
Based on [Screenshot Studio](https://github.com/shiv12345678901/yt-project)
by Educated Nepal. Original stack: Flask + Playwright + Llama 3.1 70B.