Spaces:

shiva0013
/

YT-AI-Automation

Running

App Files Files Community

YT-AI-Automation / README.md

github-actions

Sync Docker Space

5f3e9f5 3 days ago

preview code

raw

history blame contribute delete

6.36 kB

metadata

title: YT AI Automation
emoji: 🎥
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false

TextBro — Text → Video Studio

Turn text, raw HTML, images, or PDFs into video-ready screenshots using AI.

Backend: Flask + Playwright (Python) — originally Screenshot Studio.
Frontend: React + Vite + TypeScript + Tailwind CSS.
Features: live SSE progress, cancel, screenshot gallery, ZIP download, history, cache inspection. On Windows, the backend can also stitch screenshots into a PowerPoint-driven video.

Devin_project/
├── backend/          # Flask app, routes, Playwright screenshot engine
│   ├── app.py
│   ├── start.py
│   ├── requirements.txt
│   ├── config/
│   ├── routes/
│   └── src/
└── frontend/         # React SPA
    ├── src/
    ├── package.json
    └── vite.config.ts

Requirements

Python 3.10+ (3.11 recommended)
Node.js 20.19+ or 22.13+
Playwright's Chromium (installed via playwright install chromium)
An API key for an OpenAI-compatible LLM endpoint (Groq, Together, OpenAI, a local llama.cpp server, etc.) — the backend uses chat completions.
Optional (Windows only) Microsoft PowerPoint, for the screenshot → video pipeline.

First-time setup

# 1) Clone
git clone https://github.com/shiv12345678901/Devin_project.git
cd Devin_project

Backend

cd backend

# (Optional but recommended) create a virtualenv
python -m venv .venv
# Windows:     .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate

pip install -r requirements.txt
playwright install chromium

# Fill in your API credentials
cp config/config.example.py config/config.py
# Edit config/config.py:
#   API_KEY   = "sk-..."                 # your LLM API key
#   API_URL   = "https://api.groq.com/openai/v1"   # or wherever
#   MODEL     = "llama-3.1-70b-versatile"

Frontend

cd ../frontend
npm install

Running it

You have two options.

Option A — dev mode (two terminals, hot reload everywhere)

# Terminal 1
cd backend && python start.py         # http://localhost:5000

# Terminal 2
cd frontend && npm run dev            # http://localhost:5173

Open http://localhost:5173 — the Vite dev server proxies every API path to the Flask backend so CORS isn't an issue. Changes to React are hot-reloaded.

Option B — single server (Flask serves the built React app)

cd frontend && npm run build          # produces frontend/dist/
cd ../backend && python start.py      # http://localhost:5000

Now Flask serves the UI and the API from one port, so this is also the setup you'd use when pointing a tunnel (ngrok, Cloudflare Tunnel) at it.

What's wired to what

Frontend page	Backend endpoint	Notes
Text → Video	`POST /generate-sse`	SSE progress, cancel via `/cancel/<op>`
HTML → Video	`POST /generate-html`, `/beautify`, `/minify`	Synchronous
Image/PDF → Video	`POST /image-to-screenshots-sse`	SSE progress, OCR + AI + screenshots
Resources	`GET /list`, `/history`, `/cache/stats`, `DELETE /delete/<type>/<name>`, `POST /cache/clear`	—
Gallery	`GET /screenshots/<path>`	Served by Flask
ZIP download	`POST /download-zip`	Streams a ZIP of selected files

The full API client is in frontend/src/api/client.ts and the SSE state machine in frontend/src/hooks/useGenerate.ts.

Configuration reference

Key values in backend/config/config.py (see backend/config/config.example.py for the full list):

Setting	What it controls
`API_KEY`, `API_URL`, `MODEL`	Which LLM the backend talks to (chat completions)
`PORT`, `HOST`	Flask listen address
`DEFAULT_VIEWPORT_WIDTH/HEIGHT`	Screenshot viewport
`DEFAULT_ZOOM`, `DEFAULT_OVERLAP`	Capture scaling and slide overlap
`MAX_SCREENSHOTS_LIMIT`	Hard cap on screenshots per run
`POWERPOINT_*`	Windows-only PowerPoint/video export
`VIDEO_*`	Resolution / FPS / quality for PPT → video

Scripts

Frontend (inside frontend/)

Command	Description
`npm run dev`	Start Vite dev server with API proxy
`npm run build`	TypeScript + production build to `dist/`
`npm run preview`	Preview the production build locally
`npm run lint`	Run ESLint

Backend (inside backend/)

Command	Description
`python start.py`	Launch the Flask app with env checks
`python app.py`	Launch the Flask app directly (skip env checks)

Troubleshooting

Configuration file not found when starting the backend — you didn't copy config/config.example.py to config/config.py.
Generation returns 500 / Failed to get AI response — the API key or base URL in config.py is wrong, or the model isn't available from that endpoint.
Screenshots are blank — run playwright install chromium again.
/assets/... 404 on Option B — rebuild the frontend after code changes (cd frontend && npm run build).
Video export fails on macOS/Linux — the PowerPoint exporter is Windows-only. Screenshots still work on all platforms.

Credits

Based on Screenshot Studio by Educated Nepal. Original stack: Flask + Playwright + Llama 3.1 70B.