YT-AI-Automation / README.md
github-actions
Sync Docker Space
5f3e9f5
metadata
title: YT AI Automation
emoji: πŸŽ₯
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false

TextBro β€” Text β†’ Video Studio

Turn text, raw HTML, images, or PDFs into video-ready screenshots using AI.

  • Backend: Flask + Playwright (Python) β€” originally Screenshot Studio.
  • Frontend: React + Vite + TypeScript + Tailwind CSS.
  • Features: live SSE progress, cancel, screenshot gallery, ZIP download, history, cache inspection. On Windows, the backend can also stitch screenshots into a PowerPoint-driven video.
Devin_project/
β”œβ”€β”€ backend/          # Flask app, routes, Playwright screenshot engine
β”‚   β”œβ”€β”€ app.py
β”‚   β”œβ”€β”€ start.py
β”‚   β”œβ”€β”€ requirements.txt
β”‚   β”œβ”€β”€ config/
β”‚   β”œβ”€β”€ routes/
β”‚   └── src/
└── frontend/         # React SPA
    β”œβ”€β”€ src/
    β”œβ”€β”€ package.json
    └── vite.config.ts

Requirements

  • Python 3.10+ (3.11 recommended)
  • Node.js 20.19+ or 22.13+
  • Playwright's Chromium (installed via playwright install chromium)
  • An API key for an OpenAI-compatible LLM endpoint (Groq, Together, OpenAI, a local llama.cpp server, etc.) β€” the backend uses chat completions.
  • Optional (Windows only) Microsoft PowerPoint, for the screenshot β†’ video pipeline.

First-time setup

# 1) Clone
git clone https://github.com/shiv12345678901/Devin_project.git
cd Devin_project

Backend

cd backend

# (Optional but recommended) create a virtualenv
python -m venv .venv
# Windows:     .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate

pip install -r requirements.txt
playwright install chromium

# Fill in your API credentials
cp config/config.example.py config/config.py
# Edit config/config.py:
#   API_KEY   = "sk-..."                 # your LLM API key
#   API_URL   = "https://api.groq.com/openai/v1"   # or wherever
#   MODEL     = "llama-3.1-70b-versatile"

Frontend

cd ../frontend
npm install

Running it

You have two options.

Option A β€” dev mode (two terminals, hot reload everywhere)

# Terminal 1
cd backend && python start.py         # http://localhost:5000

# Terminal 2
cd frontend && npm run dev            # http://localhost:5173

Open http://localhost:5173 β€” the Vite dev server proxies every API path to the Flask backend so CORS isn't an issue. Changes to React are hot-reloaded.

Option B β€” single server (Flask serves the built React app)

cd frontend && npm run build          # produces frontend/dist/
cd ../backend && python start.py      # http://localhost:5000

Now Flask serves the UI and the API from one port, so this is also the setup you'd use when pointing a tunnel (ngrok, Cloudflare Tunnel) at it.

What's wired to what

Frontend page Backend endpoint Notes
Text β†’ Video POST /generate-sse SSE progress, cancel via /cancel/<op>
HTML β†’ Video POST /generate-html, /beautify, /minify Synchronous
Image/PDF β†’ Video POST /image-to-screenshots-sse SSE progress, OCR + AI + screenshots
Resources GET /list, /history, /cache/stats, DELETE /delete/<type>/<name>, POST /cache/clear β€”
Gallery GET /screenshots/<path> Served by Flask
ZIP download POST /download-zip Streams a ZIP of selected files

The full API client is in frontend/src/api/client.ts and the SSE state machine in frontend/src/hooks/useGenerate.ts.

Configuration reference

Key values in backend/config/config.py (see backend/config/config.example.py for the full list):

Setting What it controls
API_KEY, API_URL, MODEL Which LLM the backend talks to (chat completions)
PORT, HOST Flask listen address
DEFAULT_VIEWPORT_WIDTH/HEIGHT Screenshot viewport
DEFAULT_ZOOM, DEFAULT_OVERLAP Capture scaling and slide overlap
MAX_SCREENSHOTS_LIMIT Hard cap on screenshots per run
POWERPOINT_* Windows-only PowerPoint/video export
VIDEO_* Resolution / FPS / quality for PPT β†’ video

Scripts

Frontend (inside frontend/)

Command Description
npm run dev Start Vite dev server with API proxy
npm run build TypeScript + production build to dist/
npm run preview Preview the production build locally
npm run lint Run ESLint

Backend (inside backend/)

Command Description
python start.py Launch the Flask app with env checks
python app.py Launch the Flask app directly (skip env checks)

Troubleshooting

  • Configuration file not found when starting the backend β€” you didn't copy config/config.example.py to config/config.py.
  • Generation returns 500 / Failed to get AI response β€” the API key or base URL in config.py is wrong, or the model isn't available from that endpoint.
  • Screenshots are blank β€” run playwright install chromium again.
  • /assets/... 404 on Option B β€” rebuild the frontend after code changes (cd frontend && npm run build).
  • Video export fails on macOS/Linux β€” the PowerPoint exporter is Windows-only. Screenshots still work on all platforms.

Credits

Based on Screenshot Studio by Educated Nepal. Original stack: Flask + Playwright + Llama 3.1 70B.