feat: Add GLM-4.7-Flash model, fix Gradio version, add CLAUDE.md

- Add zai-org/GLM-4.7-Flash as default model choice
- Keep moonshotai/Kimi-K2-Instruct as second option
- Fix requirements.txt to use Gradio 6.2.0 (was 5.34.1)
- Add CLAUDE.md with development setup and architecture docs
- Update README.md to list both models
CLAUDE.md
ADDED

@@ -0,0 +1,109 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AI Video Composer is a Gradio app that generates FFmpeg commands from natural language. Users upload media files (images, videos, audio), describe what they want, and the AI generates the FFmpeg command that creates the output video.

**Live at:** https://huggingface.co/spaces/huggingface-projects/video-composer-gpt4

## Development Commands

```bash
# Setup with uv (recommended)
uv venv .venv --python 3.12
uv pip install -r requirements.txt
uv pip install -e ./mediagallery

# Run locally (requires HF_TOKEN env var)
.venv/bin/python app.py

# MediaGallery frontend development
cd mediagallery/frontend
npm install
npm run build

# Build MediaGallery as a package
cd mediagallery
python -m build
```

## Architecture

### Core Flow

```
User uploads files → MediaGallery component
    ↓
get_files_infos() extracts metadata (dimensions, duration, audio channels)
    ↓
get_completion() sends prompt + metadata to selected model via OpenAI-compatible API
    ↓
FFmpeg command extracted from response
    ↓
Command validated with dry-run (ffmpeg -f null -)
    ↓
If invalid: retry with error feedback (max 2 attempts)
    ↓
execute_ffmpeg_command() runs with @spaces.GPU acceleration
    ↓
Output video returned
```
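The "FFmpeg command extracted from response" step can be sketched as a small parser. This is an illustrative sketch only: `extract_ffmpeg_command` is a hypothetical name, and the actual parsing in `app.py` may differ.

```python
import re

FENCE = "`" * 3  # three backticks, built up to keep this example readable


def extract_ffmpeg_command(response_text: str) -> str:
    # Prefer a fenced code block if the model emitted one.
    match = re.search(
        FENCE + r"(?:bash|sh)?\s*(.*?)" + FENCE, response_text, re.DOTALL
    )
    candidate = match.group(1).strip() if match else response_text.strip()
    # Keep only the line that actually invokes ffmpeg.
    for line in candidate.splitlines():
        if line.strip().startswith("ffmpeg"):
            return line.strip()
    raise ValueError("no ffmpeg command found in response")
```

A parser like this tolerates both bare commands and commands wrapped in prose plus a code fence, which is the common shape of LLM replies.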
### Key Files

| File | Purpose |
|------|---------|
| `app.py` | Main Gradio app, LLM integration, FFmpeg execution |
| `mediagallery/` | Custom Gradio component for mixed media (images + videos + audio) |
| `mediagallery/backend/gradio_mediagallery/mediagallery.py` | Component backend, data models, preprocess/postprocess |
| `mediagallery/frontend/Index.svelte` | Component entry point, upload handling |
| `mediagallery/frontend/shared/Gallery.svelte` | Grid layout, preview modal |

### MediaGallery Component

A custom Gradio component extending Gallery to support audio files. Structure:

- **Backend**: Python component class with `GalleryImage`, `GalleryVideo`, and `GalleryAudio` data models
- **Frontend**: Svelte 4 + TypeScript, uses `@gradio/*` packages

The component is installed at runtime on Hugging Face Spaces via:

```python
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "./mediagallery"])
```

### Model Configuration

Models are configured in the `MODELS` dict in `app.py`. Users choose between them via a Radio selector in the UI; the default is GLM-4.7-Flash (the first entry in the dict):

```python
MODELS = {
    "zai-org/GLM-4.7-Flash": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "zai-org/GLM-4.7-Flash:novita",
    },
    "moonshotai/Kimi-K2-Instruct": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "moonshotai/Kimi-K2-Instruct-0905:groq",
    },
}
```
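Each entry carries everything an OpenAI-compatible request needs. A minimal sketch of how an entry might be resolved; the `resolve_model` helper is hypothetical (the app itself constructs a client from the same three fields), and the `MODELS` dict is duplicated here only to keep the example self-contained:

```python
import os

MODELS = {  # mirrors the configuration shown above
    "zai-org/GLM-4.7-Flash": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "zai-org/GLM-4.7-Flash:novita",
    },
    "moonshotai/Kimi-K2-Instruct": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "moonshotai/Kimi-K2-Instruct-0905:groq",
    },
}


def resolve_model(model_choice: str) -> dict:
    # Map a UI Radio selection to the settings an OpenAI-compatible
    # client needs: endpoint, API key, and provider-routed model name.
    cfg = MODELS[model_choice]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["env_key"], ""),
        "model": cfg["model_name"],
    }
```

Note that the displayed choice (`zai-org/GLM-4.7-Flash`) and the routed model name (`zai-org/GLM-4.7-Flash:novita`) differ: the suffix selects the inference provider on the Hugging Face router.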

## Constraints

- **File size limit**: 100MB per file
- **Video duration limit**: 2 minutes
- **Output format**: always MP4
- **Gradio version**: 6.2.0
- **Cached examples**: disabled (doesn't work with the Markdown output component)
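The size and duration limits can be enforced with a small guard. This is a sketch under stated assumptions: `check_upload` is a hypothetical helper, and the duration value would come from the metadata-extraction step rather than being passed in directly.

```python
import os

MAX_FILE_BYTES = 100 * 1024 * 1024  # 100MB per file
MAX_DURATION_S = 120                # 2-minute video limit


def check_upload(path: str, duration_s: float) -> None:
    # Raise if an uploaded file violates either constraint above.
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError(f"{path}: exceeds the 100MB file size limit")
    if duration_s > MAX_DURATION_S:
        raise ValueError(f"{path}: exceeds the 2-minute duration limit")
```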
## FFmpeg Command Validation

Commands are validated before execution using a dry-run:

```bash
ffmpeg -f null - [command]
```

If validation fails, the error is fed back to the LLM for a retry (max 2 attempts). Common issues handled:

- Image dimension mismatches → use `scale+pad`
- Portrait/landscape detection for standard resolutions
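The validate-then-retry behavior can be sketched as a small loop; `ask_llm` and `validate` are stand-ins for the app's actual completion and dry-run functions, not its real API:

```python
def generate_with_retry(ask_llm, validate, max_attempts: int = 2) -> str:
    # ask_llm(feedback) -> candidate command string from the model
    # validate(cmd) -> (ok, error_message), e.g. from an ffmpeg dry-run
    feedback = ""
    for _ in range(max_attempts):
        cmd = ask_llm(feedback)
        ok, err = validate(cmd)
        if ok:
            return cmd
        # Feed the validation error back into the next prompt.
        feedback = f"The previous command failed validation: {err}"
    raise RuntimeError("no valid FFmpeg command after retries")
```

Injecting the two callables keeps the loop testable without a live model or an ffmpeg binary.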

README.md
CHANGED

```diff
@@ -10,6 +10,7 @@ app_file: app.py
 pinned: false
 disable_embedding: true
 models:
+- zai-org/GLM-4.7-Flash
 - moonshotai/Kimi-K2-Instruct
 tags:
 - ffmpeg
@@ -58,7 +59,7 @@ Describe what you want in plain English, like "create a slideshow from these ima
 4. **Processing**:
 - The app analyzes your files and instructions
-- Generates an optimized FFmpeg command using
+- Generates an optimized FFmpeg command using your chosen AI model
 - Executes the command and returns the processed video
 - Displays the generated FFmpeg command for transparency
@@ -76,7 +77,7 @@ Describe what you want in plain English, like "create a slideshow from these ima
 - Built with Gradio for the user interface
 - Uses FFmpeg for media processing
-- Powered by Kimi-K2 for command generation
+- Powered by GLM-4.7 or Kimi-K2 for command generation
 - Implements robust error handling and command validation
 - Processes files in a temporary directory for safety
 - Supports both simple operations and complex media transformations
```
app.py
CHANGED

```diff
@@ -22,6 +22,11 @@ import shutil
 
 # Supported models configuration
 MODELS = {
+    "zai-org/GLM-4.7-Flash": {
+        "base_url": "https://router.huggingface.co/v1",
+        "env_key": "HF_TOKEN",
+        "model_name": "zai-org/GLM-4.7-Flash:novita",
+    },
     "moonshotai/Kimi-K2-Instruct": {
         "base_url": "https://router.huggingface.co/v1",
         "env_key": "HF_TOKEN",
@@ -392,7 +397,7 @@ def compose_video(
     files: list = None,
     top_p: float = 0.7,
     temperature: float = 0.1,
-    model_choice: str = "
+    model_choice: str = "zai-org/GLM-4.7-Flash",
 ) -> str:
     """
     Compose videos from existing media assets using natural language instructions.
@@ -406,7 +411,7 @@
         files (list, optional): List of media files (images, videos, audio) to use
         top_p (float): Top-p sampling parameter for AI model (0.0-1.0, default: 0.7)
         temperature (float): Temperature parameter for AI model creativity (0.0-5.0, default: 0.1)
-        model_choice (str): AI model to use for command generation (default: "
+        model_choice (str): AI model to use for command generation (default: "zai-org/GLM-4.7-Flash")
 
     Returns:
         str: Path to the generated video file
@@ -422,7 +427,7 @@ def update(
     prompt,
     top_p=1,
     temperature=1,
-    model_choice="
+    model_choice="zai-org/GLM-4.7-Flash",
 ):
     if prompt == "":
         raise gr.Error("Please enter a prompt.")
@@ -580,7 +585,7 @@ with gr.Blocks() as demo:
     gr.Markdown(
         """
         # 🏞 AI Video Composer: FFMPEG in Plain English
-        Upload your media files, describe what you want, and
+        Upload your media files, describe what you want, and AI generates the FFMPEG command. Create slideshows from images, add background music, merge video clips, visualize audio waveforms, convert formats, adjust speed, and more.
         """,
         elem_id="header",
     )
```
requirements.txt
CHANGED

```diff
@@ -1,3 +1,3 @@
 openai>=1.55.0
-gradio==5.34.1
+gradio==6.2.0
 moviepy==1
```