victor HF Staff committed on
Commit
fe1f070
·
1 Parent(s): 4068126

feat: Add GLM-4.7-Flash model, fix Gradio version, add CLAUDE.md

- Add zai-org/GLM-4.7-Flash as default model choice
- Keep moonshotai/Kimi-K2-Instruct as second option
- Fix requirements.txt to use Gradio 6.2.0 (was 5.34.1)
- Add CLAUDE.md with development setup and architecture docs
- Update README.md to list both models

Files changed (4)
  1. CLAUDE.md +109 -0
  2. README.md +3 -2
  3. app.py +9 -4
  4. requirements.txt +1 -1
CLAUDE.md ADDED
@@ -0,0 +1,109 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ AI Video Composer is a Gradio app that generates FFmpeg commands from natural language. Users upload media files (images, videos, audio), describe what they want, and the AI generates the FFmpeg command that creates the output video.
+
+ **Live at:** https://huggingface.co/spaces/huggingface-projects/video-composer-gpt4
+
+ ## Development Commands
+
+ ```bash
+ # Setup with uv (recommended)
+ uv venv .venv --python 3.12
+ uv pip install -r requirements.txt
+ uv pip install -e ./mediagallery
+
+ # Run locally (requires the HF_TOKEN env var)
+ .venv/bin/python app.py
+
+ # MediaGallery frontend development
+ cd mediagallery/frontend
+ npm install
+ npm run build
+
+ # Build MediaGallery as a package
+ cd mediagallery
+ python -m build
+ ```
+
+ ## Architecture
+
+ ### Core Flow
+ ```
+ User uploads files → MediaGallery component
+
+ get_files_infos() extracts metadata (dimensions, duration, audio channels)
+
+ get_completion() sends prompt + metadata to the selected model via an OpenAI-compatible API
+
+ FFmpeg command extracted from the response
+
+ Command validated with a dry run (output sent to the null muxer: -f null -)
+
+ If invalid: retry with error feedback (max 2 attempts)
+
+ execute_ffmpeg_command() runs with @spaces.GPU acceleration
+
+ Output video returned
+ ```
+
+ ### Key Files
+
+ | File | Purpose |
+ |------|---------|
+ | `app.py` | Main Gradio app, LLM integration, FFmpeg execution |
+ | `mediagallery/` | Custom Gradio component for mixed media (images + videos + audio) |
+ | `mediagallery/backend/gradio_mediagallery/mediagallery.py` | Component backend, data models, preprocess/postprocess |
+ | `mediagallery/frontend/Index.svelte` | Component entry point, upload handling |
+ | `mediagallery/frontend/shared/Gallery.svelte` | Grid layout, preview modal |
+
+ ### MediaGallery Component
+
+ A custom Gradio component that extends Gallery to support audio files. Structure:
+ - **Backend**: Python component class with `GalleryImage`, `GalleryVideo`, and `GalleryAudio` data models
+ - **Frontend**: Svelte 4 + TypeScript, uses `@gradio/*` packages
+
+ The component is installed at runtime on HuggingFace Spaces via:
+ ```python
+ subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "./mediagallery"])
+ ```
+
+ ### Model Configuration
+
+ Models are configured in the `MODELS` dict in `app.py`. Users choose between models via a Radio selector in the UI. The default is GLM-4.7-Flash (first in the dict):
+ ```python
+ MODELS = {
+     "zai-org/GLM-4.7-Flash": {
+         "base_url": "https://router.huggingface.co/v1",
+         "env_key": "HF_TOKEN",
+         "model_name": "zai-org/GLM-4.7-Flash:novita",
+     },
+     "moonshotai/Kimi-K2-Instruct": {
+         "base_url": "https://router.huggingface.co/v1",
+         "env_key": "HF_TOKEN",
+         "model_name": "moonshotai/Kimi-K2-Instruct-0905:groq",
+     },
+ }
+ ```
+
+ ## Constraints
+
+ - **File size limit**: 100MB per file
+ - **Video duration limit**: 2 minutes
+ - **Output format**: Always MP4
+ - **Gradio version**: 6.2.0
+ - **Cached examples**: Disabled (incompatible with the Markdown output component)
+
+ ## FFmpeg Command Validation
+
+ Commands are validated before execution with a dry run that sends output to the null muxer:
+ ```bash
+ ffmpeg [generated args] -f null -
+ ```
+
+ If validation fails, the error is fed back to the LLM for a retry (max 2 attempts). Common issues handled:
+ - Image dimension mismatches → use `scale+pad`
+ - Portrait/landscape detection for standard resolutions
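
The validate-then-retry loop described in the new CLAUDE.md can be sketched as follows. This is a minimal sketch, not the code from `app.py`: `validate_ffmpeg_command` and `generate_with_retry` are hypothetical names, and it assumes `args` contains the generated input/filter arguments without the output filename (in the real app the LLM call goes through `get_completion()`).

```python
import subprocess

MAX_ATTEMPTS = 2  # per CLAUDE.md: retry with error feedback, max 2 attempts

def validate_ffmpeg_command(args, run=subprocess.run):
    """Dry-run an ffmpeg argument list by sending output to the null muxer."""
    # "-f null -" makes ffmpeg decode and filter everything but write nothing,
    # so syntax and filter-graph errors surface without producing a file.
    proc = run(["ffmpeg", *args, "-f", "null", "-"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def generate_with_retry(ask_llm, prompt, validate=validate_ffmpeg_command):
    """Ask the LLM for a command; on a failed dry run, retry with the error."""
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        args = ask_llm(prompt + feedback)
        ok, error = validate(args)
        if ok:
            return args
        # Feed the ffmpeg stderr back so the model can correct its command.
        feedback = f"\nThe previous command failed with:\n{error}\nPlease fix it."
    raise RuntimeError("no valid FFmpeg command after retries")
```

Injecting `run`/`validate` as parameters keeps the sketch testable without an ffmpeg binary present.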
README.md CHANGED
@@ -10,6 +10,7 @@ app_file: app.py
 pinned: false
 disable_embedding: true
 models:
+- zai-org/GLM-4.7-Flash
 - moonshotai/Kimi-K2-Instruct
 tags:
 - ffmpeg
@@ -58,7 +59,7 @@ Describe what you want in plain English, like "create a slideshow from these ima
 
 4. **Processing**:
 - The app analyzes your files and instructions
-- Generates an optimized FFmpeg command using Kimi-K2
+- Generates an optimized FFmpeg command using your chosen AI model
 - Executes the command and returns the processed video
 - Displays the generated FFmpeg command for transparency
 
@@ -76,7 +77,7 @@ Describe what you want in plain English, like "create a slideshow from these ima
 
 - Built with Gradio for the user interface
 - Uses FFmpeg for media processing
-- Powered by Kimi-K2 for command generation
+- Powered by GLM-4.7 or Kimi-K2 for command generation
 - Implements robust error handling and command validation
 - Processes files in a temporary directory for safety
 - Supports both simple operations and complex media transformations
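
The "analyzes your files" step above is ffprobe-based metadata extraction. As a hedged sketch (the real implementation is `get_files_infos()` in `app.py`; `probe_media` and `parse_probe` are illustrative names):

```python
import json
import subprocess

def parse_probe(probe_json):
    """Reduce raw ffprobe JSON to the fields the LLM prompt needs."""
    data = json.loads(probe_json)
    info = {"duration": float(data["format"].get("duration", 0))}
    for stream in data.get("streams", []):
        if stream["codec_type"] == "video":
            info["width"], info["height"] = stream["width"], stream["height"]
        elif stream["codec_type"] == "audio":
            info["channels"] = stream.get("channels", 0)
    return info

def probe_media(path):
    """Run ffprobe and parse its JSON output (requires ffmpeg installed)."""
    raw = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True,
    ).stdout
    return parse_probe(raw)
```

Keeping the parsing separate from the subprocess call makes the reduction step easy to test with synthetic ffprobe output.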
app.py CHANGED
@@ -22,6 +22,11 @@ import shutil
 
 # Supported models configuration
 MODELS = {
+    "zai-org/GLM-4.7-Flash": {
+        "base_url": "https://router.huggingface.co/v1",
+        "env_key": "HF_TOKEN",
+        "model_name": "zai-org/GLM-4.7-Flash:novita",
+    },
     "moonshotai/Kimi-K2-Instruct": {
         "base_url": "https://router.huggingface.co/v1",
         "env_key": "HF_TOKEN",
@@ -392,7 +397,7 @@ def compose_video(
     files: list = None,
     top_p: float = 0.7,
     temperature: float = 0.1,
-    model_choice: str = "moonshotai/Kimi-K2-Instruct",
+    model_choice: str = "zai-org/GLM-4.7-Flash",
 ) -> str:
     """
     Compose videos from existing media assets using natural language instructions.
@@ -406,7 +411,7 @@ def compose_video(
         files (list, optional): List of media files (images, videos, audio) to use
         top_p (float): Top-p sampling parameter for AI model (0.0-1.0, default: 0.7)
         temperature (float): Temperature parameter for AI model creativity (0.0-5.0, default: 0.1)
-        model_choice (str): AI model to use for command generation (default: "deepseek-ai/DeepSeek-V3")
+        model_choice (str): AI model to use for command generation (default: "zai-org/GLM-4.7-Flash")
 
     Returns:
         str: Path to the generated video file
@@ -422,7 +427,7 @@ def update(
     prompt,
     top_p=1,
     temperature=1,
-    model_choice="moonshotai/Kimi-K2-Instruct",
+    model_choice="zai-org/GLM-4.7-Flash",
 ):
     if prompt == "":
         raise gr.Error("Please enter a prompt.")
@@ -580,7 +585,7 @@ with gr.Blocks() as demo:
     gr.Markdown(
         """
         # 🏞 AI Video Composer: FFMPEG in Plain English
-        Upload your media files, describe what you want, and [Kimi-K2](https://huggingface.co/moonshotai/Kimi-K2-Instruct) generates the FFMPEG command. Create slideshows from images, add background music, merge video clips, visualize audio waveforms, convert formats, adjust speed, and more.
+        Upload your media files, describe what you want, and AI generates the FFMPEG command. Create slideshows from images, add background music, merge video clips, visualize audio waveforms, convert formats, adjust speed, and more.
         """,
         elem_id="header",
     )
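
Each `MODELS` entry added in this commit maps a UI choice to an OpenAI-compatible endpoint, a token environment variable, and a routed model name. A minimal sketch of how such an entry might be consumed (`resolve_model` is a hypothetical helper, not a function from `app.py`):

```python
import os

# Mirrors the MODELS dict added to app.py in this commit.
MODELS = {
    "zai-org/GLM-4.7-Flash": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "zai-org/GLM-4.7-Flash:novita",
    },
    "moonshotai/Kimi-K2-Instruct": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "moonshotai/Kimi-K2-Instruct-0905:groq",
    },
}

def resolve_model(model_choice):
    """Turn a UI model choice into the kwargs an OpenAI-compatible client needs."""
    cfg = MODELS[model_choice]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["env_key"], ""),  # HF_TOKEN for both entries
        "model": cfg["model_name"],
    }
```

The returned dict is what would be split between the client constructor (`base_url`, `api_key`) and the completion call (`model`).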
requirements.txt CHANGED
@@ -1,3 +1,3 @@
 openai>=1.55.0
-gradio==5.34.1
+gradio==6.2.0
 moviepy==1