
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

AI Video Composer is a Gradio app that generates FFmpeg commands from natural language. Users upload media files (images, videos, audio), describe what they want, and AI generates the FFmpeg command to create the output video.

Live at: https://huggingface.co/spaces/huggingface-projects/video-composer-gpt4

## Development Commands

```bash
# Setup with uv (recommended)
uv venv .venv --python 3.12
uv pip install -r requirements.txt
uv pip install -e ./mediagallery

# Run locally (requires HF_TOKEN env var)
.venv/bin/python app.py

# MediaGallery frontend development
cd mediagallery/frontend
npm install
npm run build

# Build MediaGallery as a package
cd mediagallery
python -m build
```

## Architecture

### Core Flow

```
User uploads files → MediaGallery component
         ↓
get_files_infos() extracts metadata (dimensions, duration, audio channels)
         ↓
get_completion() sends prompt + metadata to the selected model via an OpenAI-compatible API
         ↓
FFmpeg command extracted from the response
         ↓
Command validated with a dry run (ffmpeg -f null -)
         ↓
If invalid: retry with error feedback (max 2 attempts)
         ↓
execute_ffmpeg_command() runs with @spaces.GPU acceleration
         ↓
Output video returned
```
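The validate-and-retry step in the flow above can be sketched as a small loop. `get_completion` is the real function in `app.py`; `validate_command` here is a hypothetical stand-in for the ffmpeg dry-run check, and the functions are passed in only to keep the sketch self-contained:

```python
# Minimal sketch of the generate → validate → retry loop described above.
# `validate_command` is assumed to return (ok, error_text).

MAX_ATTEMPTS = 2  # matches the "max 2 attempts" in the flow

def generate_valid_command(prompt, files_info, get_completion, validate_command):
    """Ask the model for an FFmpeg command, retrying once with error feedback."""
    error_feedback = None
    for _ in range(MAX_ATTEMPTS):
        command = get_completion(prompt, files_info, error_feedback)
        ok, error = validate_command(command)
        if ok:
            return command
        error_feedback = error  # feed the dry-run error back to the LLM
    raise RuntimeError(f"No valid command after {MAX_ATTEMPTS} attempts: {error_feedback}")
```

On the second attempt the prompt carries the dry-run error text, which is what lets the model correct issues like wrong filter syntax.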

## Key Files

| File | Purpose |
| --- | --- |
| `app.py` | Main Gradio app, LLM integration, FFmpeg execution |
| `mediagallery/` | Custom Gradio component for mixed media (images + videos + audio) |
| `mediagallery/backend/gradio_mediagallery/mediagallery.py` | Component backend, data models, preprocess/postprocess |
| `mediagallery/frontend/Index.svelte` | Component entry point, upload handling |
| `mediagallery/frontend/shared/Gallery.svelte` | Grid layout, preview modal |

## MediaGallery Component

Custom Gradio component extending Gallery to support audio files. Structure:

- **Backend:** Python component class with `GalleryImage`, `GalleryVideo`, and `GalleryAudio` data models
- **Frontend:** Svelte 4 + TypeScript, uses `@gradio/*` packages

The component is installed at runtime on HuggingFace Spaces via:

```python
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "./mediagallery"])
```

## Model Configuration

Models are configured in the `MODELS` dict in `app.py`. Users choose a model via a Radio selector in the UI; the default is GLM-4.7-Flash (the first entry in the dict):

```python
MODELS = {
    "zai-org/GLM-4.7-Flash": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "zai-org/GLM-4.7-Flash:novita",
    },
    "moonshotai/Kimi-K2-Instruct": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "moonshotai/Kimi-K2-Instruct-0905:groq",
    },
}
```

## Constraints

- File size limit: 100MB per file
- Video duration limit: 2 minutes
- Output format: always MP4
- Gradio version: 6.2.0
- Cached examples: disabled (they don't work with the Markdown output component)

## FFmpeg Command Validation

Commands are validated before execution using a dry run:

```
ffmpeg -f null - [command]
```

If validation fails, the error is fed back to the LLM for a retry (max 2 attempts). Common issues handled:

- Image dimension mismatches → use scale+pad
- Portrait/landscape detection for standard resolutions
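One way to build such a dry run is to redirect the generated command's output to ffmpeg's null muxer, so the filter graph and inputs are exercised without writing a file. This is a sketch under that assumption; how `app.py` actually rewrites the command may differ:

```python
# Sketch: rewrite a generated ffmpeg command into a dry run by replacing
# the trailing output path with the null muxer (`-f null -`).
import shlex

def to_dry_run(command: str) -> list[str]:
    """Return an argv list that validates the command without producing output."""
    args = shlex.split(command)
    if not args or args[0] != "ffmpeg":
        raise ValueError("not an ffmpeg command")
    # Drop the output file (assumed to be the last argument) and add a null sink.
    return args[:-1] + ["-f", "null", "-"]

# subprocess.run(to_dry_run(cmd), capture_output=True) would then surface
# ffmpeg's stderr text for the retry prompt.
```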