Space: bytedance-research/Lance

ffy2000 committed 4 days ago

Commit 46a9d20 (1 parent: fddaf5e)

Prepare Lance for Hugging Face Space

Files changed (5)

Dockerfile +2 -3
README.md +25 -4
SPACE_DEPLOYMENT.md +2 -1
app.py +10 -7
requirements.txt +16 -3

Dockerfile CHANGED

@@ -25,9 +25,8 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY requirements.txt /app/requirements.txt
 RUN python -m pip install --upgrade pip setuptools wheel \
-    && grep -v '^flash-attn==' requirements.txt > /tmp/requirements-no-flash-attn.txt \
-    && python -m pip install -r /tmp/requirements-no-flash-attn.txt \
-    && python -m pip install flash-attn==2.6.3 --no-build-isolation
+    && python -m pip install -r requirements.txt \
+    && python -m pip install --no-cache-dir --no-deps --force-reinstall "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 COPY . /app

README.md CHANGED

@@ -4,7 +4,7 @@ emoji: 🎬
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
-python_version: "3.10.13"
+python_version: "3.11"
 sdk_version: "5.31.0"
 app_file: app.py
 models:
@@ -221,14 +221,35 @@ models:
 ### Recommended Environment
-- **Software:** Python 3.10+, CUDA 12.4+ (required)
+- **Software:** Python 3.11, CUDA 12.4+
 - **Hardware:** A GPU with at least 40GB VRAM is required for inference
-### Installation Steps
+### Local Environment Setup
+```bash
+conda create -n Lance python=3.11 -y
+conda activate Lance
+pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
+pip install -r requirements.txt
+pip install --no-cache-dir --no-deps --force-reinstall \
+  "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
+```
+### Hugging Face Gradio Space Setup
+For a Gradio Space, keep the repository as `sdk: gradio` and `app_file: app.py`, set the Space hardware to `ZeroGPU`, and let the Space build install `requirements.txt`. The Space build must use Python 3.11 so the pinned flash-attn wheel is compatible.
+The complete Space-side dependency chain is:
 ```bash
-bash ./setup_env.sh
+pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
+pip install -r requirements.txt
+pip install --no-cache-dir --no-deps --force-reinstall \
+  "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 ```
+At runtime, `app.py` still keeps a startup fallback for flash-attn and model prefetch, but the Space should already have the right packages installed before the UI appears.
 ### Download Model Weights
 Please download all necessary model checkpoints from [Lance-3B on Hugging Face](https://huggingface.co/bytedance-research/Lance) and place them in the `downloads/` directory.
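
Reviewer note: the README's requirement that the Space build use Python 3.11 follows from the compatibility tags in the pinned wheel's filename (`cp311-cp311-linux_x86_64`). A minimal sketch of that check, using only the standard library, might look like the following. The helper names `parse_wheel_tags` and `wheel_matches_interpreter` are hypothetical and not part of the repository, and the parser assumes a wheel filename without the optional build tag.

```python
import sys
from urllib.parse import unquote, urlparse

# Wheel URL copied from the diff above.
WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
)


def parse_wheel_tags(url: str) -> dict:
    """Split the wheel filename in a URL into name, version, and tags.

    Hypothetical helper: assumes the five-part form
    name-version-python-abi-platform (no optional build tag).
    """
    filename = unquote(urlparse(url).path.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def wheel_matches_interpreter(tags: dict, version_info=sys.version_info) -> bool:
    """Return True when the wheel's CPython tag matches the given interpreter version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return tags["python"] == expected


if __name__ == "__main__":
    tags = parse_wheel_tags(WHEEL_URL)
    print(tags["python"], wheel_matches_interpreter(tags))
```

This only compares the CPython tag; it does not verify the CUDA or torch build encoded in the local version segment.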

SPACE_DEPLOYMENT.md CHANGED

@@ -6,6 +6,7 @@ This repository is prepared for a Gradio-based Hugging Face Space with ZeroGPU.
 - Space SDK: Gradio
 - Space hardware: ZeroGPU
+- Space Python: 3.11
 - Public port: `7860`
 - Entrypoint: `python app.py`
 - Recommended use: ZeroGPU for request-scoped GPU allocation
@@ -37,7 +38,7 @@ Useful environment variables:
 - `LANCE_QUEUE_SIZE`: Gradio queue size
 - `LANCE_GRADIO_TMP_ROOT`: output and temporary file directory
 - `LANCE_ZEROGPU_MAX_DURATION_SECONDS`: fixed `@spaces.GPU` duration request in seconds for all tasks (default: 300)
-- `LANCE_INSTALL_FLASH_ATTN_ON_STARTUP`: set to `1` to install flash-attn during Space startup instead of inside the GPU reservation
+- `LANCE_INSTALL_FLASH_ATTN_ON_STARTUP`: set to `1` to install the pinned flash-attn wheel during Space startup instead of inside the GPU reservation (the wheel matches Python 3.11)
 - `LANCE_PREFETCH_MODEL_ASSETS`: set to `0` to skip CPU-side model prefetch at startup
 - `LANCE_PREFETCH_MODEL_VARIANTS`: comma-separated model variants to prefetch, for example `video,image`
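
Reviewer note: several of the variables in this hunk are described as `1`/`0` switches, and one as a comma-separated list. A minimal sketch of how such values might be read is shown here. `env_flag` and `env_list` are hypothetical helpers, and both the accepted truthy strings and the defaults are assumptions, since this diff does not show how `app.py` actually parses them.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean switch from the environment (hypothetical helper).

    Unset or empty values fall back to `default`; the truthy set is an assumption.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: tuple = ()) -> list:
    """Read a comma-separated list such as `video,image` (hypothetical helper)."""
    raw = os.environ.get(name, "")
    items = [item.strip() for item in raw.split(",") if item.strip()]
    return items if items else list(default)


if __name__ == "__main__":
    # Defaults here are assumptions chosen to match the wording of the docs.
    install_on_startup = env_flag("LANCE_INSTALL_FLASH_ATTN_ON_STARTUP", default=False)
    prefetch_assets = env_flag("LANCE_PREFETCH_MODEL_ASSETS", default=True)
    variants = env_list("LANCE_PREFETCH_MODEL_VARIANTS", default=("video",))
    print(install_on_startup, prefetch_assets, variants)
```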

app.py CHANGED

@@ -75,7 +75,8 @@ RUN_RECORD_FILENAME = "generation_record.json"
 LOCAL_MODEL_BASE_DIR = Path("downloads")
 SPACE_MODEL_BASE_DIR = Path("/data/lance_models")
 DEFAULT_MODEL_REPO_ID = "bytedance-research/Lance"
-DEFAULT_FLASH_ATTN_VERSION = "2.6.3"
+DEFAULT_FLASH_ATTN_VERSION = "2.8.3"
+DEFAULT_FLASH_ATTN_WHEEL_URL = "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 DEFAULT_MODEL_VARIANT = "video"
 MODEL_VARIANT_VIDEO = "video"
 MODEL_VARIANT_IMAGE = "image"
@@ -3177,16 +3178,17 @@ def get_env_float(name: str, default: float) -> float:
 def ensure_flash_attn_installed() -> None:
     try:
         from importlib.metadata import PackageNotFoundError, version as package_version
-        current_version = package_version("flash-attn")
+        current_version = package_version("flash_attn")
         if current_version == DEFAULT_FLASH_ATTN_VERSION:
+            print(f"[startup] flash-attn {current_version} already installed.", flush=True)
             return
         print(
-            f"[startup] flash-attn {current_version} detected; reinstalling {DEFAULT_FLASH_ATTN_VERSION} without build isolation.",
+            f"[startup] flash-attn {current_version} detected; reinstalling {DEFAULT_FLASH_ATTN_VERSION} from wheel.",
             flush=True,
         )
     except Exception:
         print(
-            f"[startup] flash-attn not available; installing {DEFAULT_FLASH_ATTN_VERSION} without build isolation.",
+            f"[startup] flash-attn not available; installing {DEFAULT_FLASH_ATTN_VERSION} from wheel.",
             flush=True,
         )
@@ -3196,11 +3198,12 @@ def ensure_flash_attn_installed() -> None:
         "pip",
         "install",
         "--no-cache-dir",
-        "--no-build-isolation",
-        f"flash-attn=={DEFAULT_FLASH_ATTN_VERSION}",
+        "--no-deps",
+        "--force-reinstall",
+        DEFAULT_FLASH_ATTN_WHEEL_URL,
     ]
     subprocess.check_call(command)
-    print(f"[startup] flash-attn {DEFAULT_FLASH_ATTN_VERSION} installed successfully.", flush=True)
+    print(f"[startup] flash-attn {DEFAULT_FLASH_ATTN_VERSION} installed from wheel.", flush=True)
 def get_zerogpu_duration_cap() -> int:
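
Reviewer note: the hunks above show only fragments of `ensure_flash_attn_installed`, so the following is a minimal self-contained sketch of the version check and pip command it appears to use after this commit. It is not the repository's code. The `sys.executable, "-m"` prefix is an assumption (the diff elides the start of the command list), and the helper names `installed_flash_attn_version`, `needs_install`, and `build_install_command` are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version as package_version

# Constants copied from the diff above.
DEFAULT_FLASH_ATTN_VERSION = "2.8.3"
DEFAULT_FLASH_ATTN_WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
)


def installed_flash_attn_version():
    """Return the installed flash-attn version string, or None if it is absent."""
    try:
        return package_version("flash_attn")
    except PackageNotFoundError:
        return None


def needs_install(installed) -> bool:
    """True when flash-attn is missing or differs from the pinned version."""
    return installed != DEFAULT_FLASH_ATTN_VERSION


def build_install_command() -> list:
    """Build the pip invocation for the pinned wheel.

    The interpreter prefix is an assumption; the flags and URL come from the diff.
    """
    return [
        sys.executable,
        "-m",
        "pip",
        "install",
        "--no-cache-dir",
        "--no-deps",
        "--force-reinstall",
        DEFAULT_FLASH_ATTN_WHEEL_URL,
    ]


if __name__ == "__main__":
    current = installed_flash_attn_version()
    if needs_install(current):
        # Printed rather than executed so the sketch has no side effects.
        print("would run:", " ".join(build_install_command()))
    else:
        print(f"flash-attn {current} already installed")
```

The function in the diff passes its command list to `subprocess.check_call`; the sketch stops short of running pip. Installing with `--no-deps` from a prebuilt wheel avoids both a source build and any dependency resolution that could replace the pinned torch build.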

requirements.txt CHANGED

@@ -1,7 +1,11 @@
+--extra-index-url https://download.pytorch.org/whl/cu124
+torch==2.5.1+cu124
+torchvision==0.20.1+cu124
+torchaudio==2.5.1+cu124
 absl-py==0.15.0
-accelerate==1.13.0
+accelerate>=0.21.0
 addict==2.4.0
-# albumentations==1.4.3
 annotated-types==0.7.0
 bitsandbytes==0.49.2
 certifi==2024.8.30
@@ -16,6 +20,7 @@ filelock==3.16.1
 fsspec==2023.6.0
 ftfy==6.1.1
 h5py==3.12.1
+huggingface-hub==0.29.1
 imageio==2.34.0
 imageio-ffmpeg==0.5.1
 Jinja2==3.1.3
@@ -27,7 +32,7 @@ numpy==1.23.5
 omegaconf==2.3.0
 opencv-python==4.7.0.72
 opt_einsum==3.4.0
-packaging==26.1
+packaging>=20.8,<26.0
 peft==0.5.0
 pillow==11.0.0
 protobuf==3.20.3
@@ -37,6 +42,7 @@ pydantic==2.11.10
 pydantic_core==2.33.2
 PyYAML==6.0
 qwen-vl-utils==0.0.14
+regex==2022.10.31
 requests==2.32.3
 safetensors==0.4.5
 scikit-image==0.24.0
@@ -54,11 +60,18 @@ torchlibrosa==0.1.0
 torchmetrics==1.3.2
 tqdm==4.67.3
 transformers-stream-generator==0.0.5
+triton==3.1.0
 typing_extensions==4.15.0
 urllib3==1.26.20
 webdataset==0.2.48
 yacs==0.1.8
 zipp==3.23.1
+httpx>=0.25.0
 gpustat
+transformers==4.49.0
+diffusers==0.29.1
+gradio==5.35
 sk-video
 spaces
+flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl