ffy2000 committed on
Commit 46a9d20 · 1 Parent(s): fddaf5e

Prepare Lance for Hugging Face Space

Files changed (5)
  1. Dockerfile +2 -3
  2. README.md +25 -4
  3. SPACE_DEPLOYMENT.md +2 -1
  4. app.py +10 -7
  5. requirements.txt +16 -3
Dockerfile CHANGED
@@ -25,9 +25,8 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY requirements.txt /app/requirements.txt
 
 RUN python -m pip install --upgrade pip setuptools wheel \
-    && grep -v '^flash-attn==' requirements.txt > /tmp/requirements-no-flash-attn.txt \
-    && python -m pip install -r /tmp/requirements-no-flash-attn.txt \
-    && python -m pip install flash-attn==2.6.3 --no-build-isolation
+    && python -m pip install -r requirements.txt \
+    && python -m pip install --no-cache-dir --no-deps --force-reinstall "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 
 COPY . /app
 
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: 🎬
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
-python_version: "3.10.13"
+python_version: "3.11"
 sdk_version: "5.31.0"
 app_file: app.py
 models:
@@ -221,14 +221,35 @@ models:
 
 ### Recommended Environment
 
-- **Software:** Python 3.10+, CUDA 12.4+ (required)
+- **Software:** Python 3.11, CUDA 12.4+
 - **Hardware:** A GPU with at least 40GB VRAM is required for inference
 
-### Installation Steps
+### Local Environment Setup
+
+```bash
+conda create -n Lance python=3.11 -y
+conda activate Lance
+pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
+pip install -r requirements.txt
+pip install --no-cache-dir --no-deps --force-reinstall \
+    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
+```
+
+### Hugging Face Gradio Space Setup
+
+For a Gradio Space, keep the repository as `sdk: gradio` and `app_file: app.py`, set the Space hardware to `ZeroGPU`, and let the Space build install `requirements.txt`. The Space build must use Python 3.11 so the pinned flash-attn wheel is compatible.
+
+The complete Space-side dependency chain is:
+
 ```bash
-bash ./setup_env.sh
+pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
+pip install -r requirements.txt
+pip install --no-cache-dir --no-deps --force-reinstall \
+    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 ```
 
+At runtime, `app.py` still keeps a startup fallback for flash-attn and model prefetch, but the Space should already have the right packages installed before the UI appears.
+
 ### Download Model Weights
 
 Please download all necessary model checkpoints from [Lance-3B on Hugging Face](https://huggingface.co/bytedance-research/Lance) and place them in the `downloads/` directory.
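
Review note on the README change: the new text says the Space build must use Python 3.11 so that the pinned flash-attn wheel is compatible. The Python requirement is encoded in the wheel filename itself (`cp311`), so it can be checked before attempting an install. The following is a minimal sketch of such a check, not code from this commit; `parse_wheel_tags` and `matches_running_cpython` are hypothetical helper names, and the parsing assumes the standard `name-version-python-abi-platform.whl` layout.

```python
import sys
from urllib.parse import unquote, urlsplit

# URL copied from the README change above.
WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
)


def parse_wheel_tags(url: str) -> dict:
    """Split the wheel filename in a URL into its name, version, and tag fields.

    Hypothetical helper: assumes the standard wheel filename layout, with an
    optional build tag between the version and the Python tag.
    """
    filename = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_running_cpython(python_tag: str, version_info=sys.version_info) -> bool:
    """Return True if a CPython tag such as 'cp311' matches the given interpreter version."""
    return python_tag == f"cp{version_info[0]}{version_info[1]}"


if __name__ == "__main__":
    tags = parse_wheel_tags(WHEEL_URL)
    print(tags)
    print("compatible with this interpreter:", matches_running_cpython(tags["python"]))
```

Running this under the old `python_version: "3.10.13"` setting would report the wheel as incompatible, which is consistent with the version bump in the front matter.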
SPACE_DEPLOYMENT.md CHANGED
@@ -6,6 +6,7 @@ This repository is prepared for a Gradio-based Hugging Face Space with ZeroGPU.
 
 - Space SDK: Gradio
 - Space hardware: ZeroGPU
+- Space Python: 3.11
 - Public port: `7860`
 - Entrypoint: `python app.py`
 - Recommended use: ZeroGPU for request-scoped GPU allocation
@@ -37,7 +38,7 @@ Useful environment variables:
 - `LANCE_QUEUE_SIZE`: Gradio queue size
 - `LANCE_GRADIO_TMP_ROOT`: output and temporary file directory
 - `LANCE_ZEROGPU_MAX_DURATION_SECONDS`: fixed `@spaces.GPU` duration request in seconds for all tasks (default: 300)
-- `LANCE_INSTALL_FLASH_ATTN_ON_STARTUP`: set to `1` to install flash-attn during Space startup instead of inside the GPU reservation
+- `LANCE_INSTALL_FLASH_ATTN_ON_STARTUP`: set to `1` to install the pinned flash-attn wheel during Space startup instead of inside the GPU reservation (the wheel matches Python 3.11)
 - `LANCE_PREFETCH_MODEL_ASSETS`: set to `0` to skip CPU-side model prefetch at startup
 - `LANCE_PREFETCH_MODEL_VARIANTS`: comma-separated model variants to prefetch, for example `video,image`
 
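
Review note on the environment variables: the list above describes `1`/`0` switches and a comma-separated variant list, but the parsing code is not part of this diff. The sketch below shows one plausible way to read such values; `env_flag` and `env_csv` are hypothetical helpers, and both the accepted truthy spellings and the default values used in the demo are assumptions, not behavior confirmed by `app.py`.

```python
import os
from typing import List, Mapping

# Assumption: which spellings count as "on". The docs only mention `1` and `0`.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean switch; unset or empty values fall back to the default."""
    raw = environ.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in _TRUTHY


def env_csv(name: str, default: List[str], environ: Mapping[str, str] = os.environ) -> List[str]:
    """Read a comma-separated list, dropping blanks and surrounding whitespace."""
    raw = environ.get(name, "")
    items = [item.strip() for item in raw.split(",") if item.strip()]
    return items or list(default)


if __name__ == "__main__":
    # Defaults here are illustrative only.
    print(env_flag("LANCE_INSTALL_FLASH_ATTN_ON_STARTUP", default=False))
    print(env_flag("LANCE_PREFETCH_MODEL_ASSETS", default=True))
    print(env_csv("LANCE_PREFETCH_MODEL_VARIANTS", default=["video"]))
```

Passing the environment as a mapping keeps the helpers testable without mutating `os.environ`.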
app.py CHANGED
@@ -75,7 +75,8 @@ RUN_RECORD_FILENAME = "generation_record.json"
 LOCAL_MODEL_BASE_DIR = Path("downloads")
 SPACE_MODEL_BASE_DIR = Path("/data/lance_models")
 DEFAULT_MODEL_REPO_ID = "bytedance-research/Lance"
-DEFAULT_FLASH_ATTN_VERSION = "2.6.3"
+DEFAULT_FLASH_ATTN_VERSION = "2.8.3"
+DEFAULT_FLASH_ATTN_WHEEL_URL = "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 DEFAULT_MODEL_VARIANT = "video"
 MODEL_VARIANT_VIDEO = "video"
 MODEL_VARIANT_IMAGE = "image"
@@ -3177,16 +3178,17 @@ def get_env_float(name: str, default: float) -> float:
 def ensure_flash_attn_installed() -> None:
     try:
         from importlib.metadata import PackageNotFoundError, version as package_version
-        current_version = package_version("flash-attn")
+        current_version = package_version("flash_attn")
         if current_version == DEFAULT_FLASH_ATTN_VERSION:
+            print(f"[startup] flash-attn {current_version} already installed.", flush=True)
             return
         print(
-            f"[startup] flash-attn {current_version} detected; reinstalling {DEFAULT_FLASH_ATTN_VERSION} without build isolation.",
+            f"[startup] flash-attn {current_version} detected; reinstalling {DEFAULT_FLASH_ATTN_VERSION} from wheel.",
             flush=True,
         )
     except Exception:
         print(
-            f"[startup] flash-attn not available; installing {DEFAULT_FLASH_ATTN_VERSION} without build isolation.",
+            f"[startup] flash-attn not available; installing {DEFAULT_FLASH_ATTN_VERSION} from wheel.",
             flush=True,
         )
 
@@ -3196,11 +3198,12 @@ def ensure_flash_attn_installed() -> None:
         "pip",
         "install",
         "--no-cache-dir",
-        "--no-build-isolation",
-        f"flash-attn=={DEFAULT_FLASH_ATTN_VERSION}",
+        "--no-deps",
+        "--force-reinstall",
+        DEFAULT_FLASH_ATTN_WHEEL_URL,
     ]
     subprocess.check_call(command)
-    print(f"[startup] flash-attn {DEFAULT_FLASH_ATTN_VERSION} installed successfully.", flush=True)
+    print(f"[startup] flash-attn {DEFAULT_FLASH_ATTN_VERSION} installed from wheel.", flush=True)
 
 
 def get_zerogpu_duration_cap() -> int:
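
Review note on `ensure_flash_attn_installed`: the check compares the installed version string to `"2.8.3"` with strict equality, while the wheel filename carries a local suffix (`+cu12torch2.5cxx11abiFALSE`). This diff does not show which form the installed metadata reports. If it did include the suffix, the equality test would never match and the wheel would be reinstalled on every startup. The sketch below compares only the public part of the version and builds the same pip arguments as the new code; `public_version`, `needs_reinstall`, and `build_wheel_install_command` are hypothetical names, and the leading `sys.executable, "-m"` elements are an assumption because that part of the command lies outside the hunk.

```python
import sys
from typing import List, Optional


def public_version(version: str) -> str:
    """Drop a PEP 440 local segment, e.g. '2.8.3+cu12' -> '2.8.3'."""
    return version.split("+", 1)[0]


def needs_reinstall(installed: Optional[str], wanted: str) -> bool:
    """Return True when the package is missing or its public version differs."""
    if installed is None:
        return True
    return public_version(installed) != public_version(wanted)


def build_wheel_install_command(wheel_url: str, python: str = sys.executable) -> List[str]:
    """Build a pip invocation that installs exactly one wheel and nothing else.

    --no-deps keeps pip from touching the pinned torch stack;
    --force-reinstall replaces any flash-attn already present.
    """
    return [
        python,
        "-m",
        "pip",
        "install",
        "--no-cache-dir",
        "--no-deps",
        "--force-reinstall",
        wheel_url,
    ]


if __name__ == "__main__":
    print(needs_reinstall("2.8.3+cu12torch2.5cxx11abiFALSE", "2.8.3"))
    print(needs_reinstall("2.6.3", "2.8.3"))
```

Separating the decision from the `subprocess.check_call` side effect also makes the startup path easy to unit test.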
requirements.txt CHANGED
@@ -1,7 +1,11 @@
+--extra-index-url https://download.pytorch.org/whl/cu124
+torch==2.5.1+cu124
+torchvision==0.20.1+cu124
+torchaudio==2.5.1+cu124
+
 absl-py==0.15.0
-accelerate==1.13.0
+accelerate>=0.21.0
 addict==2.4.0
-# albumentations==1.4.3
 annotated-types==0.7.0
 bitsandbytes==0.49.2
 certifi==2024.8.30
@@ -16,6 +20,7 @@ filelock==3.16.1
 fsspec==2023.6.0
 ftfy==6.1.1
 h5py==3.12.1
+huggingface-hub==0.29.1
 imageio==2.34.0
 imageio-ffmpeg==0.5.1
 Jinja2==3.1.3
@@ -27,7 +32,7 @@ numpy==1.23.5
 omegaconf==2.3.0
 opencv-python==4.7.0.72
 opt_einsum==3.4.0
-packaging==26.1
+packaging>=20.8,<26.0
 peft==0.5.0
 pillow==11.0.0
 protobuf==3.20.3
@@ -37,6 +42,7 @@ pydantic==2.11.10
 pydantic_core==2.33.2
 PyYAML==6.0
 qwen-vl-utils==0.0.14
+regex==2022.10.31
 requests==2.32.3
 safetensors==0.4.5
 scikit-image==0.24.0
@@ -54,11 +60,18 @@ torchlibrosa==0.1.0
 torchmetrics==1.3.2
 tqdm==4.67.3
 transformers-stream-generator==0.0.5
+triton==3.1.0
 typing_extensions==4.15.0
 urllib3==1.26.20
 webdataset==0.2.48
 yacs==0.1.8
 zipp==3.23.1
+httpx>=0.25.0
 gpustat
+transformers==4.49.0
+diffusers==0.29.1
+gradio==5.35
 sk-video
 spaces
+
+flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl