Prepare Lance for Hugging Face Space
- Dockerfile +2 -3
- README.md +25 -4
- SPACE_DEPLOYMENT.md +2 -1
- app.py +10 -7
- requirements.txt +16 -3
Dockerfile
CHANGED
@@ -25,9 +25,8 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY requirements.txt /app/requirements.txt
 
 RUN python -m pip install --upgrade pip setuptools wheel \
-&&
-&& python -m pip install -
-&& python -m pip install flash-attn==2.6.3 --no-build-isolation
+&& python -m pip install -r requirements.txt \
+&& python -m pip install --no-cache-dir --no-deps --force-reinstall "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 
 COPY . /app
 
README.md
CHANGED
@@ -4,7 +4,7 @@ emoji: 🎬
 colorFrom: blue
 colorTo: indigo
 sdk: gradio
-python_version: "3.
+python_version: "3.11"
 sdk_version: "5.31.0"
 app_file: app.py
 models:
@@ -221,14 +221,35 @@ models:
 
 ### Recommended Environment
 
-- **Software:** Python 3.
+- **Software:** Python 3.11, CUDA 12.4+
 - **Hardware:** A GPU with at least 40GB VRAM is required for inference
 
-###
+### Local Environment Setup
+
+```bash
+conda create -n Lance python=3.11 -y
+conda activate Lance
+pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
+pip install -r requirements.txt
+pip install --no-cache-dir --no-deps --force-reinstall \
+"https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
+```
+
+### Hugging Face Gradio Space Setup
+
+For a Gradio Space, keep the repository as `sdk: gradio` and `app_file: app.py`, set the Space hardware to `ZeroGPU`, and let the Space build install `requirements.txt`. The Space build must use Python 3.11 so the pinned flash-attn wheel is compatible.
+
+The complete Space-side dependency chain is:
+
 ```bash
-
+pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
+pip install -r requirements.txt
+pip install --no-cache-dir --no-deps --force-reinstall \
+"https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 ```
 
+At runtime, `app.py` still keeps a startup fallback for flash-attn and model prefetch, but the Space should already have the right packages installed before the UI appears.
+
 ### Download Model Weights
 
 Please download all necessary model checkpoints from [Lance-3B on Hugging Face](https://huggingface.co/bytedance-research/Lance) and place them in the `downloads/` directory.
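The README change ties the Space to Python 3.11 because of the pinned flash-attn wheel. As a rough illustration of where that constraint comes from, the sketch below splits the wheel filename from this commit into its compatibility tags and compares the Python tag with the running interpreter. `parse_wheel_tags` and `interpreter_tag` are hypothetical helpers, not part of the repository, and the parser assumes a wheel filename without an optional build tag.

```python
import sys
from urllib.parse import unquote, urlsplit

# Wheel URL copied from this commit.
WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
)


def parse_wheel_tags(url: str) -> dict:
    """Split a wheel URL into name, version, and compatibility tags.

    Hypothetical helper. Assumes the filename has exactly five
    dash-separated fields: name-version-python-abi-platform.
    """
    filename = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp311'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = parse_wheel_tags(WHEEL_URL)
if tags["python"] != interpreter_tag():
    print(f"wheel targets {tags['python']}, interpreter is {interpreter_tag()}")
```

A check like this only compares the Python tag; pip itself also validates the ABI and platform tags at install time.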
SPACE_DEPLOYMENT.md
CHANGED
@@ -6,6 +6,7 @@ This repository is prepared for a Gradio-based Hugging Face Space with ZeroGPU.
 
 - Space SDK: Gradio
 - Space hardware: ZeroGPU
+- Space Python: 3.11
 - Public port: `7860`
 - Entrypoint: `python app.py`
 - Recommended use: ZeroGPU for request-scoped GPU allocation
@@ -37,7 +38,7 @@ Useful environment variables:
 - `LANCE_QUEUE_SIZE`: Gradio queue size
 - `LANCE_GRADIO_TMP_ROOT`: output and temporary file directory
 - `LANCE_ZEROGPU_MAX_DURATION_SECONDS`: fixed `@spaces.GPU` duration request in seconds for all tasks (default: 300)
-- `LANCE_INSTALL_FLASH_ATTN_ON_STARTUP`: set to `1` to install flash-attn during Space startup instead of inside the GPU reservation
+- `LANCE_INSTALL_FLASH_ATTN_ON_STARTUP`: set to `1` to install the pinned flash-attn wheel during Space startup instead of inside the GPU reservation (the wheel matches Python 3.11)
 - `LANCE_PREFETCH_MODEL_ASSETS`: set to `0` to skip CPU-side model prefetch at startup
 - `LANCE_PREFETCH_MODEL_VARIANTS`: comma-separated model variants to prefetch, for example `video,image`
 
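The environment variables listed in SPACE_DEPLOYMENT.md could be read along the following lines. This is only a sketch: the helper names (`env_flag`, `env_int`, `env_list`, `read_space_settings`) are hypothetical, and every default except the documented 300-second duration is an assumption, since the diff does not show how `app.py` actually parses these values.

```python
import os
from typing import List, Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag; unrecognized or missing values fall back to default."""
    raw = env.get(name, "").strip().lower()
    if raw in TRUE_VALUES:
        return True
    if raw in FALSE_VALUES:
        return False
    return default


def env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer; a missing or malformed value falls back to default."""
    try:
        return int(env.get(name, "").strip())
    except ValueError:
        return default


def env_list(env: Mapping[str, str], name: str, default: List[str]) -> List[str]:
    """Read a comma-separated list, dropping empty entries."""
    items = [part.strip() for part in env.get(name, "").split(",")]
    items = [item for item in items if item]
    return items or list(default)


def read_space_settings(env: Mapping[str, str] = os.environ) -> dict:
    return {
        # Assumed default: off, because the docs say "set to 1 to install".
        "install_flash_attn": env_flag(env, "LANCE_INSTALL_FLASH_ATTN_ON_STARTUP", False),
        # Assumed default: on, because the docs say "set to 0 to skip".
        "prefetch_assets": env_flag(env, "LANCE_PREFETCH_MODEL_ASSETS", True),
        # Assumed default: only the "video" variant.
        "prefetch_variants": env_list(env, "LANCE_PREFETCH_MODEL_VARIANTS", ["video"]),
        # Documented default: 300 seconds.
        "max_duration_seconds": env_int(env, "LANCE_ZEROGPU_MAX_DURATION_SECONDS", 300),
    }


print(read_space_settings({"LANCE_PREFETCH_MODEL_VARIANTS": "video,image"}))
```

Passing the environment as a mapping keeps the parsing testable without touching `os.environ`.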
app.py
CHANGED
@@ -75,7 +75,8 @@ RUN_RECORD_FILENAME = "generation_record.json"
 LOCAL_MODEL_BASE_DIR = Path("downloads")
 SPACE_MODEL_BASE_DIR = Path("/data/lance_models")
 DEFAULT_MODEL_REPO_ID = "bytedance-research/Lance"
-DEFAULT_FLASH_ATTN_VERSION = "2.
+DEFAULT_FLASH_ATTN_VERSION = "2.8.3"
+DEFAULT_FLASH_ATTN_WHEEL_URL = "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
 DEFAULT_MODEL_VARIANT = "video"
 MODEL_VARIANT_VIDEO = "video"
 MODEL_VARIANT_IMAGE = "image"
@@ -3177,16 +3178,17 @@ def get_env_float(name: str, default: float) -> float:
 def ensure_flash_attn_installed() -> None:
     try:
         from importlib.metadata import PackageNotFoundError, version as package_version
-        current_version = package_version("
+        current_version = package_version("flash_attn")
         if current_version == DEFAULT_FLASH_ATTN_VERSION:
+            print(f"[startup] flash-attn {current_version} already installed.", flush=True)
             return
         print(
-            f"[startup] flash-attn {current_version} detected; reinstalling {DEFAULT_FLASH_ATTN_VERSION}
+            f"[startup] flash-attn {current_version} detected; reinstalling {DEFAULT_FLASH_ATTN_VERSION} from wheel.",
             flush=True,
         )
     except Exception:
         print(
-            f"[startup] flash-attn not available; installing {DEFAULT_FLASH_ATTN_VERSION}
+            f"[startup] flash-attn not available; installing {DEFAULT_FLASH_ATTN_VERSION} from wheel.",
             flush=True,
         )
 
@@ -3196,11 +3198,12 @@ def ensure_flash_attn_installed() -> None:
         "pip",
         "install",
         "--no-cache-dir",
-        "--no-
-
+        "--no-deps",
+        "--force-reinstall",
+        DEFAULT_FLASH_ATTN_WHEEL_URL,
     ]
     subprocess.check_call(command)
-    print(f"[startup] flash-attn {DEFAULT_FLASH_ATTN_VERSION} installed
+    print(f"[startup] flash-attn {DEFAULT_FLASH_ATTN_VERSION} installed from wheel.", flush=True)
 
 
 def get_zerogpu_duration_cap() -> int:
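The startup fallback in `ensure_flash_attn_installed` boils down to "compare the installed version with the pin, then reinstall from the wheel if they differ". The sketch below restates that logic as small, testable pieces. It is not the code from `app.py`: the helper names are hypothetical, the `sys.executable -m` prefix is an assumption (the diff only shows the tail of the command list), and stripping a `+local` suffix before comparing is an extra precaution in case the installed metadata reports the local version label.

```python
import sys
from importlib.metadata import PackageNotFoundError, version as package_version
from typing import List, Optional

# Values copied from this commit.
FLASH_ATTN_VERSION = "2.8.3"
FLASH_ATTN_WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return package_version(dist_name)
    except PackageNotFoundError:
        return None


def needs_install(installed: Optional[str], wanted: str) -> bool:
    """True when the package is missing or its public version differs."""
    if installed is None:
        return True
    # Ignore a local version label such as "+cu12torch2.5cxx11abiFALSE".
    return installed.split("+", 1)[0] != wanted


def build_pip_command(wheel_url: str) -> List[str]:
    """Build the pip invocation; the interpreter prefix is an assumption."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir", "--no-deps", "--force-reinstall",
        wheel_url,
    ]


if needs_install(installed_version("flash_attn"), FLASH_ATTN_VERSION):
    # Dry run only: print the command instead of executing it.
    print("would run:", " ".join(build_pip_command(FLASH_ATTN_WHEEL_URL)))
```

Catching `PackageNotFoundError` specifically, rather than a bare `Exception`, keeps unrelated failures visible; whether that is preferable for the Space is a judgment call.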
requirements.txt
CHANGED
@@ -1,7 +1,11 @@
+--extra-index-url https://download.pytorch.org/whl/cu124
+torch==2.5.1+cu124
+torchvision==0.20.1+cu124
+torchaudio==2.5.1+cu124
+
 absl-py==0.15.0
-accelerate=
+accelerate>=0.21.0
 addict==2.4.0
-# albumentations==1.4.3
 annotated-types==0.7.0
 bitsandbytes==0.49.2
 certifi==2024.8.30
@@ -16,6 +20,7 @@ filelock==3.16.1
 fsspec==2023.6.0
 ftfy==6.1.1
 h5py==3.12.1
+huggingface-hub==0.29.1
 imageio==2.34.0
 imageio-ffmpeg==0.5.1
 Jinja2==3.1.3
@@ -27,7 +32,7 @@ numpy==1.23.5
 omegaconf==2.3.0
 opencv-python==4.7.0.72
 opt_einsum==3.4.0
-packaging=
+packaging>=20.8,<26.0
 peft==0.5.0
 pillow==11.0.0
 protobuf==3.20.3
@@ -37,6 +42,7 @@ pydantic==2.11.10
 pydantic_core==2.33.2
 PyYAML==6.0
 qwen-vl-utils==0.0.14
+regex==2022.10.31
 requests==2.32.3
 safetensors==0.4.5
 scikit-image==0.24.0
@@ -54,11 +60,18 @@ torchlibrosa==0.1.0
 torchmetrics==1.3.2
 tqdm==4.67.3
 transformers-stream-generator==0.0.5
+triton==3.1.0
 typing_extensions==4.15.0
 urllib3==1.26.20
 webdataset==0.2.48
 yacs==0.1.8
 zipp==3.23.1
+httpx>=0.25.0
 gpustat
+transformers==4.49.0
+diffusers==0.29.1
+gradio==5.35
 sk-video
 spaces
+
+flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
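The new `requirements.txt` pins `torch==2.5.1+cu124` alongside a flash-attn wheel whose version label reads `cu12torch2.5`. A quick consistency check between the two pins might look like the sketch below. The helpers `torch_pin`, `wheel_build`, and `pins_consistent` are hypothetical, and matching the CUDA label by prefix (`cu124` against `cu12`) is a heuristic assumption about how the wheel labels are meant to be read, not a documented rule.

```python
import re
from typing import List, Tuple
from urllib.parse import unquote

# Lines copied from the requirements.txt in this commit.
REQUIREMENT_LINES = [
    "torch==2.5.1+cu124",
    "torchvision==0.20.1+cu124",
    "torchaudio==2.5.1+cu124",
    "flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/"
    "v2.8.3/flash_attn-2.8.3%2Bcu12torch2.5cxx11abiFALSE-cp311-cp311-linux_x86_64.whl",
]


def torch_pin(lines: List[str]) -> Tuple[str, str]:
    """Return (torch major.minor, CUDA label digits) from the torch pin."""
    for line in lines:
        match = re.fullmatch(r"torch==(\d+)\.(\d+)\.\d+\+cu(\d+)", line.strip())
        if match:
            return f"{match.group(1)}.{match.group(2)}", match.group(3)
    raise ValueError("no torch==X.Y.Z+cuNNN pin found")


def wheel_build(lines: List[str]) -> Tuple[str, str]:
    """Return (torch major.minor, CUDA label digits) from the flash-attn wheel."""
    for line in lines:
        if not line.strip().startswith("flash-attn"):
            continue
        # The URL percent-encodes "+" as %2B, so decode before matching.
        match = re.search(r"\+cu(\d+)torch(\d+\.\d+)", unquote(line))
        if match:
            return match.group(2), match.group(1)
    raise ValueError("no flash-attn wheel with a cuNNtorchX.Y label found")


def pins_consistent(lines: List[str]) -> bool:
    """Heuristic: same torch major.minor, and the CUDA labels share a prefix."""
    pin_torch, pin_cuda = torch_pin(lines)
    wheel_torch, wheel_cuda = wheel_build(lines)
    return pin_torch == wheel_torch and pin_cuda.startswith(wheel_cuda)


print(torch_pin(REQUIREMENT_LINES), wheel_build(REQUIREMENT_LINES))
print("consistent:", pins_consistent(REQUIREMENT_LINES))
```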