Need vllm-w4a16-dsv4:exp Thanks

#2
by youcai666 - opened

Hi @pasta-paul,

First of all, thank you for this incredible work — the W4A16+FP8 quantization recipe, the detailed mission report, and especially the bootstrap script are genuinely impressive contributions to the community.

I've been trying to build the Docker image following the instructions in the repo, but unfortunately I've failed multiple times due to network connectivity issues on my end. Pulling the base layers, cloning jasl/vllm@ds4-sm120-experimental, and completing the full build in one shot has proven very difficult given my network environment.

The README mentions that an OCI tarball is available upon request — would it be possible to get access to vllm-w4a16-dsv4:exp? Even a mirror link or an alternative download source would be enormously helpful.

My target hardware is similar to your Phase 4e setup (dual DGX Spark, TP=2), so the Blackwell SM 12.x compatible image would be exactly what I need.

Thanks again for open-sourcing this — really appreciate the effort.

Canada Quant Labs org

Hi @youcai666 — got you covered. Just uploaded the pre-built image tarball
to a new HF dataset:

https://huggingface.co/datasets/pastapaul/dsv4-flash-w4a16-spark-image

To use it on each Spark:

huggingface-cli download pastapaul/dsv4-flash-w4a16-spark-image \
  vllm-w4a16-dsv4-exp.tar.gz \
  --repo-type dataset \
  --local-dir .
gunzip -c vllm-w4a16-dsv4-exp.tar.gz | docker load
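
Since the original build attempts failed on a flaky network, it may be worth testing the 10.3 GB archive before piping it into docker load. The following is a minimal sketch, not part of the repo or the dataset README: `verify_archive` is a hypothetical helper name, and it only checks that the gzip stream is complete, not that the contents match what was uploaded.

```shell
# Sketch: confirm the downloaded archive is a complete gzip stream.
# verify_archive is a hypothetical helper, not something the repo ships.
verify_archive() {
  archive="$1"
  if [ ! -f "$archive" ]; then
    echo "missing: $archive" >&2
    return 1
  fi
  # gzip -t reads the whole file and checks its CRC without writing output,
  # so a truncated download fails here instead of halfway through docker load.
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "corrupt or truncated: $archive" >&2
    return 1
  fi
  echo "ok: $archive"
}

# Usage (not executed here):
#   verify_archive vllm-w4a16-dsv4-exp.tar.gz \
#     && gunzip -c vllm-w4a16-dsv4-exp.tar.gz | docker load
```

If the check fails, re-running the same huggingface-cli download command is the usual next step.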

That gives you the vllm-w4a16-dsv4:exp image (20.2 GB on disk after load,
10.3 GB compressed download). After loading on both nodes, the bootstrap
script with --skip-build will take you the rest of the way:

curl -fsSLO https://raw.githubusercontent.com/pasta-paul/dsv4-flash-w4a16-fp8/main/scripts/bootstrap_dsv4_spark.sh
chmod +x bootstrap_dsv4_spark.sh
./bootstrap_dsv4_spark.sh \
  --head-host spark-a \
  --worker-host spark-b \
  --skip-build
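
Because the image has to be loaded on both nodes first, a quick check before launching the bootstrap script can save a failed run. This is only a sketch under assumptions the thread does not state: passwordless SSH from the machine you run it on to each host, and docker on the remote PATH. `hosts_missing_image` is a hypothetical name; the host names are the ones from the command above.

```shell
# Sketch: list the hosts that do not have the image loaded yet.
# Assumes passwordless SSH to each host and docker on the remote PATH.
hosts_missing_image() {
  image="$1"
  shift
  missing=""
  for host in "$@"; do
    # docker image inspect exits non-zero when the image is absent locally.
    # ssh -n keeps ssh from reading this shell's stdin.
    if ! ssh -n "$host" docker image inspect "$image" >/dev/null 2>&1; then
      missing="$missing $host"
    fi
  done
  # Print the hosts lacking the image (empty line if none); the exit status
  # is 0 only when every host has it.
  echo "${missing# }"
  [ -z "$missing" ]
}

# Usage (not executed here):
#   hosts_missing_image vllm-w4a16-dsv4:exp spark-a spark-b \
#     && ./bootstrap_dsv4_spark.sh --head-host spark-a --worker-host spark-b --skip-build
```

An unreachable host is reported the same way as a missing image, so a listed host means "check this node", not necessarily "reload the tarball".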

The dataset's README has the full instructions:

https://huggingface.co/datasets/pastapaul/dsv4-flash-w4a16-spark-image

Let us know if you hit any issues bringing it up — happy to debug.

thank you!!!!
