Need vllm-w4a16-dsv4:exp Thanks
Hi @pasta-paul,
First of all, thank you for this incredible work — the W4A16+FP8 quantization recipe, the detailed mission report, and especially the bootstrap script are genuinely impressive contributions to the community.
I've been trying to build the Docker image following the instructions in the repo, but unfortunately I've failed multiple times due to network connectivity issues on my end. Pulling the base layers, cloning jasl/vllm@ds4-sm120-experimental, and completing the full build in one shot has proven very difficult given my network environment.
The README mentions that an OCI tarball is available upon request — would it be possible to get access to vllm-w4a16-dsv4:exp? Even a mirror link or an alternative download source would be enormously helpful.
My target hardware is similar to your Phase 4e setup (dual DGX Spark, TP=2), so the Blackwell SM 12.x compatible image would be exactly what I need.
Thanks again for open-sourcing this — really appreciate the effort.
Hi @youcai666 — got you covered. Just uploaded the pre-built image tarball
to a new HF dataset:
https://huggingface.co/datasets/pastapaul/dsv4-flash-w4a16-spark-image
To use it on each Spark:
huggingface-cli download pastapaul/dsv4-flash-w4a16-spark-image \
    vllm-w4a16-dsv4-exp.tar.gz \
    --repo-type dataset \
    --local-dir .
gunzip -c vllm-w4a16-dsv4-exp.tar.gz | docker load
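Since the original problem was a flaky network, it may be worth checking that the 10.3 GB download arrived intact before loading it. The following is only a sketch: `verify_tarball` is a hypothetical helper name, not something shipped with the repo or the dataset, and it relies on nothing beyond standard `gzip -t`.

```shell
# Sketch: check a gzip archive end to end before handing it to `docker load`.
# verify_tarball is a hypothetical helper name, not part of the repo.
verify_tarball() {
    archive="$1"
    if [ ! -s "$archive" ]; then
        echo "missing or empty: $archive" >&2
        return 1
    fi
    # gzip -t decompresses to nowhere and checks the CRC/length trailer,
    # so a truncated or damaged download fails here.
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "corrupt or truncated: $archive" >&2
        return 1
    fi
    echo "ok: $archive"
}
```

One way to use it is `verify_tarball vllm-w4a16-dsv4-exp.tar.gz && gunzip -c vllm-w4a16-dsv4-exp.tar.gz | docker load`, so the load only starts if the archive passes the check.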
That gives you the vllm-w4a16-dsv4:exp image (20.2 GB on disk after load,
10.3 GB compressed download). After loading on both nodes, the bootstrap
script with --skip-build will take you the rest of the way:
curl -fsSLO https://raw.githubusercontent.com/pasta-paul/dsv4-flash-w4a16-fp8/main/scripts/bootstrap_dsv4_spark.sh
chmod +x bootstrap_dsv4_spark.sh
./bootstrap_dsv4_spark.sh \
    --head-host spark-a \
    --worker-host spark-b \
    --skip-build
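Because `--skip-build` depends on the image already being loaded on both nodes, a quick pre-flight check before running the bootstrap might save a failed run. This is a rough sketch under assumptions the thread does not confirm: that you can SSH to `spark-a` and `spark-b` without a password prompt and that `docker` is on the PATH there. `image_on_host` and `check_nodes` are hypothetical helper names.

```shell
# Sketch: confirm the image tag is loaded on each node before using --skip-build.
# Assumes passwordless SSH to the hosts and docker on their PATH (assumptions,
# not confirmed by the thread); helper names are hypothetical.
IMAGE="vllm-w4a16-dsv4:exp"

image_on_host() {
    # `docker image inspect` exits non-zero when the tag is not present.
    ssh "$1" docker image inspect "$2" >/dev/null 2>&1
}

check_nodes() {
    missing=0
    for host in "$@"; do
        if image_on_host "$host" "$IMAGE"; then
            echo "$host: $IMAGE present"
        else
            echo "$host: $IMAGE missing"
            missing=1
        fi
    done
    # Non-zero if any node lacks the image.
    return "$missing"
}
```

For example, `check_nodes spark-a spark-b` prints one line per node and returns non-zero if either is missing the image, so it can gate the bootstrap call with `&&`.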
The dataset's README has the full instructions:
https://huggingface.co/datasets/pastapaul/dsv4-flash-w4a16-spark-image
Let us know if you hit any issues bringing it up — happy to debug.
thank you!!!!