Run on DGX Spark w/ docker

by meanaverage - opened 17 days ago

Discussion

meanaverage

17 days ago

https://github.com/meanaverage/gemma4-dflash-spark-vllm/tree/main

Amihalik

12 days ago

Good stuff! I was able to use this patch to get the verifier and draft running on my 4xA100 40GB setup. I pointed Goose at the endpoint to kick the tires with a multi-turn conversation.

I noticed that, after a few thousand tokens, the model would return a few hundred tokens of repetitive gibberish. On the next turn, the model would return to “normal” text.

Did you see similar behavior?

If I want reasoning enabled, should I train a new draft model?

meanaverage

11 days ago

If I'm entirely honest, I only ran it for benchmarking. I didn't use it in practice, but I didn't see gibberish. This sounds more like a verifier model problem than a draft model issue. I would check the verifier separately to make sure it's not a baked-in issue. I've mostly shifted my local sights to Qwen3.6 and another project entirely. It was a fun side quest though.

Re: The reasoning aspect, I'm going to be honest and say I just don't know. :)

fynnsu

Red Hat AI org 8 days ago

@Amihalik I have seen this myself and just opened an issue for it on vLLM, since it shouldn't be possible for the drafter to degrade the final output.

The model was trained with response regeneration but not reasoning, so you may find the acceptance rates aren't as good with reasoning enabled. To improve reasoning performance, training a new drafter or finetuning this one would help.
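To illustrate why a drafter should only affect speed and not the final text, here is a toy sketch of greedy speculative decoding. It is not vLLM's implementation: `target`, `good_draft`, and `bad_draft` are hypothetical stand-in functions that map a token list to the next token, and a real engine verifies all draft tokens in one batched forward pass rather than a Python loop. Under these assumptions the verifier keeps a draft token only when it matches its own choice, so a weak drafter lowers the acceptance rate but leaves the output unchanged.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target(seq: List[int]) -> int:
    # Hypothetical deterministic "verifier": tokens are always in 1..50.
    return 1 + (sum(seq) * 31 + len(seq)) % 50


def good_draft(seq: List[int]) -> int:
    # Hypothetical drafter that agrees with the verifier except at every 5th position.
    t = target(seq)
    return t if len(seq) % 5 else t % 50 + 1


def bad_draft(seq: List[int]) -> int:
    # Hypothetical drafter that is always wrong (the verifier never emits 0).
    return 0


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(
    verifier: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], float]:
    seq = list(prompt)
    proposed = accepted = 0
    while len(seq) - len(prompt) < n_new:
        # 1) The drafter proposes k tokens autoregressively.
        ctx, guesses = list(seq), []
        for _ in range(k):
            guesses.append(draft(ctx))
            ctx.append(guesses[-1])
        proposed += k
        # 2) The verifier checks each guess; the emitted token is always its own.
        for g in guesses:
            t = verifier(seq)
            seq.append(t)
            if t != g:
                break  # first mismatch: discard the remaining guesses
            accepted += 1
        else:
            seq.append(verifier(seq))  # all k accepted: one bonus verifier token
    return seq[len(prompt):len(prompt) + n_new], accepted / proposed


prompt = [3, 1, 4]
baseline = greedy_decode(target, prompt, 40)
out_good, rate_good = speculative_decode(target, good_draft, prompt, 40)
out_bad, rate_bad = speculative_decode(target, bad_draft, prompt, 40)
print(out_good == baseline, out_bad == baseline, rate_good, rate_bad)
```

With sampling instead of greedy decoding, the same guarantee is meant to hold at the level of the output distribution via rejection sampling. Either way, a drafter that has not seen reasoning traces would be expected to show up as a lower acceptance rate, not as gibberish.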
