Run on DGX Spark w/ docker

#4
by meanaverage - opened

Good stuff! I was able to use this patch to get the verifier and draft running on my 4xA100 40GB setup. I pointed Goose at the endpoint to kick the tires with a multi-turn conversation.

I noticed that after a few thousand tokens the model would return a few hundred tokens of repetitive gibberish. On the next turn, it would go back to “normal” text.
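For anyone trying to reproduce this, here is a rough sketch of one way to flag the degenerate stretches automatically instead of eyeballing them. It is not from the patch; `repeated_ngram_fraction` is a hypothetical helper, it splits on whitespace rather than using the model's tokenizer, and any cutoff you compare against (say 0.5) is an assumption you would need to tune.

```python
from collections import Counter
from typing import List


def repeated_ngram_fraction(tokens: List[str], n: int = 4) -> float:
    """Return the fraction of n-grams that duplicate an earlier n-gram.

    Values near 0 mean varied text; values near 1 mean the text is
    looping over the same few phrases.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


# Whitespace splitting is an assumption made for simplicity.
normal = "the drafter proposes tokens and the verifier checks them".split()
looping = ("again and " * 50).split()

print(repeated_ngram_fraction(normal))
print(repeated_ngram_fraction(looping))
```

Running this per response in a multi-turn session should make it easier to pin down which turn (and roughly which context length) first goes bad.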

Did you see similar behavior?

If I want reasoning enabled, should I train a new draft model?

If I'm entirely honest, I only ran it for benchmarking. I didn't use it in practice, but I didn't see gibberish. This sounds more like a verifier model problem than a draft model issue. I would check the verifier separately to make sure it isn't a baked-in issue. I've mostly shifted my local sights to Qwen3.6 and another project entirely. It was a fun side quest, though.

Re: the reasoning aspect, I honestly just don't know. :)

Red Hat AI org

@Amihalik I have seen this myself and just opened an issue for it on vLLM, since it shouldn't be possible for the drafter to degrade the final output.
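To illustrate why the drafter should not be able to change the output, here is a toy sketch of greedy speculative decoding. It is not vLLM's implementation: `target` and `draft` are hypothetical stand-in functions that map a token list to the next token, and the verification calls the target one position at a time, whereas a real engine scores all proposed positions in a single batched forward pass. The point is only that every emitted token comes from the target, so a bad drafter costs speed, not quality.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding with the target (verifier) model only."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target decides."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # The target checks each proposed token in order. The token that is
        # emitted is always the target's own choice; a mismatch just ends
        # the accepted run early.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)
            if expected != tok:
                break
    return seq


# Hypothetical toy models over a 50-token vocabulary.
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def useless_draft(seq: List[int]) -> int:
    return 0  # almost always wrong


prompt = [1, 2, 3]
baseline = greedy_decode(toy_target, prompt, 20)
with_bad_draft = speculative_decode(toy_target, useless_draft, prompt, 20)
print(baseline == with_bad_draft)
```

With sampling instead of greedy decoding, the guarantee is about the output distribution rather than exact token equality, so a fixed-seed or temperature-0 comparison of the verifier alone against verifier plus drafter is the simpler check.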

The model was trained with response regeneration but not reasoning, so you may find the acceptance rates aren't as good with reasoning enabled. To improve reasoning performance, training a new drafter or fine-tuning this one would help.
