tool calls?

#4
by CryptoAIM - opened

Won't it regress on tool calls if it's trained so much outside of tool use? Could y'all train on tool calls too? Awesome work, btw.

Tool calls aren't working in my testing (I just posted some notes at https://huggingface.co/Jackrong/Qwopus3.6-27B-v1-preview/discussions/3#69f50f63405b38425b5014cd)

@DanTup he deleted the post lmao

Who deleted it? I'm confused :-)

The notes I'd posted (which I guess now can't be accessed) were related to benchmark scores I'd got testing this model with the same vllm command I used for the base Qwen3.6. I've been publishing some benchmarks for models that fit on a DGX Spark (or I guess 128GB Strix Halo) at https://github.com/DanTup/spark-evals but unfortunately this model didn't work well (I think because something is wrong with tool calls).

If whatever the issue is can be fixed, I will re-run them and update (and, if there are any other fine-tuned versions of models that fit on a Spark, I'd also be interested in trying them too - feel free to open issues in the repo if anyone has suggestions).
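For anyone else debugging this: one common failure mode with fine-tunes is the chat template emitting tool calls as raw text in `content` instead of the structured `tool_calls` field, so the client never sees a call. Below is a minimal sketch of a checker for that, assuming the OpenAI-compatible chat-completions response shape that vLLM serves (the example responses are hand-written, illustrative only):

```python
import json

def classify_tool_response(choice: dict) -> str:
    """Classify a single chat-completion choice:
    - 'tool_call'  : proper structured tool call
    - 'leaked_text': call leaked into content as raw text (broken template/parser)
    - 'text'       : ordinary text answer
    """
    msg = choice.get("message", {})
    if msg.get("tool_calls"):
        return "tool_call"
    content = msg.get("content") or ""
    # Heuristic: Hermes/Qwen-style templates wrap calls in <tool_call> tags;
    # seeing those tags in plain content means server-side parsing failed.
    if "<tool_call>" in content or '"arguments"' in content:
        return "leaked_text"
    return "text"

# Hand-written example responses for illustration (not real model output):
good = {"message": {"content": None, "tool_calls": [
    {"function": {"name": "get_weather",
                  "arguments": json.dumps({"city": "London"})}}]}}
broken = {"message": {"content":
    '<tool_call>{"name": "get_weather", "arguments": {"city": "London"}}'
    '</tool_call>'}}

print(classify_tool_response(good))    # tool_call
print(classify_tool_response(broken))  # leaked_text
```

If the fine-tune consistently hits the `leaked_text` case where the base model didn't, that points at the template or the server's tool-call parser rather than the model's reasoning.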

Probably Jackrong, since only they have access to delete posts.

Same here: all Jackrong models I have tested (Qwen3.5-9B-DeepSeek-V4-Flash, Qwen3.5-9B-GLM5.1-Distill-v1, Qwopus3.5-9B-v3.5 and this one) improve on Stevibe's BenchLocal benchmark suite, except for the bench packs that involve tool calling (CLI-40, Hermes-20 and StructuredOutput-15). No idea why.

Qwen3.5-9B-DeepSeek-V4-Flash and Qwopus3.5-9B-v3.5 are both improvements over their base model. Qwen3.6 models have way more post-training done to them, so they break even with small amounts of additional training.