Has this model actually undergone any benchmarking?

#3
by CarolineTaylor - opened

Apologies, but I still cannot work out the processing pipeline that turns the provided reasoning traces into the final training samples. Based on the publicly available description in the model card, I reviewed all of the training data the team lists. If the model was trained for only one epoch on such extremely short user inputs, it would very likely show significant degradation.

TeichAI org

This model has not undergone any benchmarking yet.
From my understanding, LoRA (which is what we use) should not degrade the model, since we trained on short prompts.

TeichAI org


All SFT comes with some form of degradation; it is a destructive training method at its core. Our Qwen3.5 models were distilled using experimental parameters designed to achieve reasoning distillation without destroying the model's previous post-training, but this distillation was not done with those parameters. Benchmarks will be posted eventually.
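The LoRA point above (adapters keep the base weights frozen, so the original model is always recoverable by disabling or removing the adapter) can be sketched in a few lines. This is a minimal illustration, not the actual training setup; the layer sizes, rank, and the fake weight update are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of a hypothetical linear layer (illustrative sizes).
d_out, d_in, rank = 8, 16, 2
W = rng.normal(size=(d_out, d_in))

# LoRA adds a trainable low-rank update delta_W = B @ A, with B initialized
# to zeros so training starts exactly at the base model's behavior.
A = rng.normal(size=(rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def forward(x, scale=1.0):
    # Base path plus low-rank adapter path; W itself is never modified.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=d_in)

# With B = 0 the adapter contributes nothing: output equals the base layer's.
assert np.allclose(forward(x), W @ x)

# After "training" (here: a fake update to B), disabling the adapter
# (scale = 0) still recovers the untouched base model exactly.
B += rng.normal(size=B.shape)
assert np.allclose(forward(x, scale=0.0), W @ x)
```

Full SFT, by contrast, overwrites `W` directly, which is why it can destroy earlier post-training while LoRA leaves it intact underneath the adapter.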

TeichAI org

Hello @CarolineTaylor ,

We currently have no plans to benchmark this model, due to compute limitations on our side. Since we generally work with GPUs that do not exceed 16 GB of VRAM and do not operate any DIY GPU clusters, we lack the capacity to run benchmarks on this model. We would have to use cloud resources, which come at a cost; our benchmark suite runs for multiple hours. We primarily spend our credits on creating the datasets and fine-tuning the distills.
We'd love to add benchmark stats provided by community members.

This comment has been hidden (marked as Off-Topic)
This comment has been hidden (marked as Resolved)
TeichAI org

Please stop reposting messages in this thread after they have already been sent and answered in a previous thread. This message will again be removed due to its inaccuracies and lack of context. Please see the other thread you started regarding the dataset in question; you will find my full, thorough response there.

armand0e changed discussion status to closed
