Has this model actually undergone any benchmarking?

#3
by CarolineTaylor - opened

Apologies, but I still cannot work out the processing pipeline that turns the provided reasoning traces into the final training samples. Based on the publicly available description in the model card, I reviewed all of the training data the team lists. If the model was trained for only one epoch on such extremely short user inputs, it would very likely show significant degradation.

TeichAI org

This model has not undergone any benchmarking yet.
From my understanding, LoRA (which is what we use) should not degrade the model, since we trained on short prompts.

TeichAI org


All SFT comes with some form of degradation; it is a destructive training method at its core. Our Qwen3.5 models were distilled using experimental parameters designed to achieve reasoning distillation without destroying the model's previous post-training, but this distillation was not done with those parameters. Benchmarks will be posted eventually.
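The LoRA point above (adapters keep the base weights frozen, so the original model is always recoverable by disabling or removing the adapter) can be sketched in a few lines. This is a minimal illustration, not the actual training setup; the layer sizes, rank, and the fake weight update are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight of a hypothetical linear layer (illustrative sizes).
d_out, d_in, rank = 8, 16, 2
W = rng.normal(size=(d_out, d_in))

# LoRA adds a trainable low-rank update delta_W = B @ A, with B initialized
# to zeros so training starts exactly at the base model's behavior.
A = rng.normal(size=(rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def forward(x, scale=1.0):
    # Base path plus low-rank adapter path; W itself is never modified.
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=d_in)

# With B = 0 the adapter contributes nothing: output equals the base layer's.
assert np.allclose(forward(x), W @ x)

# After "training" (here: a fake update to B), disabling the adapter
# (scale = 0) still recovers the untouched base model exactly.
B += rng.normal(size=B.shape)
assert np.allclose(forward(x, scale=0.0), W @ x)
```

Full SFT, by contrast, overwrites `W` directly, which is why it can destroy earlier post-training while LoRA leaves it intact underneath the adapter.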

TeichAI org

Hello @CarolineTaylor ,

We currently have no plans to benchmark this model, due to compute limitations on our side. Since we generally work with GPUs that do not exceed 16 GB of VRAM and do not operate any DIY GPU clusters, we lack the capacity to run benchmarks on this model. We would have to use cloud resources, which come at a cost; our benchmark suite runs for multiple hours. We primarily spend our credits on creating the datasets and fine-tuning the distills.
We'd love to add benchmark stats provided by community members.

This comment has been hidden (marked as Off-Topic)
This comment has been hidden (marked as Resolved)
TeichAI org

Please stop reposting messages in this thread after they have already been sent and answered in a previous thread. This message will again be removed due to its inaccuracies and lack of context. Please see the other thread you started regarding the dataset in question; you will find my full, thorough response there.

armand0e changed discussion status to closed
