Recall from embedded documents is not as good as the original

#4
by o0Linny0o - opened

Just comparing templates from an assignment, the original qwen3.5 9B still outperforms this v2.

Notice that it's a short thinker, 10-20 seconds, whereas the original would think for up to 55 seconds on the same task... but at least it was correct.

Just some feedback.

Thank you for your feedback~

From a methodological perspective, this version follows a purely SFT‑based optimization path (with a shifted data‑distribution focus), so a certain degree of capability trade‑off is actually expected. That said, I’ve already applied several optimization techniques to keep this kind of regression within a relatively acceptable range.

In the design of v2, my goal is to improve efficiency and cost performance on simpler tasks while maintaining a solid baseline accuracy. Because of this, it’s better suited for local deployment in resource‑constrained environments, as well as for lightweight agent scenarios (such as checking or organizing emails, setting reminders, etc. in OpenClaw).

Regarding the situation you mentioned, it’s actually consistent with expectations: for tasks involving complex assignment templates, long‑chain reasoning, or very long text processing, the original model typically offers higher accuracy.

When you said "original model", did you mean your v1 or the Qwen original?


Please specify which quant was used, and especially whether you quantized the K/V cache.


The standard qwen3.5 9B GGUF.

o0Linny0o changed discussion status to closed


It was the Q8_0 model, and the K/V cache was both Q8_0.
Is that worth investigating?
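For anyone trying to reproduce this setup: in llama.cpp, K/V-cache quantization is set at launch time. A minimal sketch (the model filename is hypothetical, and flag spellings may vary slightly between llama.cpp versions):

```shell
# Launch llama.cpp's server with a Q8_0 model and a Q8_0-quantized K/V cache.
# Note: a quantized V cache requires flash attention to be enabled.
# The model path below is a placeholder; point it at your own GGUF file.
llama-server \
  -m ./qwen3.5-9b-q8_0.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -fa on
```

Q8_0 K/V cache is usually near-lossless, but rerunning the same prompt with the default f16 cache (i.e. dropping the `--cache-type-*` flags) would isolate whether cache quantization contributes to the accuracy gap.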
