Recall from embedded documents is not as good as the original

#4
by o0Linny0o - opened

Just comparing templates from an assignment, the original qwen3.5 9B still outperforms this v2.

Notice that it's a short thinker, 10-20 seconds, whereas the original would think for up to 55 seconds on the same task... but at least it was correct.

Just some feedback.

Thank you for your feedback~

From a methodological perspective, this version follows a purely SFT‑based optimization path (with a shifted data‑distribution focus), so a certain degree of capability trade‑off is actually expected. That said, I’ve already applied several optimization techniques to keep this kind of regression within a relatively acceptable range.

In the design of v2, my goal is to improve efficiency and cost performance on simpler tasks while maintaining a solid baseline accuracy. Because of this, it’s better suited for local deployment in resource‑constrained environments, as well as for lightweight agent scenarios (such as checking or organizing emails, setting reminders, etc. in OpenClaw).

Regarding the situation you mentioned, it’s actually consistent with expectations: for tasks involving complex assignment templates, long‑chain reasoning, or very long text processing, the original model typically offers higher accuracy.

When you said "original model", did you mean your v1 or the Qwen original?


Please specify which quant was used, and especially whether you quantized the K/V cache.


The standard qwen3.5 9B GGUF.

o0Linny0o changed discussion status to closed


It was the Q8_0 model, and the K/V cache was both Q8_0.
Is that worth investigating?
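For anyone trying to reproduce this setup: in llama.cpp, K/V-cache quantization is set at launch time. A minimal sketch (the model filename is hypothetical, and flag spellings may vary slightly between llama.cpp versions):

```shell
# Launch llama.cpp's server with a Q8_0 model and a Q8_0-quantized K/V cache.
# Note: a quantized V cache requires flash attention to be enabled.
# The model path below is a placeholder; point it at your own GGUF file.
llama-server \
  -m ./qwen3.5-9b-q8_0.gguf \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -fa on
```

Q8_0 K/V cache is usually near-lossless, but rerunning the same prompt with the default f16 cache (i.e. dropping the `--cache-type-*` flags) would isolate whether cache quantization contributes to the accuracy gap.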
