Sadly This is Completely Useless Or Am I Missing Something?

#2
by Jayyaj - opened

I downloaded the Q4_K_M quant with great hope that it would be better than Qwopus v3 27b, tried it in LM Studio with 100k context, and it failed terribly at building a simple game in HTML/CSS/JS.

It also failed badly when I used it in Opencode to test tool calling.

Sampling params:

Temp: 1
Top K: 64
Top P: 0.95
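
(For reference, a minimal sketch of how these sampling parameters can be passed to a local OpenAI-compatible endpoint such as the one LM Studio serves. The model id and endpoint URL below are placeholders, not the actual setup used here; `top_k` is a non-standard extension that llama.cpp-based servers generally accept.)

```python
# Sketch: building a chat request payload with the sampling params above.
# "local-model" and the localhost URL are placeholders -- check your server.

def build_payload(prompt: str) -> dict:
    return {
        "model": "local-model",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,      # Temp: 1
        "top_k": 64,             # Top K: 64 (non-standard OpenAI field)
        "top_p": 0.95,           # Top P: 0.95
    }

payload = build_payload("Build a simple game in HTML/CSS/JS.")
# e.g. POST it with:
# requests.post("http://localhost:1234/v1/chat/completions", json=payload)
```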

Why did this happen? Any ideas? Am I doing something wrong?

Is this a gemma4->gemopus4 regression, or a general issue with gemma4?

gemma4->gemopus4 (this model from Jackrong)

I tested the normal gemma 4 Q4_K_M and it does a much better job at tool calling and building the game. However, what it builds is lazy and too basic.

Qwopus 3.5 v3 27b Q4_K_M, on the other hand, works very well, but it is slower since it is a dense model.

Hi, thanks a lot for your feedback!

During fine-tuning, I intentionally preserved the original reasoning structure of the base model and did not apply aggressive modifications to the chain-of-thought, to avoid degrading its inherent capabilities without a subsequent RL alignment stage.

Tool calling with Gemma 4 is currently not very stable across the ecosystem. Also, using very long context lengths (e.g., 100k) can be quite demanding for this model and may significantly degrade performance.

If possible, could you share the failing prompts or the generated output logs? That would really help me debug and improve the model further. Appreciate it!

Yeah, just tested it. Q4_K_M is completely broken.

If possible, could you share the failing prompts or the generated output logs? That would really help me debug and improve the model further. Appreciate it!

check ~/ESR/ESR1.md and find an sql vulnerability in orders controller starting line 691 using the esr1
<|channel>thought
<channel|>```bash
grep -n "SELECT" src/controllers/orders.controller.ts
```bash
ls src/controllers/orders.controller.ts
read_file src/controllers/orders.controller.ts --line-range 691-750
grep -n "order" src/controllers/orders.controller.ts
grep -n "query" src/controllers/orders.controller.ts
grep -n "WHERE" src/controllers/orders.controller.ts
cat src/controllers/orders.controller.ts | sed -n '691,750p'
cat src/controllers/orders.controller.ts | sed -n '691,750p'
grep -nE "WHERE|ORDER|GROUP|HAVING" src/controllers/orders.controller.ts
cat src/controllers/orders.controller.ts | sed -n '680,800p'
sed -n '691,750p

Hi, thanks — this is actually very helpful.

From the logs you shared, it looks like the model is generating tool-use plans (shell commands), but the execution loop isn’t completing — so it keeps producing actions without ever returning a final answer.

This usually indicates that either the tool execution pipeline isn't properly wired, or the environment (e.g., LM Studio / llama.cpp) isn't feeding results back to the model.

Gemma 4’s tool-calling format is also quite different from typical OpenAI-style APIs, so mismatches can easily lead to this kind of behavior.
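
To make the failure mode above concrete, here is a minimal sketch of the agent loop that is apparently not completing: the model proposes shell commands, the harness must execute them and feed the output back, and only then can the model produce a final answer. All names here are illustrative, not any specific harness's API.

```python
import subprocess

def run_tool_loop(model_step, max_turns: int = 8) -> str:
    """Minimal agent-loop sketch. `model_step` is any callable that takes the
    transcript so far and returns either ("run", command) or ("final", answer)."""
    transcript = []
    for _ in range(max_turns):
        action, payload = model_step(transcript)
        if action == "final":
            return payload
        # Execute the proposed command and feed the result back to the model.
        # If this feedback step is missing, the model keeps emitting commands
        # forever -- which is what the pasted log shows.
        result = subprocess.run(payload, shell=True,
                                capture_output=True, text=True)
        transcript.append((payload, result.stdout + result.stderr))
    return "(no final answer within turn budget)"

# Toy stand-in for the model: run one command, then answer.
def toy_model(transcript):
    if not transcript:
        return ("run", "echo hello")
    return ("final", f"command output was: {transcript[-1][1].strip()}")
```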

Does this mean we are out of luck with the Gemma 4 27B A4B model ever being improved to the level of the Qwen 3.5 models?

Also, could you please make Qwopus 3.5 35b a3b v3... pretty please?

Yes — at least in my view, SFT has its limitations here.

Gemma 4 27B A4B only activates around 4B parameters per forward pass, so it is naturally difficult to compare it directly with a fully dense 27B model. On top of that, the base capability of Gemma 4 is, in my opinion, still behind the Qwen 3.5 series in several areas.
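
The active-versus-total parameter gap can be made concrete with a back-of-the-envelope MoE routing sketch (the expert counts and sizes below are made-up illustrative numbers, not Gemma 4's actual configuration):

```python
# Illustrative MoE arithmetic: a model can hold many experts (total params)
# while routing each token through only a few of them (active params).

def active_fraction(total_experts: int, experts_per_token: int,
                    shared_params: float, expert_params: float) -> float:
    """Fraction of parameters actually used per forward pass (params in billions)."""
    total = shared_params + total_experts * expert_params
    active = shared_params + experts_per_token * expert_params
    return active / total

# e.g. 64 experts, 4 routed per token, 2B shared + 0.39B per expert
# gives roughly 3.6B active out of ~27B total -- about 13%.
frac = active_fraction(64, 4, 2.0, 0.39)
```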

Another issue is that fine-tuning Gemma models does not feel as stable as fine-tuning Qwen models. In practice, Gemma tends to show sharper loss fluctuations and less stable gradients in some training runs. This may be related to Google’s architecture design, although I cannot say that with complete certainty yet.
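
A common guard against loss spikes like these is gradient-norm clipping combined with logging the steps where loss jumps; a framework-free sketch (the threshold values are arbitrary, and this is a general mitigation, not something specific to Gemma):

```python
def clip_by_norm(grads, max_norm: float):
    """Scale a list of gradient values so their L2 norm is at most max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def find_loss_spikes(losses, ratio: float = 1.5):
    """Return step indices where loss jumps by more than `ratio` x the previous step."""
    return [i for i in range(1, len(losses))
            if losses[i] > ratio * losses[i - 1]]
```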

As for Qwopus 3.5 35B A3B, I do plan to make it in the future. The main limitation right now is that I still do not have enough high-quality teacher data for that project.

For now, I’m also waiting to see whether DeepSeek V4 brings any pleasant surprises.


Thanks so much for taking the time to explain everything in detail. This "Gemma 4 27B A4B only activates around 4B parameters per forward pass" makes a lot of sense now as to why it is producing gibberish. Hopefully this limitation can be tackled in the near future too.

It's a CUDA bug.

Yeah, Q4 is broken for me in Roo Code:

# narrative_engine/utils.py (lines 523-524)

<<<<<<< SEARCH
            if i != active_idx:
                content = re.sub(r"### 🛤️ Narrative Paths.*?(?=<think>|$)", "", content, flags=re.DOTALL).strip()
                content = re.sub(r"<options>.*?</options>", "", content, flags=re.DOTALL).strip()
=======
            if i != active_idx:
                # Remove director output (both all-caps and mixed case headers) from previous turns
                for pattern in [
                    r"### 🛤️ NARRATIVE PATHS.*?(?=<think>|$)",  # All caps version found in duplicate issue
                    r"### 🛤️ Narrative Paths.*?(?=<think>|---|\n\s*$)" # Standard mixed case with optional separator handling
                ]:
                    content = re.sub(pattern, "", content, flags=re.DOTALL).strip()

                # Remove options tags from previous turns
                if "<options>" in content:
                    content = re.sub(r"<options>.*?</options>", "", content, flags=re.DOTALL).strip()
>>>>>>> REPLACE

It sometimes works on the second try with a temperature set, but it ignores my responses and doesn't think or say anything; it just goes ahead and fixes the code incorrectly.
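
For anyone checking what the substitutions in that suggested REPLACE block actually do, here is a small sketch of the two regexes applied to invented sample text (the sample content is made up; only the patterns come from the diff):

```python
import re

HEADER = "### 🛤️ Narrative Paths"  # same literal as in the diff above
sample = f"story text\n<options>A or B</options>\n{HEADER}\ndirector output"

# First sub: strip from the header up to <think> or end-of-string.
# re.DOTALL makes '.' span newlines, so the whole director block goes.
cleaned = re.sub(re.escape(HEADER) + r".*?(?=<think>|$)", "", sample,
                 flags=re.DOTALL).strip()
# Second sub: strip the <options> tags from what remains.
cleaned = re.sub(r"<options>.*?</options>", "", cleaned, flags=re.DOTALL).strip()
# cleaned is now just "story text"
```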
