test v12 + qwen 27B + llamacpp + pi coding agent

#12
by tooltd - opened

I don't know why, but I jumped from your v8 to v12 :)) v8 seems to work better.
v12 frequently crashes when using tools. Below is the log I copied from the terminal:

Test Live Logs via Playwright:
mcp

Ran Playwright code

await page.goto('http://localhost:18090/web/dashboard.html');                      
…                                                                                  
(Ctrl+O to expand)                                                                         
Let me wait for data to load, then navigate to Live Logs.                                                                                    
mcp                                                                                
Invalid args JSON: Expected ',' or '}' after property value in JSON at position    
419 (line 1 column 420)                                                            
JSON issue with newlines in the function string. Let me use a different approach — 
write the JS to a file and source it.                                                                                                                   
mcp                                                                                
Invalid args JSON: Expected ',' or '}' after property value in JSON at position    
294 (line 1 column 295)                                                            
The issue is with multi-line function strings in JSON. Let me use single-line      
format.                                                                                                                                            
mcp                                                                                
### Result                                                                         
"done"                                                                             
### Ran Playwright code                                                            
…                                                                                  
(Ctrl+O to expand)                                                                                                                     
Good, the single-line format works. Let me wait and then check the deployment      
dropdown.                                                                                                                                           
mcp                                                                                
Invalid args JSON: Expected ',' or '}' after property value in JSON at position    
139 (line 1 column 140)                                                            
The issue is with the = sign or the string format. Let me try escaping             
differently.                                                                                                                                             
mcp                                                                                
### Result                                                                         
[                                                                                  
  "|Select deployment...",                                                         
…                                                                                  
(Ctrl+O to expand)                                                                 
                                                                                   

I've seen this happen more often on pi as well. It gets caught in reasoning loops, or the model gets tripped up by a tool call: for example, the agent passes JSON input to the bash tool with a crucial field missing, or it fails to use the edit/patch tool because it didn't provide oldString or newString correctly.
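To make the "missing field" failure concrete, here is a minimal sketch of the kind of check a harness could run before executing a tool call. The `REQUIRED_FIELDS` table, the `path` and `command` field names, and the `validateToolCall` helper are all hypothetical and are not taken from pi; only `oldString` and `newString` come from the case described above.

```javascript
// Hypothetical required-field table; a real harness defines its own tool schemas.
const REQUIRED_FIELDS = new Map([
  ["bash", ["command"]],
  ["edit", ["path", "oldString", "newString"]],
]);

// Parse the raw argument string and report which required fields are missing,
// so the error sent back to the model says exactly what to fix.
function validateToolCall(tool, rawArgs) {
  const required = REQUIRED_FIELDS.get(tool);
  if (!required) {
    return { ok: false, error: `Unknown tool: ${tool}` };
  }

  let args;
  try {
    args = JSON.parse(rawArgs);
  } catch (err) {
    return { ok: false, error: `Invalid args JSON: ${err.message}` };
  }

  if (args === null || typeof args !== "object" || Array.isArray(args)) {
    return { ok: false, error: "Arguments must be a JSON object" };
  }

  const missing = required.filter((field) => typeof args[field] !== "string");
  if (missing.length > 0) {
    return { ok: false, error: `Missing or non-string fields: ${missing.join(", ")}` };
  }
  return { ok: true, args };
}

const incompleteEdit = validateToolCall("edit", '{"path": "a.txt", "oldString": "x"}');
const completeBash = validateToolCall("bash", '{"command": "ls"}');
console.log(incompleteEdit.error);
console.log(completeBash.ok);
```

A specific error message like this gives the model a better chance of correcting the call on the next attempt instead of retrying the same arguments.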

Pi seems to be weaker at self-healing. But in your case, since malformed JSON seems to be the issue, I'd suspect your model quant or runtime more than the harness at the moment. Make no mistake though - this is guesswork. Using the 35B-A3B model instead of the 27B could yield worse results. Using a fine-tuned model carries risks as well - I tend towards Claude distills instead of uncensored agents, though both give usable results and the Claude distills tend to think a bit more.
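For reference, here is a minimal sketch of how a function string can break the argument JSON. It assumes the arguments were assembled as raw text rather than serialized; the `function` property name, the `#deployment` selector, and the `tryParse` helper are made up for illustration. In V8-based runtimes, an "Expected ',' or '}' after property value" message usually means a string value ended early, for example at an unescaped double quote, so that is the case shown here.

```javascript
// Hypothetical function string containing double quotes, similar in shape
// to what an agent might pass to a Playwright tool.
const fn = 'async (page) => { return await page.locator("#deployment").innerText(); }';

// Built by raw concatenation: the inner quotes are not escaped, so the JSON
// string value ends right after `locator(` and parsing fails.
const broken = '{"function": "' + fn + '"}';

// JSON.stringify escapes quotes, newlines, and backslashes in the value.
const valid = JSON.stringify({ function: fn });

// Wrap JSON.parse so both cases can be compared without throwing.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const brokenResult = tryParse(broken);
const validResult = tryParse(valid);
console.log(brokenResult.ok); // false
console.log(validResult.ok);  // true
```

If the same function string round-trips once it has been passed through `JSON.stringify`, the string itself is fine and the escaping in the generated output is where to look.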

Please try with v15. I think I have finally managed to fix the overthinking/indecision and the repeated loops on tool errors, among other things. So far, it is holding up in my testing. I am hoping this will be the final version.

froggeric changed discussion status to closed
