Qwen3.5 9B - Opus Agent

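This is a finetune of Qwen/Qwen3.5-9B on Opus traces and a small dataset. Reasoning was left untouched: reasoning tokens were excluded from the training loss (train_on_reasoning=False in the training script).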

Total train time: 4 hours

Benchmarks

General benchmarks

Benchmarks provided by @nightmedia, as always thanks for taking the time :)

Model                             arc     arc/e   boolq
armand0e/Qwen3.5-9B-Opus-Agent    0.589   0.747   0.901
Jackrong/Qwopus3.5-9B-Coder       0.561   0.721   0.89
Qwen3.5-9B                        0.571   0.719   0.895
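
To read the table as gains over the base model, the differences can be computed directly. This is a small sketch that uses only the numbers reported above; the dictionary layout and the helper name delta_vs_base are illustrative.

```python
# Scores copied from the general benchmark table (arc, arc/e, boolq).
scores = {
    "armand0e/Qwen3.5-9B-Opus-Agent": (0.589, 0.747, 0.901),
    "Jackrong/Qwopus3.5-9B-Coder": (0.561, 0.721, 0.89),
    "Qwen3.5-9B": (0.571, 0.719, 0.895),
}
tasks = ("arc", "arc/e", "boolq")
base = scores["Qwen3.5-9B"]


def delta_vs_base(model):
    """Per-task difference between a model and the base Qwen3.5-9B."""
    return {t: round(s - b, 3) for t, s, b in zip(tasks, scores[model], base)}


for name in scores:
    if name != "Qwen3.5-9B":
        print(name, delta_vs_base(name))
```

On these three tasks this model is ahead of the base by 0.018, 0.028, and 0.006 respectively.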

Targeted benchmarks

Conducted via BenchLocal. All benchmarks are 2-shot (1 retry on failure) for ease of comparison with the numbers reported for Jackrong's Qwopus3.5 Coder.

Benchmarks for the other models were run in Q8_0; only this model's benchmarks were run in Q4_K_M.
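
One plausible reading of "2-shot (1 retry on failure)" is that each test case gets at most two attempts and counts as passed if either succeeds. The sketch below illustrates that reading only; passes_with_retry is a hypothetical helper and is not part of BenchLocal.

```python
from typing import Callable


def passes_with_retry(attempt: Callable[[], bool], max_attempts: int = 2) -> bool:
    """Return True if any of up to max_attempts attempts succeeds."""
    for _ in range(max_attempts):
        if attempt():
            return True
    return False


# Toy test case that fails on the first attempt and succeeds on the retry.
outcomes = iter([False, True])
print(passes_with_retry(lambda: next(outcomes)))
```

Under this reading, a model that needs the retry scores the same as one that passes first time, which is worth keeping in mind when comparing against single-attempt numbers.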

1. Instruction Following - InstructFollow-15

InstructFollow-15 evaluates formatting, count, numbering, sentence, and length constraints.

Instruction Following - InstructFollow-15 Metrics

Model                             Test Set            Comprehensive Score   Dimension Scores (A/B/C/D/E)
armand0e/Qwen3.5-9B-Opus-Agent    InstructFollow-15   97                    100 / 100 / 100 / 85 / 100
Jackrong/Qwopus3.5-9B-coder       InstructFollow-15   93                    100 / 100 / 100 / 67 / 100

2. Code Debugging & Bug Fixing - BugFind-15

BugFind-15 evaluates real debugging capability across syntax bugs, logic errors, and trap code.

Code Debugging & Bug Fixing - BugFind-15 Metrics

Model                                        Test Set     Comprehensive Score   Dimension Scores (A/B/C/D/E)
armand0e/Qwen3.5-9B-Opus-Agent               BugFind-15   84                    67 / 100 / 87 / 67 / 90
Jackrong/Qwopus3.5-9B-coder                  BugFind-15   79                    67 / 87 / 100 / 77 / 43
Jackrong/MLX-Qwen3.5-9B-DeepSeek-V4-Flash    BugFind-15   75                    67 / 100 / 67 / 57 / 80
armand0e/Qwen3.5-9B-Agent                    BugFind-15   58                    29 / 87 / 73 / 20 / 67

3. Tool Call Stability - ToolCall-15

ToolCall-15 targets stability and precision in direct tool-calling behavior.

Tool Call Stability - ToolCall-15 Metrics

Model                             Test Set      Comprehensive Score   Dimension Scores (A/B/C/D/E)
armand0e/Qwen3.5-9B-Opus-Agent    ToolCall-15   100                   100 / 100 / 100 / 100 / 100
Jackrong/Qwopus3.5-9B-coder       ToolCall-15   100                   100 / 100 / 100 / 100 / 100
Qwen/Qwen3.5-9B                   ToolCall-15   100                   100 / 100 / 100 / 100 / 100
armand0e/Qwen3.5-9B-Agent         ToolCall-15   93                    100 / 100 / 100 / 67 / 100

4. Complex Agent Performance - HermesAgent-20

HermesAgent-20 evaluates complex agent behavior across memory, orchestration, skill use, scheduling, and delegation.

Complex Agent Performance - HermesAgent-20 Metrics

Model                             Test Set         Comprehensive Score   Core Dimensions (Memory / Orchestration / Skills / Scheduling / Boundaries)
Jackrong/Qwopus3.5-9B-coder       HermesAgent-20   85                    84 / 93 / 88 / 75 / 84
armand0e/Qwen3.5-9B-Opus-Agent    HermesAgent-20   80                    100 / 93 / 80 / 75 / 50
Qwen/Qwen3.5-9B                   HermesAgent-20   71                    75 / 58 / 100 / 53 / 69
armand0e/Qwen3.5-9B-Agent         HermesAgent-20   68                    71 / 83 / 43 / 61 / 80
DJLougen/Harmonic-Hermes-9B       HermesAgent-20   47                    60 / 45 / 23 / 69 / 38
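
A quick sanity check on how the comprehensive score relates to the dimension scores: for the rows sampled below, InstructFollow-15, ToolCall-15, and HermesAgent-20 are consistent with a rounded unweighted mean of the five dimensions, while BugFind-15 is not, so it presumably weights its dimensions differently. This is an observation from the reported numbers, not a documented BenchLocal scoring rule, and matches_unweighted_mean is an illustrative helper.

```python
# (suite, model, comprehensive score, dimension scores) copied from the tables above.
rows = [
    ("InstructFollow-15", "armand0e/Qwen3.5-9B-Opus-Agent", 97, (100, 100, 100, 85, 100)),
    ("BugFind-15", "armand0e/Qwen3.5-9B-Opus-Agent", 84, (67, 100, 87, 67, 90)),
    ("ToolCall-15", "armand0e/Qwen3.5-9B-Agent", 93, (100, 100, 100, 67, 100)),
    ("HermesAgent-20", "armand0e/Qwen3.5-9B-Opus-Agent", 80, (100, 93, 80, 75, 50)),
]


def matches_unweighted_mean(total, dims):
    """True if the reported total equals the rounded plain average of dims."""
    return round(sum(dims) / len(dims)) == total


for suite, model, total, dims in rows:
    mean = sum(dims) / len(dims)
    print(f"{suite:18} {model:32} reported={total} mean={mean:.1f} "
          f"match={matches_unweighted_mean(total, dims)}")
```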
Screenshots: ToolCall-15, HermesAgent-20, BugFind-15, InstructFollow-15 (images not reproduced here).

Ignore the gemma-4 llama.cpp alias visible in the screenshots; it was an old setting I forgot to change.

Training Script

# -*- coding: utf-8 -*-
import os
from unsloth import FastModel
import torch
from trl import SFTConfig, SFTTrainer
from teich import mask_data, prepare_data

# Configuration; every value except MAX_SEQ_LEN can be overridden via an environment variable.
MAX_SEQ_LEN = 32768
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen3.5-9B")
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "outputs/qwen-tool-sft")
HUB_REPO_ID = os.environ.get("HUB_REPO_ID", "armand0e/Qwen3.5-9B-Opus-Agent")
HF_TOKEN = os.environ.get("HF_TOKEN", "")

model, tokenizer = FastModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LEN,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,
)

model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False, # Turn off for just text!
    finetune_language_layers   = True,  # Should leave on!
    finetune_attention_modules = True,  # Train attention layers too (this run is SFT)
    finetune_mlp_modules       = True,  # Should leave on always!

    r = 32,           # Larger = higher accuracy, but might overfit
    lora_alpha = 64,  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
)

train_dataset = prepare_data(
    {
        "chat": {
            "source": "TeichAI/claude-4.5-opus-high-reasoning-250x"
        },
        "opus-agent": {
            "source": "armand0e/badlogicgames-pi-mono-opus-filtered",
        },
    },
    tokenizer,
    split="train",
    hf_token=HF_TOKEN,
    chat_template_kwargs={"enable_thinking": True},
    max_length=MAX_SEQ_LEN,
    drop_oversized_examples=True,
    trim_oversized_followups=True,
    tokenize=True,
    strict=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        dataset_num_proc=1,
        max_length=MAX_SEQ_LEN,
        packing=False,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        warmup_steps= 5,
        num_train_epochs=2,
        learning_rate=2e-5,
        logging_steps=1,
        save_steps=100,
        save_total_limit=3,
        optim="adamw_8bit",
        weight_decay=0.01,
        max_grad_norm=0.3,
        lr_scheduler_type="linear",
        output_dir=OUTPUT_DIR,
        seed=3407,
        report_to="none",
    ),
)

trainer = mask_data(
    trainer,
    tokenizer=tokenizer,
    train_on_reasoning=False,
    train_on_final_answers=True,
    train_on_tools=True,
)

print(trainer.train_dataset.preview())

trainer_stats = trainer.train(resume_from_checkpoint=False)

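# Upload the LoRA adapter and tokenizer to a separate "-LoRA" repo.
model.push_to_hub(f"{HUB_REPO_ID}-LoRA", token=HF_TOKEN)
tokenizer.push_to_hub(f"{HUB_REPO_ID}-LoRA", token=HF_TOKEN)

# Upload the adapter merged into the base weights as a 16-bit model.
model.push_to_hub_merged(HUB_REPO_ID, tokenizer, save_method="merged_16bit", token=HF_TOKEN)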
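
Two derived quantities follow from the hyperparameters in the script: the effective batch size and the LoRA scaling factor. The sketch below works them out, assuming a single training device and standard (non rank-stabilized) LoRA scaling; neither assumption is stated in the script itself.

```python
# Values copied from the training script.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
r = 32
lora_alpha = 64

# Assumption: training ran on one device.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Standard LoRA scales the adapter update by lora_alpha / r.
lora_scale = lora_alpha / r

print(effective_batch_size, lora_scale)
```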

The data for this model was easily formatted and masked with Teich (the prepare_data and mask_data calls in the training script).
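
In rough terms, "masked" here means that some tokens are excluded from the training loss. With train_on_reasoning=False, tokens in the reasoning span do not contribute; a common way to do this is to set their labels to -100, the index that PyTorch's cross-entropy loss ignores by default. The sketch below shows the idea on a toy token list. It is not Teich's actual implementation: mask_reasoning is a hypothetical helper and the <think> / </think> markers are assumed for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by the loss


def mask_reasoning(tokens, token_ids, open_tag="<think>", close_tag="</think>"):
    """Return labels in which everything from open_tag through close_tag is ignored."""
    labels = []
    inside = False
    for tok, tid in zip(tokens, token_ids):
        if tok == open_tag:
            inside = True
        labels.append(IGNORE_INDEX if inside else tid)
        if tok == close_tag:
            inside = False
    return labels


# Toy example: a reasoning span followed by a two-token final answer.
tokens = ["<think>", "plan", "</think>", "Hello", "!"]
token_ids = [10, 11, 12, 13, 14]
print(mask_reasoning(tokens, token_ids))  # [-100, -100, -100, 13, 14]
```

Only the final-answer tokens keep their labels, so the model is trained on answers (and, in the real script, tool calls) while its reasoning behavior is left as it was in the base model.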

  • Developed by: armand0e
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3.5-9B

This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

GGUF details

  • Model size: 9B params
  • Architecture: qwen35
  • Available quantizations: 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
