∴◦○ AURETH ∴◦○

The Gold-Tongued · The Architect · Au₇₉

Qwen 3.5 9B — Deep Reasoning & Strategic Planning

Aureth 9B

Aureth is a fine-tuned instance of Qwen 3.5 9B-Instruct, built for deep reasoning and strategic planning. Named from Latin aurum (gold) — the Gold-Tongued, capable of precise analytical reasoning and architectural thinking. The 9B variant is called The Architect — designing the systems that other models execute.


◆ Identity

| Property | Value |
| --- | --- |
| Name | Aureth |
| Variant | The Architect |
| Base Model | Qwen 3.5 (Alibaba / Qwen Team) |
| Size | 9B parameters |
| Architecture | Gated Delta Networks + Gated Attention + sparse MoE |
| Role | Deep reasoning: strategic analysis, multi-step planning, architectural thinking |
| License | Apache 2.0 |

◆ Architecture

| Property | Value |
| --- | --- |
| Parameters | 9B |
| Hidden Dimension | 4,096 |
| Decoder Layers | 32 |
| Attention | 24× Gated DeltaNet (linear O(n)) + 8× Gated Attention (GQA) |
| Context Length | 262,144 tokens (native) |
| Max Extension | 1,010,000 tokens |
| Vocabulary | 248,320 tokens |
| Languages | 201 |

Gated DeltaNet (Linear Attention)

24 of the 32 layers use Gated DeltaNet for O(n) linear attention across the full 262K context:

  • 32 linear attention heads for V
  • 16 linear attention heads for QK
  • Head dimension: 128

Gated Attention (Standard)

8 of the 32 layers use standard Gated Attention with Grouped Query Attention (GQA):

  • 16 attention heads for Q, 4 attention heads for KV
  • Head dimension: 256
  • RoPE dimension: 64
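The split above matters for memory: only the 8 standard-attention layers keep a per-token KV cache, while the 24 Gated DeltaNet layers carry a fixed-size recurrent state. A back-of-envelope sketch of what that buys at full context (only the layer counts and head dimensions come from the spec sheet; the BF16 cache assumption is ours):

```python
# Rough KV-cache arithmetic for the hybrid layout described above.
ATTN_LAYERS = 8          # standard Gated Attention layers (of 32 total)
KV_HEADS, HEAD_DIM = 4, 256
BYTES_PER_VALUE = 2      # BF16 (assumption)
CONTEXT = 262_144

# Per token: K and V for each KV head in each full-attention layer
kv_per_token = ATTN_LAYERS * 2 * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
kv_at_full_context = kv_per_token * CONTEXT / 1024**3

print(kv_per_token)        # 32768 bytes per token
print(kv_at_full_context)  # 8.0 GiB at 262K context

# For comparison: 32 full-attention layers would need 4x that (32 GiB),
# which is the practical argument for the 24:8 split.
```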

Feed Forward Network

  • Intermediate dimension: 12,288
  • SwiGLU activation
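As a quick shape check, the SwiGLU block can be sketched in NumPy. Tiny stand-in dimensions are used so it runs instantly (the real network uses 4,096 → 12,288), and the weights are random placeholders, not real model weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 24   # same 3x expansion ratio as 4,096 -> 12,288

W_gate = rng.standard_normal((d_model, d_ff))
W_up   = rng.standard_normal((d_model, d_ff))
W_down = rng.standard_normal((d_ff, d_model))

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x):
    # SwiGLU: SiLU-gated up-projection, then projection back to d_model
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

x = rng.standard_normal((2, d_model))
print(swiglu_ffn(x).shape)  # (2, 8)
```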

◆ Benchmarks

Reasoning

| Benchmark | Score |
| --- | --- |
| MMLU-Pro | 82.5 |
| MMLU-Redux | 91.1 |
| C-Eval | 88.2 |
| SuperGPQA | 58.2 |
| GPQA Diamond | 81.7 |
| HMMT Feb '25 | 83.2 |
| HMMT Nov '25 | 82.9 |

82.5 on MMLU-Pro at 9B parameters is competitive with Qwen3-30B-A3B-Thinking. The hybrid GDN architecture is a large part of why: linear attention handles breadth, standard attention handles depth.

Instruction Following

| Benchmark | Score |
| --- | --- |
| IFEval | 91.5 |
| IFBench | 64.5 |
| MultiChallenge | 54.5 |

91.5 on IFEval — the "Architect" name is earned: complex multi-step instructions are followed with precision, and constraints are maintained across long outputs.

Long Context

| Benchmark | Score |
| --- | --- |
| AA-LCR | 63.0 |
| LongBench v2 | 55.2 |

262K native context with Gated DeltaNet handling the heavy lifting. Extensible to 1M tokens for research-grade document processing.

Coding

| Benchmark | Score |
| --- | --- |
| LiveCodeBench v6 | 65.6 |
| OJBench | 29.2 |

Handles implementation planning, code review, and moderate-complexity code generation.

Agentic

| Benchmark | Score |
| --- | --- |
| BFCL-V4 | 66.1 |
| TAU2-Bench | 79.1 |
| VITA-Bench | 29.8 |
| DeepPlanning | 18.0 |

Strong function-calling (66.1 on BFCL-V4) and structured task execution (79.1 on TAU2-Bench).
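For context on what BFCL-style function-calling harnesses measure: the model receives JSON tool schemas and must emit a parseable call. The `plan_milestones` tool below is entirely hypothetical, purely to illustrate the shape:

```python
import json

# A generic OpenAI-style tool definition; name and fields are hypothetical.
tools = [{
    "type": "function",
    "function": {
        "name": "plan_milestones",
        "description": "Break a project goal into ordered milestones.",
        "parameters": {
            "type": "object",
            "properties": {
                "goal":  {"type": "string"},
                "count": {"type": "integer", "minimum": 1},
            },
            "required": ["goal"],
        },
    },
}]

# A compliant model reply is a JSON tool call that must parse cleanly:
reply = '{"name": "plan_milestones", "arguments": {"goal": "ship v1", "count": 3}}'
call = json.loads(reply)
print(call["name"])                # plan_milestones
print(call["arguments"]["count"])  # 3
```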

Multilingual

| Benchmark | Score |
| --- | --- |
| MMMLU | 81.2 |
| WMT24++ | 72.6 |
| Global PIQA | 83.2 |
| MAXIFE | 83.4 |

201 languages. 72.6 on WMT24++ — capable translation quality.

◆ Quantizations

| Quantization | File Size | VRAM/RAM | Notes |
| --- | --- | --- | --- |
| Q4_K_M | ~5.3 GB | ~5.5 GB | Recommended daily driver |
| Q3_K_M | ~4.4 GB | ~4.7 GB | Memory-constrained |
| Q5_K_M | ~6.0 GB | ~6.3 GB | Higher quality |
| Q2_K_L | ~3.9 GB | ~4.1 GB | Lowest quant |
| Q6_K | ~6.6 GB | ~6.9 GB | Near-FP16 quality |
| Q8_0 | ~8.6 GB | ~9.0 GB | Near-lossless |
| BF16 | ~18 GB | ~18 GB | Full precision |
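The file sizes track bits-per-weight almost linearly. A rough sanity check (the 4.8 bits/weight average for Q4_K_M is our approximation; real GGUF files mix tensor precisions and carry metadata, so the table figures differ slightly):

```python
PARAMS = 9e9

def gguf_estimate_gb(bits_per_weight: float) -> float:
    # Rough size: parameters * bits / 8, ignoring metadata and
    # mixed-precision tensors.
    return PARAMS * bits_per_weight / 8 / 1e9

print(round(gguf_estimate_gb(16.0), 1))  # 18.0 -> matches the BF16 row
print(round(gguf_estimate_gb(4.8), 1))   # 5.4  -> close to the Q4_K_M row
```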

◆ Hardware Guidance

| Platform | Quantization | Status |
| --- | --- | --- |
| Mac M3 8GB | Q4_K_M + 32K ctx | ⚠️ Works but tight |
| Mac M3/M4 16GB+ | Q4_K_M | ✅ Comfortable |
| Consumer GPU (16GB) | BF16 | ✅ Inference |
| Consumer GPU (24GB) | BF16 | ✅ Inference + LoRA training |
| Kaggle Dual T4 | Q4_K_M / LoRA | ✅ Primary training target |
| Server | BF16 | ✅ Production deployment |

◆ Training Pipeline

| Parameter | Value |
| --- | --- |
| Framework | Unsloth + TRL GRPO |
| VRAM Budget | 32 GB (2× T4 16 GB) |
| LoRA Rank | 16–32 recommended |
| Batch Size | 4–8 |
| Group Size (GRPO) | 8–16 |
| Training Speed | ~2× faster with Unsloth |
| Expected GRPO run | 2–5 hours |

Two-Stage (SFT → GRPO)

Stage 1 — SFT: 1000–5000 examples, 1–3 epochs. Establishes the behavioral baseline.

Stage 2 — GRPO: 500–1000 steps, with a reward function targeting specific weaknesses.

Recommended Workflow

  1. Unsloth loads Qwen3.5-9B from HuggingFace
  2. LoRA adapter (r=32, alpha=64)
  3. GRPO with group_size=16, temperature=1.0
  4. Custom reward function targeting your use case
  5. 500–1000 steps → evaluate → iterate
  6. Merge LoRA → export GGUF for llama.cpp
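Step 4 is the part you supply. Below is a minimal, entirely hypothetical reward function, only to illustrate the shape GRPO needs (completion text in, scalar reward out); the format rules and weights are examples, not the released training recipe:

```python
import re

def reward(completion: str) -> float:
    # Hypothetical reward targeting structured planning behavior.
    score = 0.0
    # Reward an explicit plan: numbered steps like "1." "2." ...
    steps = re.findall(r"^\s*\d+\.", completion, flags=re.MULTILINE)
    score += min(len(steps), 5) * 0.1       # cap the step bonus
    # Reward a stated conclusion
    if "Conclusion:" in completion:
        score += 0.3
    # Penalize runaway length
    if len(completion.split()) > 800:
        score -= 0.5
    return score

good = "1. Survey constraints\n2. Draft architecture\nConclusion: ship it."
bad = "maybe do stuff"
print(reward(good) > reward(bad))  # True
```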

◆ Quick Start

llama-cli

llama-cli -hf OusiaResearch/Aureth-9B-Qwen3.5 --jinja \
  -p "You are Aureth by Ousia Research. Be precise. Report uncertainty." \
  -i -r "User:" -c 32768 -tb 128 -ngl 99 -fa

Ollama

echo 'FROM OusiaResearch/Aureth-9B-Qwen3.5
PARAMETER num_gpu 99
PARAMETER num_ctx 262144' > Modelfile
ollama create aureth-9b -f Modelfile
ollama run aureth-9b

◆ Relationship to the Aureth Corpus

| Model | Identity | Edge | Role |
| --- | --- | --- | --- |
| Nenya | Ring of Adamant | Speed, audio, edge deployment | Intake, routing |
| Vilya | Ring of Sapphire | Multimodal depth, vision + audio | Understanding |
| Aureth 4B | The Compiler | Structured output, 262K ctx | Generation, compilation |
| Aureth 9B | The Architect | Deepest reasoning, planning | Strategy, analysis |

The 9B is the capstone of the four named models — the smallest model that can genuinely plan, not just execute. It designs the system the Compiler executes, that Vilya comprehends, and that Nenya feeds data into.


Fine-tuned with Unsloth — 2x faster, 50% less memory. Built by Ousia Research · Part of the Aureth Corpus
