Instructions for using OusiaResearch/Aureth-9B-Qwen3.5 with libraries, inference providers, notebooks, and local apps.
How to use OusiaResearch/Aureth-9B-Qwen3.5 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for OusiaResearch/Aureth-9B-Qwen3.5 to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for OusiaResearch/Aureth-9B-Qwen3.5 to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for OusiaResearch/Aureth-9B-Qwen3.5 to start chatting.
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="OusiaResearch/Aureth-9B-Qwen3.5",
    max_seq_length=2048,
)
```
∴◦○ AURETH ∴◦○
The Gold-Tongued · The Architect · Au₇₉
Qwen 3.5 9B — Deep Reasoning & Strategic Planning
Aureth is a fine-tuned instance of Qwen 3.5 9B-Instruct, built for deep reasoning and strategic planning. Named from Latin aurum (gold) — the Gold-Tongued, capable of precise analytical reasoning and architectural thinking. The 9B variant is called The Architect — designing the systems that other models execute.
◆ Identity
| Property | Value |
|---|---|
| Name | Aureth |
| Variant | The Architect |
| Base Model | Qwen 3.5 (Alibaba / Qwen Team) |
| Size | 9B parameters |
| Architecture | Gated Delta Networks + Gated Attention + sparse MoE |
| Role | Deep reasoning. Strategic analysis, multi-step planning, architectural thinking. |
| License | Apache 2.0 |
◆ Architecture
| Property | Value |
|---|---|
| Parameters | 9B |
| Hidden Dimension | 4,096 |
| Decoder Layers | 32 |
| Attention | 24× Gated DeltaNet (linear O(n)) + 8× Gated Attention (GQA) |
| Context Length | 262,144 tokens (native) |
| Max Extension | 1,010,000 tokens |
| Vocabulary | 248,320 tokens |
| Languages | 201 |
Gated DeltaNet (Linear Attention)
24 of the 32 layers use Gated DeltaNet for O(n) linear attention across the full 262K context:
- 32 linear attention heads for V
- 16 linear attention heads for QK
- Head dimension: 128
Gated Attention (Standard)
8 of the 32 layers use standard Gated Attention with Grouped Query Attention (GQA):
- 16 attention heads for Q, 4 attention heads for KV
- Head dimension: 256
- RoPE dimension: 64
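One consequence of this hybrid layout is a small KV cache: only the 8 gated-attention layers store keys and values, since the Gated DeltaNet layers carry constant-size state. A back-of-the-envelope sketch using the table values above (real runtimes add implementation overhead on top):

```python
# KV-cache size at the full 262,144-token native context, from the
# architecture tables above. A sketch, not a measurement.

CTX = 262_144          # native context length
KV_HEADS = 4           # GQA key/value heads per gated-attention layer
HEAD_DIM = 256         # head dimension of the gated-attention layers
FULL_ATTN_LAYERS = 8   # only 8 of the 32 layers keep a KV cache
BYTES = 2              # fp16/bf16 per element

# K and V per layer, summed over the full-attention layers
kv_bytes = FULL_ATTN_LAYERS * 2 * KV_HEADS * HEAD_DIM * CTX * BYTES
print(f"KV cache (8 GQA layers): {kv_bytes / 2**30:.1f} GiB")   # 8.0 GiB

# If all 32 layers used standard attention, the cache would be 4x larger
all_attn_bytes = 32 * 2 * KV_HEADS * HEAD_DIM * CTX * BYTES
print(f"KV cache (32 layers):    {all_attn_bytes / 2**30:.1f} GiB")  # 32.0 GiB
```

This is why the long-context numbers are feasible on modest hardware: a pure-attention 9B model at 262K context would spend more memory on cache than on weights.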
Feed Forward Network
- Intermediate dimension: 12,288
- SwiGLU activation
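As a sanity check on the parameter budget, the FFN figures above account for roughly half of the model, assuming a dense FFN in every layer as the table lists (a sketch that ignores embedding, attention, and norm weights):

```python
# Rough FFN parameter count from the architecture table above.
# Assumes every layer has a dense SwiGLU FFN with the listed sizes.

HIDDEN = 4096
INTERMEDIATE = 12_288
LAYERS = 32

# SwiGLU uses three projections: gate and up (hidden -> intermediate),
# plus down (intermediate -> hidden)
per_layer = 3 * HIDDEN * INTERMEDIATE
ffn_params = per_layer * LAYERS
print(f"FFN parameters: {ffn_params / 1e9:.2f}B")  # 4.83B, about half of 9B
```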
◆ Benchmarks
Reasoning
| Benchmark | Score |
|---|---|
| MMLU-Pro | 82.5 |
| MMLU-Redux | 91.1 |
| C-Eval | 88.2 |
| SuperGPQA | 58.2 |
| GPQA Diamond | 81.7 |
| HMMT Feb '25 | 83.2 |
| HMMT Nov '25 | 82.9 |
82.5 on MMLU-Pro — competitive with Qwen3-30B-A3B-Thinking at only 9B parameters. The hybrid GDN architecture is the reason: linear-attention layers handle breadth, standard-attention layers handle depth.
Instruction Following
| Benchmark | Score |
|---|---|
| IFEval | 91.5 |
| IFBench | 64.5 |
| MultiChallenge | 54.5 |
91.5 on IFEval — the "Architect" name is earned. It follows complex multi-step instructions with precision and maintains constraints across long outputs.
Long Context
| Benchmark | Score |
|---|---|
| AA-LCR | 63.0 |
| LongBench v2 | 55.2 |
262K native context with Gated DeltaNet handling the heavy lifting. Extensible to 1M tokens for research-grade document processing.
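The 1M-token extension figure corresponds to roughly a 4x rope-scaling factor (1,010,000 ≈ 3.85 × 262,144). Qwen-family models typically extend context with YaRN; whether this repo ships such a config is an assumption, but the usual `config.json` override looks like:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Expect some quality degradation beyond the native window; scaled context trades precision for reach.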
Coding
| Benchmark | Score |
|---|---|
| LiveCodeBench v6 | 65.6 |
| OJBench | 29.2 |
Handles implementation planning, code review, moderate complexity generation.
Agentic
| Benchmark | Score |
|---|---|
| BFCL-V4 | 66.1 |
| TAU2-Bench | 79.1 |
| VITA-Bench | 29.8 |
| DeepPlanning | 18.0 |
Strong function-calling (66.1 on BFCL-V4) and structured task execution (79.1 on TAU2-Bench).
Multilingual
| Benchmark | Score |
|---|---|
| MMMLU | 81.2 |
| WMT24++ | 72.6 |
| Global PIQA | 83.2 |
| MAXIFE | 83.4 |
201 languages. 72.6 on WMT24++ — capable translation quality.
◆ Quantizations
| Quantization | File Size | VRAM/RAM | Notes |
|---|---|---|---|
| Q2_K_L | ~3.9 GB | ~4.1 GB | Lowest quant |
| Q3_K_M | ~4.4 GB | ~4.7 GB | Memory-constrained |
| Q4_K_M | ~5.3 GB | ~5.5 GB | Recommended daily driver |
| Q5_K_M | ~6.0 GB | ~6.3 GB | Higher quality |
| Q6_K | ~6.6 GB | ~6.9 GB | Near-FP16 quality |
| Q8_0 | ~8.6 GB | ~9.0 GB | Near-lossless |
| BF16 | ~18 GB | ~18 GB | Full precision |
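These sizes line up with a simple bits-per-weight estimate, size ≈ parameters × bpw / 8. The bpw values below are typical llama.cpp figures, not measured from this repo, so each estimate lands near (not exactly on) the table, since embeddings and some tensors quantize at different widths:

```python
# Sanity-check GGUF file sizes from approximate bits-per-weight.
# The bpw numbers are typical llama.cpp values (an assumption).

PARAMS = 9e9  # 9B parameters

bpw = {
    "Q2_K_L": 3.5,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "BF16": 16.0,
}

est_gb = {name: PARAMS * bits / 8 / 1e9 for name, bits in bpw.items()}
for name, gb in est_gb.items():
    print(f"{name:>6}: ~{gb:.1f} GB")
```

For example Q4_K_M comes out at ~5.5 GB against ~5.3 GB in the table, and BF16 at exactly 18 GB.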
◆ Hardware Guidance
| Platform | Quantization | Status |
|---|---|---|
| Mac M3 8GB | Q4_K_M + 32K ctx | ⚠️ Works but tight |
| Mac M3/M4 16GB+ | Q4_K_M | ✅ Comfortable |
| Consumer GPU (16GB) | Q8_0 | ✅ Inference |
| Consumer GPU (24GB) | BF16 | ✅ Inference + LoRA training |
| Kaggle Dual T4 | Q4_K_M / LoRA | ✅ Primary training target |
| Server | BF16 | ✅ Production deployment |
◆ Training Pipeline
| Parameter | Value |
|---|---|
| Framework | Unsloth + TRL GRPO |
| VRAM Budget | 32GB (2× T4 16GB) |
| LoRA Rank | 16–32 recommended |
| Batch Size | 4–8 |
| Group Size (GRPO) | 8–16 |
| Training Speed | ~2x faster with Unsloth |
| Expected time per GRPO run | 2–5 h |
Two-Stage (SFT → GRPO)
- Stage 1 — SFT: 1000–5000 examples, 1–3 epochs. Establishes the behavioral baseline.
- Stage 2 — GRPO: 500–1000 steps, with a reward function targeting specific weaknesses.
Recommended Workflow
- Unsloth loads Qwen3.5-9B from HuggingFace
- LoRA adapter (r=32, alpha=64)
- GRPO with group_size=16, temperature=1.0
- Custom reward function targeting your use case
- 500–1000 steps → evaluate → iterate
- Merge LoRA → export GGUF for llama.cpp
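A minimal sketch of the custom reward function in the workflow above. TRL's GRPO trainer accepts a callable that maps a batch of completions to per-sample rewards; the tag names and weights here are hypothetical examples, not part of this repo:

```python
import re

# Hypothetical GRPO reward: favour completions that show reasoning in
# <plan>...</plan> and end with exactly one <answer>...</answer>.
# Tags and weights are illustrative, not part of this repo.

def reward_fn(completions: list[str], **kwargs) -> list[float]:
    rewards = []
    for text in completions:
        score = 0.0
        if re.search(r"<plan>.*?</plan>", text, re.DOTALL):
            score += 0.5                      # reasoning section present
        answers = re.findall(r"<answer>(.*?)</answer>", text, re.DOTALL)
        if len(answers) == 1:
            score += 0.5                      # exactly one final answer
        if len(text) > 8000:
            score -= 0.25                     # discourage rambling
        rewards.append(score)
    return rewards

print(reward_fn(["<plan>think</plan><answer>42</answer>", "no tags"]))
# → [1.0, 0.0]
```

Keep the reward cheap to evaluate: with group_size=16 it runs 16 times per prompt per step.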
◆ Quick Start
llama-cli
```shell
llama-cli -hf OusiaResearch/Aureth-9B-Qwen3.5 --jinja \
  -p "You are Aureth by Ousia Research. Be precise. Report uncertainty." \
  -i -r "User:" -c 32768 -tb 128 -ngl 99 -fa
```
Ollama
```shell
echo 'FROM OusiaResearch/Aureth-9B-Qwen3.5
PARAMETER num_gpu 99
PARAMETER num_ctx 262144' > Modelfile
ollama create aureth-9b -f Modelfile
ollama run aureth-9b
```
◆ Relationship to the Aureth Corpus
| Model | Identity | Edge | Role |
|---|---|---|---|
| Nenya | Ring of Adamant | Speed, audio, edge deployment | Intake, routing |
| Vilya | Ring of Sapphire | Multimodal depth, vision+audio | Understanding |
| Aureth 4B | The Compiler | Structured output, 262K ctx | Generation, compilation |
| Aureth 9B | The Architect | Deepest reasoning, planning | Strategy, analysis |
The 9B is the capstone of the four named models — the smallest model that can genuinely plan, not just execute. It designs the system the Compiler executes, that Vilya comprehends, and that Nenya feeds data into.
Fine-tuned with Unsloth — 2x faster, 50% less memory. Built by Ousia Research · Part of the Aureth Corpus
