Instructions for using OusiaResearch/Aureth-9B-Qwen3.5 with libraries, inference providers, notebooks, and local apps.
How to use OusiaResearch/Aureth-9B-Qwen3.5 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for OusiaResearch/Aureth-9B-Qwen3.5 to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for OusiaResearch/Aureth-9B-Qwen3.5 to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for OusiaResearch/Aureth-9B-Qwen3.5 to start chatting.
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="OusiaResearch/Aureth-9B-Qwen3.5",
    max_seq_length=2048,
)
```
∴◦○ AURETH ∴◦○
The Gold-Tongued · The Architect · Au₇₉
Qwen 3.5 9B — Deep Reasoning & Strategic Planning
Aureth is a fine-tuned instance of Qwen 3.5 9B-Instruct, built for deep reasoning and strategic planning. Named from Latin aurum (gold) — the Gold-Tongued, capable of precise analytical reasoning and architectural thinking. The 9B variant is called The Architect — designing the systems that other models execute.
◆ Identity
| Property | Value |
|---|---|
| Name | Aureth |
| Variant | The Architect |
| Base Model | Qwen 3.5 (Alibaba / Qwen Team) |
| Size | 9B parameters |
| Architecture | Gated Delta Networks + Gated Attention + sparse MoE |
| Role | Deep reasoning. Strategic analysis, multi-step planning, architectural thinking. |
| License | Apache 2.0 |
◆ Architecture
| Property | Value |
|---|---|
| Parameters | 9B |
| Hidden Dimension | 4,096 |
| Decoder Layers | 32 |
| Attention | 24× Gated DeltaNet (linear O(n)) + 8× Gated Attention (GQA) |
| Context Length | 262,144 tokens (native) |
| Max Extension | 1,010,000 tokens |
| Vocabulary | 248,320 tokens |
| Languages | 201 |
Gated DeltaNet (Linear Attention)
24 of the 32 layers use Gated DeltaNet for O(n) linear attention across the full 262K context:
- 32 linear attention heads for V
- 16 linear attention heads for QK
- Head dimension: 128
Gated Attention (Standard)
8 of the 32 layers use standard Gated Attention with Grouped Query Attention (GQA):
- 16 attention heads for Q, 4 attention heads for KV
- Head dimension: 256
- RoPE dimension: 64
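One consequence of this hybrid layout is a small KV cache: only the 8 gated-attention layers store keys and values, since the Gated DeltaNet layers carry constant-size state. A back-of-the-envelope sketch using the table values above (real runtimes add implementation overhead on top):

```python
# KV-cache size at the full 262,144-token native context, from the
# architecture tables above. A sketch, not a measurement.

CTX = 262_144          # native context length
KV_HEADS = 4           # GQA key/value heads per gated-attention layer
HEAD_DIM = 256         # head dimension of the gated-attention layers
FULL_ATTN_LAYERS = 8   # only 8 of the 32 layers keep a KV cache
BYTES = 2              # fp16/bf16 per element

# K and V per layer, summed over the full-attention layers
kv_bytes = FULL_ATTN_LAYERS * 2 * KV_HEADS * HEAD_DIM * CTX * BYTES
print(f"KV cache (8 GQA layers): {kv_bytes / 2**30:.1f} GiB")   # 8.0 GiB

# If all 32 layers used standard attention, the cache would be 4x larger
all_attn_bytes = 32 * 2 * KV_HEADS * HEAD_DIM * CTX * BYTES
print(f"KV cache (32 layers):    {all_attn_bytes / 2**30:.1f} GiB")  # 32.0 GiB
```

This is why the long-context numbers are feasible on modest hardware: a pure-attention 9B model at 262K context would spend more memory on cache than on weights.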
Feed Forward Network
- Intermediate dimension: 12,288
- SwiGLU activation
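As a sanity check on the parameter budget, the FFN figures above account for roughly half of the model, assuming a dense FFN in every layer as the table lists (a sketch that ignores embedding, attention, and norm weights):

```python
# Rough FFN parameter count from the architecture table above.
# Assumes every layer has a dense SwiGLU FFN with the listed sizes.

HIDDEN = 4096
INTERMEDIATE = 12_288
LAYERS = 32

# SwiGLU uses three projections: gate and up (hidden -> intermediate),
# plus down (intermediate -> hidden)
per_layer = 3 * HIDDEN * INTERMEDIATE
ffn_params = per_layer * LAYERS
print(f"FFN parameters: {ffn_params / 1e9:.2f}B")  # 4.83B, about half of 9B
```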
◆ Benchmarks
Reasoning
| Benchmark | Score |
|---|---|
| MMLU-Pro | 82.5 |
| MMLU-Redux | 91.1 |
| C-Eval | 88.2 |
| SuperGPQA | 58.2 |
| GPQA Diamond | 81.7 |
| HMMT Feb '25 | 83.2 |
| HMMT Nov '25 | 82.9 |
82.5 on MMLU-Pro — competitive with Qwen3-30B-A3B-Thinking at only 9B parameters. The hybrid GDN architecture is the reason: linear-attention layers handle breadth, standard-attention layers handle depth.
Instruction Following
| Benchmark | Score |
|---|---|
| IFEval | 91.5 |
| IFBench | 64.5 |
| MultiChallenge | 54.5 |
91.5 on IFEval — the "Architect" name is earned. It follows complex multi-step instructions with precision and maintains constraints across long outputs.
Long Context
| Benchmark | Score |
|---|---|
| AA-LCR | 63.0 |
| LongBench v2 | 55.2 |
262K native context with Gated DeltaNet handling the heavy lifting. Extensible to 1M tokens for research-grade document processing.
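The 1M-token extension figure corresponds to roughly a 4x rope-scaling factor (1,010,000 ≈ 3.85 × 262,144). Qwen-family models typically extend context with YaRN; whether this repo ships such a config is an assumption, but the usual `config.json` override looks like:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

Expect some quality degradation beyond the native window; scaled context trades precision for reach.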
Coding
| Benchmark | Score |
|---|---|
| LiveCodeBench v6 | 65.6 |
| OJBench | 29.2 |
Handles implementation planning, code review, moderate complexity generation.
Agentic
| Benchmark | Score |
|---|---|
| BFCL-V4 | 66.1 |
| TAU2-Bench | 79.1 |
| VITA-Bench | 29.8 |
| DeepPlanning | 18.0 |
Strong function-calling (66.1 on BFCL-V4) and structured task execution (79.1 on TAU2-Bench).
Multilingual
| Benchmark | Score |
|---|---|
| MMMLU | 81.2 |
| WMT24++ | 72.6 |
| Global PIQA | 83.2 |
| MAXIFE | 83.4 |
201 languages. 72.6 on WMT24++ — capable translation quality.
◆ Quantizations
| Quantization | File Size | VRAM/RAM | Notes |
|---|---|---|---|
| Q2_K_L | ~3.9 GB | ~4.1 GB | Lowest quant |
| Q3_K_M | ~4.4 GB | ~4.7 GB | Memory-constrained |
| Q4_K_M | ~5.3 GB | ~5.5 GB | Recommended daily driver |
| Q5_K_M | ~6.0 GB | ~6.3 GB | Higher quality |
| Q6_K | ~6.6 GB | ~6.9 GB | Near-FP16 quality |
| Q8_0 | ~8.6 GB | ~9.0 GB | Near-lossless |
| BF16 | ~18 GB | ~18 GB | Full precision |
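These sizes line up with a simple bits-per-weight estimate, size ≈ parameters × bpw / 8. The bpw values below are typical llama.cpp figures, not measured from this repo, so each estimate lands near (not exactly on) the table, since embeddings and some tensors quantize at different widths:

```python
# Sanity-check GGUF file sizes from approximate bits-per-weight.
# The bpw numbers are typical llama.cpp values (an assumption).

PARAMS = 9e9  # 9B parameters

bpw = {
    "Q2_K_L": 3.5,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "BF16": 16.0,
}

est_gb = {name: PARAMS * bits / 8 / 1e9 for name, bits in bpw.items()}
for name, gb in est_gb.items():
    print(f"{name:>6}: ~{gb:.1f} GB")
```

For example Q4_K_M comes out at ~5.5 GB against ~5.3 GB in the table, and BF16 at exactly 18 GB.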
◆ Hardware Guidance
| Platform | Quantization | Status |
|---|---|---|
| Mac M3 8GB | Q4_K_M + 32K ctx | ⚠️ Works but tight |
| Mac M3/M4 16GB+ | Q4_K_M | ✅ Comfortable |
| Consumer GPU (16GB) | Q8_0 | ✅ Inference |
| Consumer GPU (24GB) | BF16 | ✅ Inference + LoRA training |
| Kaggle Dual T4 | Q4_K_M / LoRA | ✅ Primary training target |
| Server | BF16 | ✅ Production deployment |
◆ Training Pipeline
| Parameter | Value |
|---|---|
| Framework | Unsloth + TRL GRPO |
| VRAM Budget | 32GB (2× T4 16GB) |
| LoRA Rank | 16–32 recommended |
| Batch Size | 4–8 |
| Group Size (GRPO) | 8–16 |
| Training Speed | ~2x faster with Unsloth |
| Expected time per GRPO run | 2–5 h |
Two-Stage (SFT → GRPO)
- Stage 1 — SFT: 1000–5000 examples, 1–3 epochs. Establishes the behavioral baseline.
- Stage 2 — GRPO: 500–1000 steps, with a reward function targeting specific weaknesses.
Recommended Workflow
- Unsloth loads Qwen3.5-9B from HuggingFace
- LoRA adapter (r=32, alpha=64)
- GRPO with group_size=16, temperature=1.0
- Custom reward function targeting your use case
- 500–1000 steps → evaluate → iterate
- Merge LoRA → export GGUF for llama.cpp
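A minimal sketch of the custom reward function in the workflow above. TRL's GRPO trainer accepts a callable that maps a batch of completions to per-sample rewards; the tag names and weights here are hypothetical examples, not part of this repo:

```python
import re

# Hypothetical GRPO reward: favour completions that show reasoning in
# <plan>...</plan> and end with exactly one <answer>...</answer>.
# Tags and weights are illustrative, not part of this repo.

def reward_fn(completions: list[str], **kwargs) -> list[float]:
    rewards = []
    for text in completions:
        score = 0.0
        if re.search(r"<plan>.*?</plan>", text, re.DOTALL):
            score += 0.5                      # reasoning section present
        answers = re.findall(r"<answer>(.*?)</answer>", text, re.DOTALL)
        if len(answers) == 1:
            score += 0.5                      # exactly one final answer
        if len(text) > 8000:
            score -= 0.25                     # discourage rambling
        rewards.append(score)
    return rewards

print(reward_fn(["<plan>think</plan><answer>42</answer>", "no tags"]))
# → [1.0, 0.0]
```

Keep the reward cheap to evaluate: with group_size=16 it runs 16 times per prompt per step.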
◆ Quick Start
llama-cli
```shell
llama-cli -hf OusiaResearch/Aureth-9B-Qwen3.5 --jinja \
  -p "You are Aureth by Ousia Research. Be precise. Report uncertainty." \
  -i -r "User:" -c 32768 -tb 128 -ngl 99 -fa
```
Ollama
```shell
echo 'FROM OusiaResearch/Aureth-9B-Qwen3.5
PARAMETER num_gpu 99
PARAMETER num_ctx 262144' > Modelfile
ollama create aureth-9b -f Modelfile
ollama run aureth-9b
```
◆ Relationship to the Aureth Corpus
| Model | Identity | Edge | Role |
|---|---|---|---|
| Nenya | Ring of Adamant | Speed, audio, edge deployment | Intake, routing |
| Vilya | Ring of Sapphire | Multimodal depth, vision+audio | Understanding |
| Aureth 4B | The Compiler | Structured output, 262K ctx | Generation, compilation |
| Aureth 9B | The Architect | Deepest reasoning, planning | Strategy, analysis |
The 9B is the capstone of the four named models — the smallest model that can genuinely plan, not just execute. It designs the system the Compiler executes, that Vilya comprehends, and that Nenya feeds data into.
Fine-tuned with Unsloth — 2x faster, 50% less memory. Built by Ousia Research · Part of the Aureth Corpus
