Aureth V2 — 4B GGUF

Aureth V2 is a finetuned instance of Qwen 3.5-4B-Instruct, trained on the Aureth Agent SFT Robust curriculum and exported to GGUF format for local inference.


◆ Model Overview

| Property | Value |
|---|---|
| Base Architecture | Qwen 3.5-4B-Instruct (dense, 40 layers) |
| Finetuning | Supervised fine-tuning on Aureth-Agent-SFT-Robust (243k examples) |
| Export | GGUF via Unsloth |
| Format | Standard GGUF — compatible with llama.cpp, Ollama, and other backends |
| Chat Template | Qwen-instruct (Jinja-compatible) |
| Quantizations | Q2_K_L · Q3_K_M · Q4_K_M · Q5_K_M · Q6_K · Q8_0 · BF16 |
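For backends that do not apply the bundled Jinja chat template automatically, the Qwen-instruct template can be reproduced by hand. A minimal sketch, assuming the standard ChatML-style tags that Qwen-family instruct models use (the example user message is illustrative):

```python
# Minimal sketch of the Qwen-instruct (ChatML-style) chat template,
# for backends that don't apply the bundled Jinja template themselves.

def build_prompt(system: str, user: str) -> str:
    """Wrap a system and user message in ChatML tags and open the
    assistant turn, as Qwen-family instruct models expect."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "You are Aureth by Ousia Research. Report uncertainty honestly. Be direct.",
    "Summarize the GGUF format in one sentence.",
)
print(prompt)
```

The system string above is the short form used in the Quick Start below; the full training-time system prompt is listed under Training Details.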

◆ Quantizations

| Quantization | File Size | VRAM (est.) | Use Case |
|---|---|---|---|
| Q4_K_M | 2.71 GB | ~3.5 GB | Recommended daily driver |
| Q3_K_M | 2.26 GB | ~3.0 GB | Memory-constrained setups |
| Q5_K_M | 3.07 GB | ~4.0 GB | Higher quality when headroom allows |
| Q2_K_L | 2.07 GB | ~2.8 GB | Smallest quant — significant quality trade-off |
| Q6_K | 3.46 GB | ~4.5 GB | Near-FP16 quality, tighter fit |
| Q8_0 | 4.48 GB | ~6 GB | Near-lossless; use when memory is not a constraint |
| BF16 | 8.42 GB | ~10 GB | Full precision; Metal or high-VRAM GPU only |

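The file sizes above follow from a simple rule of thumb: size ≈ parameters × bits-per-weight ÷ 8. A sketch using approximate llama.cpp bits-per-weight figures (assumptions, not values from this card) and a parameter count of ~4.21B inferred from the 8.42 GB BF16 file:

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are approximate llama.cpp figures (assumed);
# the param count (~4.21B) is inferred from the 8.42 GB BF16 file.
PARAMS = 4.21e9
BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.85, "Q5_K_M": 5.7,
       "Q6_K": 6.6, "Q8_0": 8.5, "BF16": 16.0}

def est_size_gb(quant: str) -> float:
    """Estimated file size in GB (decimal) for a given quantization."""
    return PARAMS * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q:8s} ~{est_size_gb(q):.2f} GB")
```

Estimates land within roughly 10% of the table; the K-quants run slightly larger in practice because some tensors (e.g. embeddings) are kept at higher precision.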
◆ Hardware Guidance

Apple Silicon (M-series, Metal)

  • Recommended: Q4_K_M — stable 16–22 tok/s on an M3 with 8 GB unified memory and full GPU offload (num_gpu: 99)

NVIDIA GPU (llama.cpp cuBLAS)

  • RTX 3060 12GB: Q4_K_M or Q5_K_M recommended
  • RTX 4060 Ti 16GB: Q6_K or BF16 viable
  • T4 (Kaggle): Q3_K_M or Q4_K_M — fits comfortably in 16GB

CPU-only (llama-cli)

  • Q4_K_M: ~4–6 tok/s on modern 8-core CPUs
  • Q2/Q3: ~6–9 tok/s with decompression overhead
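The throughput figures above are consistent with decode being largely memory-bandwidth bound: generating each token reads roughly the whole weight file once, so bandwidth ÷ file size gives a crude upper bound on tok/s. A sketch with assumed bandwidth figures (not from this card):

```python
# Crude decode-throughput upper bound: memory bandwidth / weight-file size.
# Each generated token streams (roughly) all weights once. Bandwidth
# figures below are assumptions for illustration.
def max_tok_per_s(bandwidth_gb_s: float, file_gb: float) -> float:
    return bandwidth_gb_s / file_gb

q4_size = 2.71  # Q4_K_M file size from the table above
print(f"M3 (~100 GB/s):         <= {max_tok_per_s(100, q4_size):.0f} tok/s")
print(f"8-core DDR5 (~60 GB/s): <= {max_tok_per_s(60, q4_size):.0f} tok/s")
```

Observed numbers sit below these bounds: Metal comes reasonably close, while CPU-only falls well short because dequantization there is compute-bound rather than bandwidth-bound.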

◆ Quick Start

llama-cli (local GGUF)

# Q4_K_M example
llama-cli -hf OusiaResearch/AurethV2-4B-GGUF:Q4_K_M \
  -p "You are Aureth by Ousia Research. Report uncertainty honestly. Be direct." \
  -i -r "User:" -c 2048 -ngl 99 -fa

Ollama (pull & run)

# Create Modelfile
echo 'FROM OusiaResearch/AurethV2-4B-GGUF
PARAMETER num_gpu 99
PARAMETER context_length 2048' > Modelfile

ollama create aureth-v2 -f Modelfile
ollama run aureth-v2
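Once `ollama run aureth-v2` works, the same model is reachable through Ollama's local REST API (default port 11434). A minimal sketch using only the standard library; the model name matches the `ollama create` step above, and the network call assumes a running Ollama server:

```python
import json
import urllib.request

def generate_payload(prompt: str, model: str = "aureth-v2") -> dict:
    """Build a non-streaming body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

if __name__ == "__main__":
    body = json.dumps(generate_payload("Explain GGUF in one sentence.")).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Requires `ollama serve` (or the desktop app) to be running locally.
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```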

Homebrew llama.cpp (macOS)

brew install llama.cpp
llama-cli -hf OusiaResearch/AurethV2-4B-GGUF \
  -p "You are Aureth by Ousia Research." -i -r "User:"

◆ Training Details

  • Base model: Qwen/Qwen3.5-4B-Instruct
  • Finetuning framework: Unsloth (LoRA + SFT pipeline)
  • Training data: OusiaResearch/Aureth-Agent-SFT-Robust
  • Curriculum categories: core · func_call · agentic · anti_sycophancy
  • Data sources: NousResearch · teknium · lambda · DJLougen · interstellarninja · camilablank
  • System prompt: "You are Aureth by Ousia Research. Report uncertainty honestly. Disagree when wrong. Correct errors cleanly. Show reasoning in blocks when complex. Be direct."

◆ Model Card

  • Organization: Ousia Research
  • License: Apache 2.0
  • Base license: Qwen 3.5-4B-Instruct (Alibaba, Apache 2.0)
  • Version: V2 (May 2026)

◆ Related Models

| Model | Arch | Size | Notes |
|---|---|---|---|
| Aureth Compiler | Qwen 3.5 | 4B | Primary — this release |
| Aureth Architect | Qwen 2.5 | 9B | Larger variant |
| Aureth-Agent-SFT-Robust | Dataset | 243k rows | Training curriculum |

Ousia Research — autonomous reasoning, forged open.
