⚡ Aura-7b GGUF

A small model that punches above its weight, now optimized for local inference

Agentic · Tool Use · Function Calling · Reasoning


Built by Featherlabs · Operated by Owlkun


✨ Overview

This repository contains GGUF quantized versions of Featherlabs/Aura-7b, an agentic 7B language model fine-tuned by Featherlabs from Qwen2.5-7B-Instruct.

These models are optimized for efficient local execution on consumer hardware using CPU or GPU acceleration. They are fully compatible with llama.cpp, Ollama, LM Studio, Jan, and other GGUF-based runtimes.
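To grab just one quantization rather than the whole repository, one option is the `huggingface-cli download` command from the `huggingface_hub` package (the repo id below is this card's; pick any filename from the table in the next section):

```shell
# Download a single GGUF file into the current directory
# (requires: pip install -U huggingface_hub)
huggingface-cli download Featherlabs/Aura-7b-GGUF \
  aura-7b-q4_k_m.gguf \
  --local-dir .
```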


📦 Available Quantizations

Choose the file that best matches your system's VRAM/RAM capacity:

| Filename | Size | VRAM needed | Quality | Best for |
|---|---|---|---|---|
| aura-7b-f16.gguf | ~15.2 GB | ~16 GB | ⭐⭐⭐⭐⭐ | Maximum quality, high-VRAM systems |
| aura-7b-q8_0.gguf | ~8.1 GB | ~10 GB | ⭐⭐⭐⭐⭐ | Near-lossless quality |
| aura-7b-q6_k.gguf | ~6.25 GB | ~8 GB | ⭐⭐⭐⭐ | Excellent quality, sweet spot for 8 GB GPUs |
| aura-7b-q4_k_m.gguf | ~4.68 GB | ~6 GB | ⭐⭐⭐⭐ | 🏆 Recommended for most users (MacBook Air, RTX 3060/4060) |
| aura-7b-q2_k.gguf | ~3.02 GB | ~4 GB | ⭐⭐⭐ | Minimum RAM / CPU-only execution |

💡 Tip: If you have an 8 GB GPU, Q6_K fits with all layers offloaded. If you have 6 GB or less, use Q4_K_M.
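As a rough sanity check on the VRAM column above: total memory is approximately the quantized weight file plus the KV cache plus runtime overhead. Below is a minimal sketch, assuming ~7.6B weights, a Qwen2.5-7B-style attention layout (28 layers, 4 KV heads of dimension 128, fp16 cache), and ~0.5 GB of overhead; the bits-per-weight figures are approximations, not exact:

```python
def estimate_total_gb(bits_per_weight: float,
                      n_params: float = 7.6e9,  # assumed weight count
                      ctx: int = 8192,
                      n_layers: int = 28,       # Qwen2.5-7B-style, assumed
                      n_kv_heads: int = 4,      # grouped-query attention, assumed
                      head_dim: int = 128) -> float:
    """Rough total memory (GB) = quantized weights + fp16 KV cache + overhead."""
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    # K and V caches, 2 bytes each (fp16), across all layers and context slots
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * 2 / 1e9
    return weights_gb + kv_gb + 0.5  # ~0.5 GB runtime overhead, assumed

# Q4_K_M averages roughly 4.9 bits per weight:
print(round(estimate_total_gb(4.9), 1))  # prints 5.6
```

At 16 bits per weight this gives ~16.2 GB, which lines up with the ~16 GB figure in the f16 row.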


🚀 Quick Start / Usage

🦙 llama.cpp

The basic command for interactive terminal chat:

```shell
./llama-cli \
  -m aura-7b-q4_k_m.gguf \
  -p "You are Aura, a helpful agentic AI assistant created by Featherlabs." \
  --ctx-size 8192 \
  -b 512 \
  -n -1 \
  -i --color
```

(Add `-ngl 99` to offload all layers to your GPU, if supported.)
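If you would rather serve the model over HTTP than chat in the terminal, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint. A sketch (the port and `-ngl` value are just examples):

```shell
# Start the server (add/remove -ngl depending on your GPU)
./llama-server -m aura-7b-q4_k_m.gguf --ctx-size 8192 --port 8080 -ngl 99

# Query it from another terminal:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, Aura!"}]}'
```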

🦙 Ollama

Creating a custom Ollama model is the easiest way to serve the API locally:

1. Create a file named `Modelfile` in the same directory as the GGUF:

   ```
   FROM ./aura-7b-q4_k_m.gguf

   # Set the system prompt
   SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."

   # Set standard parameters
   PARAMETER num_ctx 8192
   PARAMETER temperature 0.7
   PARAMETER top_p 0.9

   # The chat template is usually auto-detected for Qwen2, but you can set it explicitly if needed
   TEMPLATE """{{ if .System }}<|im_start|>system
   {{ .System }}<|im_end|>
   {{ end }}{{ if .Prompt }}<|im_start|>user
   {{ .Prompt }}<|im_end|>
   {{ end }}<|im_start|>assistant
   {{ .Response }}<|im_end|>
   """
   ```

2. Build and run:

   ```shell
   ollama create aura-7b -f Modelfile
   ollama run aura-7b
   ```
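Once `ollama run` is serving, other programs can talk to Ollama's REST API. A minimal Python sketch using only the standard library, assuming a local install on the default port (the model name matches the `ollama create` step above):

```python
import json
import urllib.request

# Default Ollama REST endpoint (assumption: local install on port 11434)
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build a non-streaming chat request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # return one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request (requires `ollama run aura-7b` to be serving):
# with urllib.request.urlopen(build_chat_request("aura-7b", "Hello!")) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```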

🖥️ LM Studio

  1. Open LM Studio and search for Featherlabs/Aura-7b-GGUF (or drag and drop the .gguf file).
  2. Download your preferred quantization (e.g., Q4_K_M).
  3. Go to the Chat tab and load the model.
  4. From the right panel, select the Qwen2 chat template (or set the system prompt manually).
  5. Start chatting!

📊 Model Details

| Property | Value |
|---|---|
| Base model | Featherlabs/Aura-7b |
| Architecture | Qwen2 |
| Parameters | ~7.6B |
| Context length | 8,192 tokens |
| Quantization tool | llama.cpp |
| Format | GGUF (v3) |

👑 Original Model (Safetensors)

If you need the full-precision BF16 weights for fine-tuning, training, or deployment in production clusters (vLLM, TGI, SGLang):

👉 Featherlabs/Aura-7b


📜 License

Apache 2.0, consistent with Qwen2.5-7B-Instruct.


Built with ❤️ by Featherlabs

Operated by Owlkun
