⚡ Aura-7b GGUF

A small model that punches above its weight, now optimized for local inference

Agentic · Tool Use · Function Calling · Reasoning


Built by Featherlabs · Operated by Owlkun


✨ Overview

This repository contains GGUF quantized versions of Featherlabs/Aura-7b, an agentic 7B language model fine-tuned by Featherlabs from Qwen2.5-7B-Instruct.

These models are optimized for efficient local execution on consumer hardware using CPU or GPU acceleration. They are fully compatible with llama.cpp, Ollama, LM Studio, Jan, and other GGUF-based runtimes.
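To grab just one quantization rather than the whole repository, one option is the `huggingface-cli download` command from the `huggingface_hub` package (the repo id below is this card's; pick any filename from the table in the next section):

```shell
# Download a single GGUF file into the current directory
# (requires: pip install -U huggingface_hub)
huggingface-cli download Featherlabs/Aura-7b-GGUF \
  aura-7b-q4_k_m.gguf \
  --local-dir .
```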


📦 Available Quantizations

Choose the file that best matches your system's VRAM/RAM capacity:

| Filename | Size | VRAM needed | Quality | Best for |
|---|---|---|---|---|
| aura-7b-f16.gguf | ~15.2 GB | ~16 GB | ⭐⭐⭐⭐⭐ | Maximum quality, high-VRAM systems |
| aura-7b-q8_0.gguf | ~8.1 GB | ~10 GB | ⭐⭐⭐⭐⭐ | Near-lossless quality |
| aura-7b-q6_k.gguf | ~6.25 GB | ~8 GB | ⭐⭐⭐⭐ | Excellent quality, sweet spot for 8 GB GPUs |
| aura-7b-q4_k_m.gguf | ~4.68 GB | ~6 GB | ⭐⭐⭐⭐ | 🏆 Recommended for most users (MacBook Air, RTX 3060/4060) |
| aura-7b-q2_k.gguf | ~3.02 GB | ~4 GB | ⭐⭐⭐ | Minimum RAM / CPU-only execution |

💡 Tip: If you have an 8 GB GPU, Q6_K fits with all layers offloaded. If you have 6 GB or less, use Q4_K_M.
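As a rough sanity check on the VRAM column above: total memory is approximately the quantized weight file plus the KV cache plus runtime overhead. Below is a minimal sketch, assuming ~7.6B weights, a Qwen2.5-7B-style attention layout (28 layers, 4 KV heads of dimension 128, fp16 cache), and ~0.5 GB of overhead; the bits-per-weight figures are approximations, not exact:

```python
def estimate_total_gb(bits_per_weight: float,
                      n_params: float = 7.6e9,  # assumed weight count
                      ctx: int = 8192,
                      n_layers: int = 28,       # Qwen2.5-7B-style, assumed
                      n_kv_heads: int = 4,      # grouped-query attention, assumed
                      head_dim: int = 128) -> float:
    """Rough total memory (GB) = quantized weights + fp16 KV cache + overhead."""
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    # K and V caches, 2 bytes each (fp16), across all layers and context slots
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * 2 / 1e9
    return weights_gb + kv_gb + 0.5  # ~0.5 GB runtime overhead, assumed

# Q4_K_M averages roughly 4.9 bits per weight:
print(round(estimate_total_gb(4.9), 1))  # prints 5.6
```

At 16 bits per weight this gives ~16.2 GB, which lines up with the ~16 GB figure in the f16 row.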


🚀 Quick Start / Usage

🦙 llama.cpp

The basic command for interactive terminal chat:

```shell
./llama-cli \
  -m aura-7b-q4_k_m.gguf \
  -p "You are Aura, a helpful agentic AI assistant created by Featherlabs." \
  --ctx-size 8192 \
  -b 512 \
  -n -1 \
  -i --color
```

(Add `-ngl 99` to offload all layers to your GPU, if supported.)
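If you would rather serve the model over HTTP than chat in the terminal, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible endpoint. A sketch (the port and `-ngl` value are just examples):

```shell
# Start the server (add/remove -ngl depending on your GPU)
./llama-server -m aura-7b-q4_k_m.gguf --ctx-size 8192 --port 8080 -ngl 99

# Query it from another terminal:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, Aura!"}]}'
```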

🦙 Ollama

Creating a custom Ollama model is the easiest way to serve the API locally:

1. Create a file named `Modelfile` in the same directory as the GGUF:

   ```
   FROM ./aura-7b-q4_k_m.gguf

   # Set the system prompt
   SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."

   # Set standard parameters
   PARAMETER num_ctx 8192
   PARAMETER temperature 0.7
   PARAMETER top_p 0.9

   # The chat template is usually auto-detected for Qwen2, but you can set it explicitly if needed
   TEMPLATE """{{ if .System }}<|im_start|>system
   {{ .System }}<|im_end|>
   {{ end }}{{ if .Prompt }}<|im_start|>user
   {{ .Prompt }}<|im_end|>
   {{ end }}<|im_start|>assistant
   {{ .Response }}<|im_end|>
   """
   ```

2. Build and run:

   ```shell
   ollama create aura-7b -f Modelfile
   ollama run aura-7b
   ```
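Once `ollama run` is serving, other programs can talk to Ollama's REST API. A minimal Python sketch using only the standard library, assuming a local install on the default port (the model name matches the `ollama create` step above):

```python
import json
import urllib.request

# Default Ollama REST endpoint (assumption: local install on port 11434)
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build a non-streaming chat request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # return one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request (requires `ollama run aura-7b` to be serving):
# with urllib.request.urlopen(build_chat_request("aura-7b", "Hello!")) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```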

🖥️ LM Studio

  1. Open LM Studio and search for Featherlabs/Aura-7b-GGUF (or drag and drop the .gguf file).
  2. Download your preferred quantization (e.g., Q4_K_M).
  3. Go to the Chat tab and load the model.
  4. From the right panel, select the Qwen2 chat template (or set the system prompt manually).
  5. Start chatting!

📊 Model Details

| Property | Value |
|---|---|
| Base model | Featherlabs/Aura-7b |
| Architecture | Qwen2 |
| Parameters | ~7.6B |
| Context length | 8,192 tokens |
| Quantization tool | llama.cpp |
| Format | GGUF (v3) |

👑 Original Model (Safetensors)

If you need the full-precision BF16 weights for fine-tuning, training, or deployment in production clusters (vLLM, TGI, SGLang):

👉 Featherlabs/Aura-7b


📜 License

Apache 2.0, consistent with Qwen2.5-7B-Instruct.


Built with ❤️ by Featherlabs

Operated by Owlkun
