⚡ Aura-7b GGUF
A small model that punches above its weight, now optimized for local inference
Agentic Β· Tool Use Β· Function Calling Β· Reasoning
Built by Featherlabs Β· Operated by Owlkun
✨ Overview
This repository contains GGUF quantized versions of Featherlabs/Aura-7b, an agentic 7B language model fine-tuned from Qwen2.5-7B-Instruct by Featherlabs.
These models are optimized for efficient local execution on consumer hardware using CPU or GPU acceleration. They are fully compatible with llama.cpp, Ollama, LM Studio, Jan, and other GGUF-based runtimes.
📦 Available Quantizations
Choose the file that best matches your system's VRAM/RAM capacity:
| Filename | Size | VRAM Req. | Quality | Best For |
|---|---|---|---|---|
| `aura-7b-f16.gguf` | ~15.2 GB | ~16 GB | ★★★★★ | Maximum quality, high-VRAM systems |
| `aura-7b-q8_0.gguf` | ~8.1 GB | ~10 GB | ★★★★★ | Near-lossless quality |
| `aura-7b-q6_k.gguf` | ~6.25 GB | ~8 GB | ★★★★ | Excellent quality, sweet spot for 8 GB GPUs |
| `aura-7b-q4_k_m.gguf` | ~4.68 GB | ~6 GB | ★★★★ | ⭐ Recommended for most users (MacBook Air, RTX 3060/4060) |
| `aura-7b-q2_k.gguf` | ~3.02 GB | ~4 GB | ★★★ | Minimum RAM / CPU-only execution |
💡 Tip: If you have an 8 GB GPU, `Q6_K` will fit perfectly while offloading all layers. If you have 6 GB or less, use `Q4_K_M`.
🚀 Quick Start / Usage
🦙 llama.cpp
The basic command for interactive terminal chat:

```shell
./llama-cli \
  -m aura-7b-q4_k_m.gguf \
  -p "You are Aura, a helpful agentic AI assistant created by Featherlabs." \
  --ctx-size 8192 \
  -b 512 \
  -n -1 \
  -i --color
```

(Add `-ngl 99` to offload all layers to your GPU if supported.)
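Keep in mind that `--ctx-size` costs memory on top of the weights: the KV cache grows linearly with context length. A rough back-of-the-envelope sketch, assuming Qwen2.5-7B's published geometry (28 layers, 4 KV heads via GQA, head dimension 128); the helper itself is illustrative:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx: int, bytes_per_elem: int = 2) -> int:
    """FP16 KV-cache size: keys + values for every layer and every position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

# Assumed Qwen2.5-7B shape: 28 layers, 4 KV heads, head_dim 128.
cache = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128, ctx=8192)
print(f"{cache / 2**30:.2f} GiB")  # roughly 0.44 GiB at the full 8192 context
```

So the full 8192-token context adds well under 1 GB here, which is why the VRAM figures in the table are dominated by the weights themselves.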
🦙 Ollama
Creating a custom Ollama model is the easiest way to serve the API locally:

- Create a file named `Modelfile` in the same directory as the GGUF:
```
FROM ./aura-7b-q4_k_m.gguf

# Set the system prompt
SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."

# Set standard parameters
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# The chat template is usually auto-detected for Qwen2, but you can explicitly set it if needed
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
```
- Build and run:

```shell
ollama create aura-7b -f Modelfile
ollama run aura-7b
```
🖥️ LM Studio
- Open LM Studio and search for `Featherlabs/Aura-7b-GGUF` (or drag and drop the `.gguf` file).
- Download your preferred quantization (e.g., `Q4_K_M`).
- Go to the Chat tab and load the model.
- From the right panel, select the Qwen2 chat template (or set the system prompt manually).
- Start chatting!
📋 Model Details
| Property | Value |
|---|---|
| Base Model | Featherlabs/Aura-7b |
| Architecture | Qwen2 |
| Parameters | ~7.6B |
| Context length | 8192 tokens |
| Quantization tool | llama.cpp |
| Format | GGUF (v3) |
🔗 Original Model (Safetensors)
If you need the full-precision BF16 weights for fine-tuning, training, or deployment in production clusters (vLLM, TGI, SGLang):
👉 Featherlabs/Aura-7b
📄 License
Apache 2.0, consistent with Qwen2.5-7B-Instruct.
Built with ❤️ by Featherlabs
Operated by Owlkun