Ornstein-3.6-27B-GGUF

GGUF quantizations of GestaltLabs/Ornstein-3.6-27B — a Qwen 3.6 27B dense multimodal fine-tune with hybrid linear + full attention.

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi


Model info

  • Architecture: Qwen3_5ForConditionalGeneration (linear + full attention interleaved, Gated Delta Net; text path extracted for GGUF)
  • Parameters: ~27 B dense
  • Context: 262,144 tokens
  • Hidden size / layers: 5120 / 64
  • Attention: 24 heads, 4 KV heads, head_dim 256

These GGUFs expose the text path only. For the multimodal variant use the full safetensors in the base repo.

Quant index

Choose a quant that fits in your RAM/VRAM with room left over for context (a rough size rule of thumb follows the table). For a dense 27B, prefer Q4_K_M or higher on 24 GB cards; go to Q5_K_M or Q6_K if you have headroom.

File                           Bits/weight  Notes
Ornstein-3.6-27B-Q8_0.gguf     8            Reference, near-lossless
Ornstein-3.6-27B-Q6_K.gguf     6.5          Great default for 32 GB+ systems
Ornstein-3.6-27B-Q5_K_M.gguf   5.5          Excellent quality/size balance
Ornstein-3.6-27B-Q5_K_S.gguf   5.5          Slightly smaller Q5
Ornstein-3.6-27B-Q5_0.gguf     5            Legacy 5-bit
Ornstein-3.6-27B-Q4_K_M.gguf   4.5          Common 24 GB-card default
Ornstein-3.6-27B-Q4_K_S.gguf   4.5          Smaller Q4
Ornstein-3.6-27B-Q4_0.gguf     4            Legacy 4-bit
Ornstein-3.6-27B-IQ4_NL.gguf   4.25         Non-linear 4-bit I-quant
Ornstein-3.6-27B-IQ4_XS.gguf   4.25         Smaller than Q4_K_S, comparable quality
Ornstein-3.6-27B-Q3_K_L.gguf   3.5          Largest Q3
Ornstein-3.6-27B-Q3_K_M.gguf   3.5          Usable; quality below Q4
Ornstein-3.6-27B-Q3_K_S.gguf   3.5          Smaller Q3
Ornstein-3.6-27B-IQ3_M.gguf    3.3          Mixed I-quant, beats Q3_K_S at similar size
Ornstein-3.6-27B-IQ3_S.gguf    3.1          3-bit I-quant
Ornstein-3.6-27B-IQ3_XS.gguf   3.0          Smaller 3-bit I-quant
Ornstein-3.6-27B-IQ3_XXS.gguf  3.0          Aggressive 3-bit
Ornstein-3.6-27B-Q2_K.gguf     2.6          Lowest K-quant; expect degraded quality
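
As a rough rule of thumb, file size ≈ parameter count × bits per weight ÷ 8: Q4_K_M lands around 27e9 × 4.5 / 8 ≈ 15 GB, Q6_K around 22 GB, and Q8_0 around 27 GB, before KV cache and context overhead. Actual sizes drift a little because each quant mixes bit widths across layers.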

BF16/F16 GGUF is not shipped here — if you want full precision, grab the safetensors from the base repo.

Usage
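
The commands below assume a quant is already on local disk; one way to fetch a single file is the huggingface-cli download command (a sketch, swap in whichever file from the table you want):

huggingface-cli download GestaltLabs/Ornstein-3.6-27B-GGUF \
    Ornstein-3.6-27B-Q4_K_M.gguf --local-dir .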

llama.cpp

# Interactive chat
llama-cli -m Ornstein-3.6-27B-Q4_K_M.gguf -cnv

# Single prompt
llama-cli -m Ornstein-3.6-27B-Q5_K_M.gguf -p "Write a haiku about hybrid attention."

# OpenAI-compatible server
llama-server -m Ornstein-3.6-27B-Q4_K_M.gguf --host 0.0.0.0 --port 8080 -c 8192
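
llama-cpp-python

The same files load through the llama-cpp-python bindings, which can pull a quant straight from the Hub. A minimal sketch; the Q4_K_M filename is just an example, point filename at any quant from the index:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="GestaltLabs/Ornstein-3.6-27B-GGUF",
    filename="Ornstein-3.6-27B-Q4_K_M.gguf",  # any quant from the index above
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)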

Other runners

LM Studio, Ollama (via a Modelfile), koboldcpp, and text-generation-webui all load these GGUFs provided their bundled llama.cpp supports Qwen3_5ForConditionalGeneration with Gated Delta Net.
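
For Ollama specifically, the rough flow is a Modelfile pointing at a local quant, then create and run. A sketch, not a tested recipe: the model tag is arbitrary, and depending on your Ollama version you may also need a TEMPLATE block matching the model's chat format.

# Modelfile
FROM ./Ornstein-3.6-27B-Q4_K_M.gguf

ollama create ornstein-3.6-27b -f Modelfile
ollama run ornstein-3.6-27b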

Reproducing the quants

# 1. Convert safetensors → BF16 GGUF
python llama.cpp/convert_hf_to_gguf.py <model_dir> \
    --outtype bf16 --outfile Ornstein-3.6-27B-BF16.gguf

# 2. Quantize (example)
llama-quantize Ornstein-3.6-27B-BF16.gguf \
    Ornstein-3.6-27B-Q4_K_M.gguf Q4_K_M
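
# 3. (Optional) sanity-check the result by dumping its metadata; this step
#    assumes the gguf-dump script shipped with the gguf Python package
pip install gguf
gguf-dump Ornstein-3.6-27B-Q4_K_M.gguf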

License

Apache 2.0 — inherited from the Qwen 3.6 base release.
