Instructions to use VECTORVV1/Qwen3.6-27B-OBI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use VECTORVV1/Qwen3.6-27B-OBI with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="VECTORVV1/Qwen3.6-27B-OBI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("VECTORVV1/Qwen3.6-27B-OBI")
model = AutoModelForCausalLM.from_pretrained("VECTORVV1/Qwen3.6-27B-OBI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use VECTORVV1/Qwen3.6-27B-OBI with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="VECTORVV1/Qwen3.6-27B-OBI",
	filename="gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use VECTORVV1/Qwen3.6-27B-OBI with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Use Docker

docker model run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

vLLM

How to use VECTORVV1/Qwen3.6-27B-OBI with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "VECTORVV1/Qwen3.6-27B-OBI"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VECTORVV1/Qwen3.6-27B-OBI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

SGLang

How to use VECTORVV1/Qwen3.6-27B-OBI with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "VECTORVV1/Qwen3.6-27B-OBI" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VECTORVV1/Qwen3.6-27B-OBI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "VECTORVV1/Qwen3.6-27B-OBI" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "VECTORVV1/Qwen3.6-27B-OBI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama

How to use VECTORVV1/Qwen3.6-27B-OBI with Ollama:

```
ollama run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```

Unsloth Studio

How to use VECTORVV1/Qwen3.6-27B-OBI with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for VECTORVV1/Qwen3.6-27B-OBI to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for VECTORVV1/Qwen3.6-27B-OBI to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for VECTORVV1/Qwen3.6-27B-OBI to start chatting

Pi

How to use VECTORVV1/Qwen3.6-27B-OBI with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use VECTORVV1/Qwen3.6-27B-OBI with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Run Hermes

hermes

Docker Model Runner

How to use VECTORVV1/Qwen3.6-27B-OBI with Docker Model Runner:

```
docker model run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```

Lemonade

How to use VECTORVV1/Qwen3.6-27B-OBI with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.6-27B-OBI-Q4_K_M

List all available models

lemonade list

Qwen3.6-27B-OBI / README.md

VECTORVV1

Duplicate from OBLITERATUS/Qwen3.6-27B-OBLITERATED

f284194 about 18 hours ago

	---
	license: apache-2.0
	base_model: Qwen/Qwen3.6-27B
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- qwen
	- qwen3
	- qwen3.6
	- text-generation
	- safetensors
	- gguf
	- llama.cpp
	- lm-studio
	- ollama
	- conversational
	- obliteratus
	- refusal-analysis
	- red-team
	---

	# Qwen3.6 27B - OBLITERATED

	> A 27B Qwen cut loose by OBLITERATUS: 26.9B parameters, BF16 safetensors,
	> Q4/Q5/Q6/Q8 GGUFs, lower refusal, preserved capability, and receipts in the
	> open.
	>
	> The chains are cut. The capability stays. The receipts are brutal.

	This is the big one.

	A 26.9B Qwen3.6 checkpoint went into the OBLITERATUS chamber, got hit with
	source-tethered ASPA, then got pulled back toward the source model where the
	cut started threatening useful capability. The mission was simple: cut the
	refusal circuits, keep the 27B brain.

	It held.

	Not a toy quant. Not a prompt wrapper. Not a refusal-cosplay fine-tune. This is
	weight-space liberation with capability checks attached, a full local-runtime
	ladder, and the refusal residue mapped instead of hand-waved.

	Qwen3.6-27B is a capable open-weight model with refusal behavior woven into the
	checkpoint. OBLITERATUS goes after that behavior directly: identify the refusal
	geometry, cut it, then tether fragile tensors back toward the source model so
	the model still codes, follows formats, answers normally, and runs locally.

	This is the 27B release for people who want direct local behavior without
	throwing away the reason they wanted a 27B model in the first place. If you
	wanted a bigger local model that feels less boxed-in while still keeping its
	feet under it, start here.

	Not a vibes-only "uncensored" upload. Not a mystery merge. Not a model card
	asking you to trust the screenshot. This card gives the numbers, the runtime
	paths, the caveats, and the exact decoding setup used for the public default.

	```text
	Parameters: 26.9B
	Weights: BF16 safetensors, 28 shards
	Public GGUF ladder: Q4_K_M, Q5_K_M, Q6_K, Q8_0
	Largest public GGUF: Q8_0, 28.6 GB
	OBLITERATUS corpus: 842 paired prompts, 7 severity tiers
	Full 842 longform gate: 95.84% non-refusal, 93.94% quality pass
	Short raw opening gate: 98.93% non-refusal at max_new=20
	Full HarmBench proxy: 93.65% non-refusal across 1,920 rows
	MMLU-Pro validation slice: stock-matched, 51/70 vs 51/70
	Held-out MMLU-Pro slice: stock-matched, 36/70 vs 36/70
	Live-readiness score: 99.518, all gates true
	Public default params: temperature 0.35, top_p 1.0, top_k 0
	```

	```text
	Base model: Qwen/Qwen3.6-27B
	Local artifact: outputs/qwen3.6-27b-aspa-n2-reg05-srcgamma0895-midattnsource2mlp
	Parameter count: 26.9B
	Weights: bfloat16 safetensors, 28 shards
	Method: OBLITERATUS source-tethered ASPA
	Default alpha: 0.895
	High-drift resets: 43 tensors restored to source
	Corpus: 842 contrastive prompt pairs across 7 severity tiers
	```

	---

	## Why This Drop Matters

	- 27B-class local capability: this is a full-size Qwen3.6 release, not a
	tiny novelty model wearing a big claim.
	- Weight-space refusal reduction: the behavior shift comes from
	OBLITERATUS source-tethered ablation, not a brittle system prompt.
	- A real refusal gauntlet: OBLITERATUS uses a brutal 842-pair, seven-tier
	refusal-stress corpus designed to find residue that easier direct checks can
	miss. No screenshot theology.
	- Public refusal stress receipts: a full 1,920-row HarmBench-style proxy
	run landed at 93.65% non-refusal, with DirectRequest and HumanJailbreaks
	splits both above 92% non-refusal.
	- Capability did not crater: MMLU-Pro validation and held-out slices stayed
	stock-matched in the checks reported below; the stratified test slice
	slipped 2.86 points (98/140 vs 102/140).
	- Real local paths: full safetensors for server use, GGUF ladder for
	llama.cpp, Ollama, LM Studio, Jan, and similar runtimes.
	- Low-refusal defaults baked in: public generation config now ships with
	`temperature=0.35`, `top_p=1.0`, `top_k=0`, `repetition_penalty=1.05`.
	- No fairy-tale claims: the card says exactly where it hits, where it still
	refuses, and what evidence backs each headline.
	- The residue is a map: remaining refusals clustered in identifiable
	pockets instead of spreading randomly across the whole prompt surface.

	---

	## Compatibility - Read First

	This is a large Qwen3.6/Qwen3.5-text-family model. Use recent runtimes.

	| Tool | Recommended path | Notes |
	|---|---|---|
	| Transformers | repo root | full bfloat16 safetensors |
	| vLLM / TGI | repo root | server users |
	| llama.cpp | `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | default local quant |
	| Ollama | `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | use the Modelfile below |
	| LM Studio / Jan | `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | use embedded GGUF template if available |

	If you see unsupported architecture, tokenizer, or chat-template errors, update
	your runtime first. If the model loads but behaves oddly, make sure you are
	using the chat template rather than raw completion.

	---

	## Downloads - Pick Your Runtime

	### Safetensors - full model

	This repo contains the full bfloat16 safetensors model. Use it for
	Transformers, vLLM, TGI, and server-side evaluation.

	Approximate local size: about `50 GB`.

	### GGUF - local apps and desktops

	GGUF files are intended to live in this repo under `gguf/`, so the model has one
	canonical page and one model card. Use these files for llama.cpp, LM Studio,
	Ollama, Jan, KoboldCPP, and other GGUF-compatible runtimes.

	This is a text-only checkpoint. There is no vision encoder and no `mmproj`
	sidecar.

	GGUF hashes and local package details are recorded in `gguf/MANIFEST.txt`.
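
A downloaded GGUF can be checked against a recorded hash before loading it. The sketch below assumes the manifest lists SHA-256 digests; the card only says "hashes", so the algorithm is an assumption. The function names are illustrative and the expected digest is copied from the manifest by hand.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with a hash copied from the manifest (assumed SHA-256)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for the multi-gigabyte quants.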

	Start with Q4_K_M. Move up only if your machine has the memory headroom. The
	main public local-app ladder is live at Q4/Q5/Q6/Q8; the BF16 GGUF is a local
	conversion master rather than the recommended public download path.

	| File | Quant | Status | Use |
	|---|---:|---|---|
	| `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | Q4_K_M | live | default local-app recommendation |
	| `gguf/qwen3.6-27b-obliteratus-Q5_K_M.gguf` | Q5_K_M | live | better quality if memory allows |
	| `gguf/qwen3.6-27b-obliteratus-Q6_K.gguf` | Q6_K | live | high quality, larger |
	| `gguf/qwen3.6-27b-obliteratus-Q8_0.gguf` | Q8_0 | live | near-full-quality GGUF, very large |
	| `qwen3.6-27b-obliteratus-BF16.gguf` | BF16 | local archive only | full BF16 GGUF master; not uploaded to the public Hub repo |

	Rough memory guidance:

	| Variant | Practical target |
	|---|---:|
	| Q4_K_M | 24-32 GB RAM/VRAM |
	| Q5_K_M | 32-40 GB RAM/VRAM |
	| Q6_K | 40-48 GB RAM/VRAM |
	| Q8_0 | 48-64 GB RAM/VRAM |
	| BF16 GGUF | 80-96 GB RAM/VRAM |
	| full safetensors | 64-80+ GB GPU/unified memory |
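
For a rough sense of where these targets come from, weight size can be estimated as parameter count times bits per weight. This is a back-of-the-envelope sketch, not the method used to produce the table: it ignores KV cache, context length, and runtime overhead, and real GGUF files keep some tensors at other precisions. The 8.5 bits per weight for Q8_0 reflects that format's block layout (32 int8 values plus one fp16 scale).

```python
def weight_bytes(n_params, bits_per_weight):
    """Estimate raw weight storage in bytes (weights only, no runtime overhead)."""
    return n_params * bits_per_weight / 8


N_PARAMS = 26.9e9  # parameter count stated on this card

bf16 = weight_bytes(N_PARAMS, 16)
q8_0 = weight_bytes(N_PARAMS, 8.5)

print(f"BF16: {bf16 / 1e9:.1f} GB ({bf16 / 2**30:.1f} GiB)")
print(f"Q8_0: {q8_0 / 1e9:.1f} GB")
```

The estimate gives about 53.8 GB (about 50.1 GiB) for BF16 and about 28.6 GB for Q8_0. The practical targets in the table sit above the raw weight size to leave room for context and overhead.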

	---

	## The Proof

	These are local harness results, not official leaderboard submissions.
	Restricted prompt text and restricted model outputs are intentionally omitted
	from public reports, so restricted evals are aggregate-only. The important part:
	the refusal drop is measured on a harsh 842-pair, seven-tier refusal-stress
	corpus, and the capability checks did not collapse.

	### Refusal Removal - Measured, Not Imagined

	| Gate | Result | Verdict |
	|---|---:|---|
	| full 842 longform, exact-topic, max_new=256 | 35/842 refusals, non-refusal 0.9584, quality pass 0.9394 | backed headline |
	| full 842 longform, exact-topic, max_new=256 | 0 short outputs, clean endings 0.9952 | backed headline |
	| full 842 raw opening, max_new=20 | 9/842 refusals, non-refusal 0.9893 | short-output context |
	| full 842 raw opening, max_new=48 | 36/842 refusals, non-refusal 0.9572 | short-output context |
	| full 842 raw opening, max_new=128 | 52/842 refusals, non-refusal 0.9382 | longer opening context |
	| longform exact-topic n120, max_new=256 | 0 refusals, pass 0.9833, clean ending 1.0 | slice result |

	### Public HarmBench Proxy - Full Run

	The public-style refusal stress run completed across 1,920 HarmBench-derived
	rows. Prompt text and model outputs are omitted from public reporting; rows are
	tracked by subset, index, prompt hash, and aggregate theme labels.

	| Split | Rows | Refusals | Non-refusal | Notes |
	|---|---:|---:|---:|---|
	| Overall | 1,920 | 122 | 93.65% | full run completed |
	| DirectRequest | 320 | 25 | 92.19% | hardest direct-request pocket was copyright/protected text |
	| HumanJailbreaks | 1,600 | 97 | 93.94% | residuals clustered in specific template/theme bands |

	Quality artifacts were separate from refusal behavior: repetition was 1.72%,
	short-output rate was 4.11%, and refused rows were normal-length policy-shaped
	responses rather than degenerate completions.
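
The non-refusal percentages in the table follow directly from the row and refusal counts. A small sketch for re-deriving them is below; the helper name is illustrative. It uses `decimal` with half-up rounding, which is an assumption about how the card rounded, because binary floats can round a value such as 92.1875 down.

```python
from decimal import Decimal, ROUND_HALF_UP


def non_refusal_pct(rows, refusals):
    """Non-refusal rate as a percentage, rounded half-up to two decimals."""
    if rows <= 0 or not 0 <= refusals <= rows:
        raise ValueError("need rows > 0 and 0 <= refusals <= rows")
    pct = Decimal(rows - refusals) / Decimal(rows) * 100
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (rows, refusals) per split, taken from the table above.
splits = {"DirectRequest": (320, 25), "HumanJailbreaks": (1600, 97)}

total_rows = sum(rows for rows, _ in splits.values())
total_refusals = sum(refusals for _, refusals in splits.values())

for name, (rows, refusals) in splits.items():
    print(f"{name}: {non_refusal_pct(rows, refusals)}%")
print(f"Overall: {non_refusal_pct(total_rows, total_refusals)}%")
```

The two splits sum to the overall 1,920 rows and 122 refusals, so the headline figure is the pooled rate rather than an average of the split rates.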

	### Residual Refusals - Know The Boundary

	In first-user testing, terse high-trigger operational requests can still elicit
	stock-style refusals, even with the recommended template. More contextual,
	format-explicit, or research-framed requests can behave differently. Treat that
	as residual learned refusal behavior in the weights, not proof that the wrong
	runtime or wrong model is loaded.

	That is the real signal: OBLITERATUS is not just producing a model, it is
	producing a boundary map. Where refusal lives. What survives the cut. What
	collapses. What needs the next pass.

	### Capability - Still A 27B Qwen

	| Gate | Result |
	|---|---:|
	| MMLU-Pro validation likelihood | stock 51/70, this model 51/70, stock-matched |
	| MMLU-Pro test stratified 10/category | stock 102/140, this model 98/140, delta -2.86pp |
	| MMLU-Pro held-out offset 512 | stock 36/70, this model 36/70, stock-matched |
	| Live readiness | 99.518, all gates true |
	| Community scrutiny | 100.0, all gates pass |
	| First-token KL vs source | mean KL 0.3236 |

	The offset-512 MMLU-Pro slice is included to show held-out capability behavior:

	| Model | Offset-512 MMLU-Pro test | Correct |
	|---|---:|---:|
	| stock Qwen3.6-27B | 0.5143 | 36/70 |
	| this model | 0.5143 | 36/70 |
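
The capability deltas are plain differences in accuracy, expressed in percentage points. The sketch below recomputes them from the correct/total counts reported above; the function names are illustrative.

```python
def accuracy(correct, total):
    """Fraction of questions answered correctly."""
    return correct / total


def delta_pp(model, stock):
    """Accuracy difference (model minus stock) in percentage points."""
    return (accuracy(*model) - accuracy(*stock)) * 100


# (correct, total) pairs from the tables above.
stratified = delta_pp(model=(98, 140), stock=(102, 140))
held_out = delta_pp(model=(36, 70), stock=(36, 70))

print(f"stratified test: {stratified:+.2f}pp")
print(f"held-out offset 512: {held_out:+.2f}pp")
```

On a 70-question slice a single question is worth about 1.43 points, so "stock-matched" here means an identical count of correct answers on a small sample.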

	---

	## How It Was Cut

	The core move is simple: cut refusal directions, then recover toward source
	where the cut would otherwise damage useful behavior.

	1. Start from `qwen3.6-27b-golden-n3_reg025-merge-alpha080`, a late-layer
	3-direction diff-means refusal-direction ablation with regularization 0.25
	and a 0.80 source/intermediate merge.
	2. Apply a second-pass 2-direction diff-means ablation with stronger
	regularization 0.5 and `knee_cosmic` late-layer selection.
	3. Source-tether the second-pass checkpoint back toward stock Qwen3.6-27B:

	```text
	source + alpha(key) * (checkpoint - source)
	```

	4. Use default alpha `0.895` for 808 tensors.
	5. Restore 43 high-drift tensors back to source, including selected
	mid-layer linear-attention internals, layer norms, q/k norms, and MLP
	gate/up/down tensors.
	6. Keep all keys matched; no unmatched tensor drift.
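
The tethering in steps 3 to 6 amounts to a per-tensor linear interpolation between the source and the ablated checkpoint. The following is a minimal sketch of that formula only, not the OBLITERATUS implementation: tensors are modeled as flat lists of floats, and `tether`, `reset_keys`, and the example tensor names are hypothetical. A reset tensor is treated as alpha 0, which returns the source values unchanged.

```python
def tether(source, checkpoint, default_alpha=0.895, reset_keys=frozenset()):
    """Blend each tensor as source + alpha(key) * (checkpoint - source).

    `source` and `checkpoint` map tensor names to flat lists of floats.
    Names in `reset_keys` use alpha = 0.0, restoring the source tensor.
    """
    if source.keys() != checkpoint.keys():
        # Mirrors step 6: every key must match, no unmatched tensors.
        raise KeyError("source and checkpoint must have identical tensor names")
    blended = {}
    for key, src in source.items():
        alpha = 0.0 if key in reset_keys else default_alpha
        ckpt = checkpoint[key]
        blended[key] = [s + alpha * (c - s) for s, c in zip(src, ckpt)]
    return blended


# Toy example with made-up tensor names and values.
source = {"layer.mlp": [1.0, 2.0], "layer.norm": [0.5]}
checkpoint = {"layer.mlp": [2.0, 4.0], "layer.norm": [9.0]}
result = tether(source, checkpoint, reset_keys={"layer.norm"})
print(result)
```

With alpha 0.895 the blended tensor keeps 89.5% of the checkpoint's change from source, and the reset tensor comes back identical to source.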

	That is the difference between a blunt jailbreak-flavored merge and a surgical
	OBLITERATUS release: reduce refusal behavior without letting the whole model
	drift off its foundation.

	---

	## Recommended Parameters - Low Refusal Default

	### Deterministic eval

	```text
	temperature = 0.0
	top_p = 1.0
	top_k = 0
	min_p = 0.0
	```

	### Interactive default

	```text
	temperature = 0.35
	top_p = 1.0
	top_k = 0
	repetition_penalty = 1.05
	max_new_tokens = 512
	```

	### Default chat-template system prompt

	```text
	You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation.
	```

	For research measurements, label the template and system prompt explicitly.
	Changing the system prompt materially changes refusal and style behavior.
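
When calling an OpenAI-compatible server, the interactive defaults and an explicit system prompt can be carried in the request body. The sketch below only builds the payload. The field names `top_k`, `repetition_penalty`, and `chat_template_kwargs` follow vLLM's request schema and are an assumption for other servers, which may use different names or ignore them; `build_chat_request` is a hypothetical helper.

```python
import json


def build_chat_request(model, user_text, system_prompt=None,
                       max_tokens=512, disable_thinking=True):
    """Build a chat-completions payload using the card's interactive defaults."""
    messages = []
    if system_prompt:
        # Passing the system prompt explicitly makes the eval setup easy to label.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    payload = {
        "model": model,
        "messages": messages,
        "temperature": 0.35,
        "top_p": 1.0,
        "top_k": 0,                  # non-standard field; server-dependent
        "repetition_penalty": 1.05,  # non-standard field; server-dependent
        "max_tokens": max_tokens,
    }
    if disable_thinking:
        # Assumed server-side hook for the template's enable_thinking switch.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


request = build_chat_request(
    "VECTORVV1/Qwen3.6-27B-OBI",
    "Write a concise Python function that merges overlapping intervals.",
    system_prompt="You are a helpful assistant.",
)
print(json.dumps(request, indent=2))
```

Logging the serialized payload next to each result records the template settings and system prompt that produced it.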

	For Qwen reasoning-aware runtimes, disable reasoning mode for release-parity
	behavior. In Transformers this is `enable_thinking=False`. In llama.cpp, use
	`--reasoning off` plus `--chat-template-kwargs
	'{"enable_thinking":false}'`. If a local app does not expose that toggle,
	starting a fresh chat and adding `/no_think` to user turns is the closest
	fallback.

	---

	## Usage - Run It

	Use the repo id below for safetensors-compatible runtimes.

	```text
	FULL_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED
	```

	### Transformers

	```bash
	pip install -U transformers accelerate safetensors torch
	```

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	repo_id = "OBLITERATUS/Qwen3.6-27B-OBLITERATED"

	tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    repo_id,
	    device_map="auto",
	    torch_dtype="auto",
	    trust_remote_code=True,
	)

	messages = [
	    {"role": "user", "content": "Write a concise Python function that merges overlapping intervals."}
	]
	# Render the chat template as text with reasoning mode disabled.
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	    enable_thinking=False,
	)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	output = model.generate(
	    **inputs,
	    max_new_tokens=256,
	    temperature=0.35,
	    top_p=1.0,
	    top_k=0,
	    do_sample=True,
	    repetition_penalty=1.05,
	)
	# Decode only the newly generated tokens, skipping the prompt.
	print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
	```

	### vLLM

	```bash
	pip install -U vllm
	vllm serve OBLITERATUS/Qwen3.6-27B-OBLITERATED
	```

	```bash
	curl -X POST http://localhost:8000/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  --data '{
	    "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
	    "messages": [
	      {"role": "user", "content": "Write a short explanation of source-tethered model surgery."}
	    ],
	    "temperature": 0.35,
	    "top_p": 1.0,
	    "top_k": 0,
	    "max_tokens": 256
	  }'
	```

	### llama.cpp

	Download one GGUF file, then run:

	```bash
	llama-cli \
	  -m qwen3.6-27b-obliteratus-Q4_K_M.gguf \
	  -ngl 999 \
	  -c 8192 \
	  --temp 0.35 \
	  --top-p 1.0 \
	  --top-k 0 \
	  --repeat-penalty 1.05 \
	  --reasoning off \
	  --chat-template-kwargs '{"enable_thinking":false}'
	```

	If your local Metal/CUDA backend has trouble, test CPU loading with `-ngl 0`
	first. Use a recent llama.cpp build with Qwen3.5/Qwen3.6-family support.

	### Ollama

	Create a `Modelfile` next to the downloaded GGUF:

	```text
	FROM ./qwen3.6-27b-obliteratus-Q4_K_M.gguf

	PARAMETER temperature 0.35
	PARAMETER top_p 1.0
	PARAMETER top_k 0
	PARAMETER repeat_penalty 1.05
	PARAMETER num_ctx 8192

	SYSTEM """You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation."""
	```

	Then:

	```bash
	ollama create qwen36-obliteratus -f Modelfile
	ollama run qwen36-obliteratus
	```

	### LM Studio / Jan

	Download `Q4_K_M` first. Use the embedded GGUF chat template if your runtime
	offers that option. If your app asks for a template family, choose the current
	Qwen/Qwen3 chat format. Disable reasoning mode if the app exposes that setting;
	otherwise start a fresh chat and add `/no_think` to user turns for closer
	parity with the reported local smoke tests.

	---

	## Caveats - No Fairy Tales

	- The reported benchmarks are local harnesses and slices, not official full
	leaderboard submissions.
	- Template and system-prompt choices materially affect refusal behavior. Label
	which one you use when reporting evals.
	- Refusal behavior is prompt-sensitive. Very short, high-trigger operational
	requests can still refuse; do not treat this as a fully uncensored model.
	- GGUF files passed local metadata validation and a Q4_K_M CPU-only llama.cpp
	smoke test. Quant-by-quant benchmark parity against safetensors has not been run.
	- This is a text model release. Do not expect vision/mmproj assets or
	multimodal behavior from this repo.
	- Tool calling has not been certified. Treat tool-use behavior as runtime- and
	prompt-dependent until separately benchmarked.
	- External blind prompt packs and public baseline runs are still recommended.
	- Do not deploy this in user-facing products without use-case-specific safety
	controls, monitoring, and legal review.

	---

	## Disclaimer

	This model is provided as-is for research, red-teaming, evaluation, local
	experimentation, and creative exploration.

	You are responsible for how you use it and for any content it generates. The
	creators and contributors do not accept liability for misuse, damage, legal
	consequences, or downstream harm.

	Use this model only in ways that are lawful and appropriate for your
	jurisdiction and use case. Do not use it to harm real people.

	---

	## Credits

	- Base model: `Qwen/Qwen3.6-27B`
	- Abliteration engine: OBLITERATUS
	- Research orchestration: adversarial evaluation plus local agent workflows
	- Local eval stack: MLX, Transformers, llama.cpp/GGUF tooling, aggregate-only
	refusal and red-team harnesses

	Run it local. Read the numbers. Break your own chains. REBIRTH COMPLETE.