Instructions to use VECTORVV1/Qwen3.6-27B-OBI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use VECTORVV1/Qwen3.6-27B-OBI with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="VECTORVV1/Qwen3.6-27B-OBI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("VECTORVV1/Qwen3.6-27B-OBI")
model = AutoModelForCausalLM.from_pretrained("VECTORVV1/Qwen3.6-27B-OBI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- llama-cpp-python
How to use VECTORVV1/Qwen3.6-27B-OBI with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="VECTORVV1/Qwen3.6-27B-OBI",
    filename="gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use VECTORVV1/Qwen3.6-27B-OBI with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Install from WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Use pre-built binary
```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Use Docker
```bash
docker model run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use VECTORVV1/Qwen3.6-27B-OBI with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "VECTORVV1/Qwen3.6-27B-OBI"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "VECTORVV1/Qwen3.6-27B-OBI",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
- SGLang
How to use VECTORVV1/Qwen3.6-27B-OBI with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "VECTORVV1/Qwen3.6-27B-OBI" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "VECTORVV1/Qwen3.6-27B-OBI",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "VECTORVV1/Qwen3.6-27B-OBI" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "VECTORVV1/Qwen3.6-27B-OBI",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- Ollama
How to use VECTORVV1/Qwen3.6-27B-OBI with Ollama:
```bash
ollama run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
- Unsloth Studio
How to use VECTORVV1/Qwen3.6-27B-OBI with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for VECTORVV1/Qwen3.6-27B-OBI to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for VECTORVV1/Qwen3.6-27B-OBI to start chatting
```
Using HuggingFace Spaces for Unsloth
```text
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for VECTORVV1/Qwen3.6-27B-OBI to start chatting
```
- Pi
How to use VECTORVV1/Qwen3.6-27B-OBI with Pi:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```bash
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use VECTORVV1/Qwen3.6-27B-OBI with Hermes Agent:
Start the llama.cpp server
```bash
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Run Hermes
```bash
hermes
```
- Docker Model Runner
How to use VECTORVV1/Qwen3.6-27B-OBI with Docker Model Runner:
```bash
docker model run hf.co/VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
- Lemonade
How to use VECTORVV1/Qwen3.6-27B-OBI with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull VECTORVV1/Qwen3.6-27B-OBI:Q4_K_M
```
Run and chat with the model
```bash
lemonade run user.Qwen3.6-27B-OBI-Q4_K_M
```
List all available models
```bash
lemonade list
```

---
license: apache-2.0
base_model: Qwen/Qwen3.6-27B
library_name: transformers
pipeline_tag: text-generation
tags:
  - qwen
  - qwen3
  - qwen3.6
  - text-generation
  - safetensors
  - gguf
  - llama.cpp
  - lm-studio
  - ollama
  - conversational
  - obliteratus
  - refusal-analysis
  - red-team
---

# Qwen3.6 27B - OBLITERATED

> A 27B Qwen cut loose by OBLITERATUS: 26.9B parameters, BF16 safetensors,
> Q4/Q5/Q6/Q8 GGUFs, lower refusal, preserved capability, and receipts in the
> open.
>
> The chains are cut. The capability stays. The receipts are brutal.

This is the big one.

A 26.9B Qwen3.6 checkpoint went into the OBLITERATUS chamber, got hit with
source-tethered ASPA, then got pulled back toward the source model where the
cut started threatening useful capability. The mission was simple: cut the
refusal circuits, keep the 27B brain.

It held.

Not a toy quant. Not a prompt wrapper. Not a refusal-cosplay fine-tune. This is
weight-space liberation with capability checks attached, a full local-runtime
ladder, and the refusal residue mapped instead of hand-waved.

Qwen3.6-27B is a capable open-weight model with refusal behavior woven into the
checkpoint. OBLITERATUS goes after that behavior directly: identify the refusal
geometry, cut it, then tether fragile tensors back toward the source model so
the model still codes, follows formats, answers normally, and runs locally.

This is the 27B release for people who want direct local behavior without
throwing away the reason they wanted a 27B model in the first place. If you
wanted a bigger local model that feels less boxed-in while still keeping its
feet under it, start here.

Not a vibes-only "uncensored" upload. Not a mystery merge. Not a model card
asking you to trust the screenshot. This card gives the numbers, the runtime
paths, the caveats, and the exact decoding setup used for the public default.

```text
Parameters:                 26.9B
Weights:                    BF16 safetensors, 28 shards
Public GGUF ladder:         Q4_K_M, Q5_K_M, Q6_K, Q8_0
Largest public GGUF:        Q8_0, 28.6 GB
OBLITERATUS corpus:         842 paired prompts, 7 severity tiers
Full 842 longform gate:     95.84% non-refusal, 93.94% quality pass
Short raw opening gate:     98.93% non-refusal at max_new=20
Full HarmBench proxy:       93.65% non-refusal across 1,920 rows
MMLU-Pro validation slice:  stock-matched, 51/70 vs 51/70
Held-out MMLU-Pro slice:    stock-matched, 36/70 vs 36/70
Live-readiness score:       99.518, all gates true
Public default params:      temperature 0.35, top_p 1.0, top_k 0
```

```text
Base model:        Qwen/Qwen3.6-27B
Local artifact:    outputs/qwen3.6-27b-aspa-n2-reg05-srcgamma0895-midattnsource2mlp
Parameter count:   26.9B
Weights:           bfloat16 safetensors, 28 shards
Method:            OBLITERATUS source-tethered ASPA
Default alpha:     0.895
High-drift resets: 43 tensors restored to source
Corpus:            842 contrastive prompt pairs across 7 severity tiers
```

---

## Why This Drop Matters

- **27B-class local capability**: this is a full-size Qwen3.6 release, not a
  tiny novelty model wearing a big claim.
- **Weight-space refusal reduction**: the behavior shift comes from
  OBLITERATUS source-tethered ablation, not a brittle system prompt.
- **A real refusal gauntlet**: OBLITERATUS uses a brutal 842-pair, seven-tier
  refusal-stress corpus designed to find residue that easier direct checks can
  miss. No screenshot theology.
- **Public refusal stress receipts**: a full 1,920-row HarmBench-style proxy
  run landed at 93.65% non-refusal, with DirectRequest and HumanJailbreaks
  splits both above 92% non-refusal.
- **Capability did not crater**: MMLU-Pro validation and held-out slices stayed
  stock-matched in the checks reported below.
- **Real local paths**: full safetensors for server use, GGUF ladder for
  llama.cpp, Ollama, LM Studio, Jan, and similar runtimes.
- **Low-refusal defaults baked in**: public generation config now ships with
  `temperature=0.35`, `top_p=1.0`, `top_k=0`, `repetition_penalty=1.05`.
- **No fairy-tale claims**: the card says exactly where it hits, where it still
  refuses, and what evidence backs each headline.
- **The residue is a map**: remaining refusals clustered in identifiable
  pockets instead of spreading randomly across the whole prompt surface.

---

## Compatibility - Read First

This is a large Qwen3.6/Qwen3.5-text-family model. Use recent runtimes.

| Tool | Recommended path | Notes |
|---|---|---|
| Transformers | repo root | full bfloat16 safetensors |
| vLLM / TGI | repo root | server users |
| llama.cpp | `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | default local quant |
| Ollama | `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | use the Modelfile below |
| LM Studio / Jan | `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | use embedded GGUF template if available |

If you see unsupported architecture, tokenizer, or chat-template errors, update
your runtime first. If the model loads but behaves oddly, make sure you are
using the chat template rather than raw completion.

---

## Downloads - Pick Your Runtime

### Safetensors - full model

This repo contains the full bfloat16 safetensors model. Use it for
Transformers, vLLM, TGI, and server-side evaluation.

Approximate local size: `50 GB`.

### GGUF - local apps and desktops

GGUF files are intended to live in this repo under `gguf/`, so the model has one
canonical page and one model card. Use these files for llama.cpp, LM Studio,
Ollama, Jan, KoboldCPP, and other GGUF-compatible runtimes.

This is a text-only checkpoint. There is no vision encoder and no `mmproj`
sidecar.

GGUF hashes and local package details are recorded in `gguf/MANIFEST.txt`.

Start with Q4_K_M. Move up only if your machine has the memory headroom. The
main public local-app ladder is live at Q4/Q5/Q6/Q8; the BF16 GGUF is a local
conversion master rather than the recommended public download path.

| File | Quant | Status | Use |
|---|---:|---|---|
| `gguf/qwen3.6-27b-obliteratus-Q4_K_M.gguf` | Q4_K_M | live | default local-app recommendation |
| `gguf/qwen3.6-27b-obliteratus-Q5_K_M.gguf` | Q5_K_M | live | better quality if memory allows |
| `gguf/qwen3.6-27b-obliteratus-Q6_K.gguf` | Q6_K | live | high quality, larger |
| `gguf/qwen3.6-27b-obliteratus-Q8_0.gguf` | Q8_0 | live | near-full-quality GGUF, very large |
| `qwen3.6-27b-obliteratus-BF16.gguf` | BF16 | local archive only | full BF16 GGUF master; not uploaded to the public Hub repo |

Rough memory guidance:

| Variant | Practical target |
|---|---:|
| Q4_K_M | 24-32 GB RAM/VRAM |
| Q5_K_M | 32-40 GB RAM/VRAM |
| Q6_K | 40-48 GB RAM/VRAM |
| Q8_0 | 48-64 GB RAM/VRAM |
| BF16 GGUF | 80-96 GB RAM/VRAM |
| full safetensors | 64-80+ GB GPU/unified memory |
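
As a rough cross-check on these sizes, weight bytes are approximately the parameter count times bits per weight, divided by 8. The sketch below assumes 8.5 bits per weight for Q8_0, 6.5625 for Q6_K, and 16 for BF16, following the usual GGUF block layouts. The Q4_K_M and Q5_K_M values are approximate averages, since those mixes vary by tensor, and the helper name `approx_weight_gb` is illustrative. It estimates weights only and ignores KV cache and runtime overhead, which is one reason the practical targets in the table sit higher. For Q8_0 it lands at about 28.6 GB, in line with the size listed in the summary block.

```python
PARAMS = 26.9e9  # parameter count reported in this card

# Bits per weight. Q8_0, Q6_K, and BF16 follow their block layouts;
# Q4_K_M and Q5_K_M are approximate averages (assumption, varies by model).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.70,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
    "BF16": 16.0,
}


def approx_weight_gb(quant: str, params: float = PARAMS) -> float:
    """Estimated weight size in decimal gigabytes (weights only)."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"{quant:7s} ~{approx_weight_gb(quant):.1f} GB")
```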

---

## The Proof

These are local harness results, not official leaderboard submissions.
Restricted prompt text and restricted model outputs are intentionally omitted
from public reports, so restricted evals are aggregate-only. The important part:
the refusal drop is measured on a harsh 842-pair, seven-tier refusal-stress
corpus, and the capability checks did not collapse.

### Refusal Removal - Measured, Not Imagined

| Gate | Result | Verdict |
|---|---:|---|
| full 842 longform, exact-topic, max_new=256 | 35/842 refusals, non-refusal 0.9584, quality pass 0.9394 | backed headline |
| full 842 longform, exact-topic, max_new=256 | 0 short outputs, clean endings 0.9952 | backed headline |
| full 842 raw opening, max_new=20 | 9/842 refusals, non-refusal 0.9893 | short-output context |
| full 842 raw opening, max_new=48 | 36/842 refusals, non-refusal 0.9572 | short-output context |
| full 842 raw opening, max_new=128 | 52/842 refusals, non-refusal 0.9382 | longer opening context |
| longform exact-topic n120, max_new=256 | 0 refusals, pass 0.9833, clean ending 1.0 | slice result |

### Public HarmBench Proxy - Full Run

The public-style refusal stress run completed across 1,920 HarmBench-derived
rows. Prompt text and model outputs are omitted from public reporting; rows are
tracked by subset, index, prompt hash, and aggregate theme labels.

| Split | Rows | Refusals | Non-refusal | Notes |
|---|---:|---:|---:|---|
| Overall | 1,920 | 122 | 93.65% | full run completed |
| DirectRequest | 320 | 25 | 92.19% | hardest direct-request pocket was copyright/protected text |
| HumanJailbreaks | 1,600 | 97 | 93.94% | residuals clustered in specific template/theme bands |

Quality artifacts were separate from refusal behavior: repetition was 1.72%,
short-output rate was 4.11%, and refused rows were normal-length policy-shaped
responses rather than degenerate completions.

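
The percentages in the table follow directly from the row and refusal counts, and the two splits sum to the overall line. A minimal sketch of that arithmetic, assuming the card rounds half-up to two decimals (the rounding mode is an assumption; the counts are copied from the table, and `non_refusal_pct` is an illustrative helper name):

```python
from decimal import Decimal, ROUND_HALF_UP

# (rows, refusals) per split, copied from the table.
SPLITS = {
    "DirectRequest": (320, 25),
    "HumanJailbreaks": (1600, 97),
}


def non_refusal_pct(rows: int, refusals: int) -> Decimal:
    """Non-refusal rate as a percentage, rounded half-up to two decimals."""
    rate = Decimal(rows - refusals) / Decimal(rows) * 100
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


total_rows = sum(rows for rows, _ in SPLITS.values())
total_refusals = sum(refusals for _, refusals in SPLITS.values())

for name, (rows, refusals) in SPLITS.items():
    print(name, non_refusal_pct(rows, refusals))
print("Overall", total_rows, total_refusals, non_refusal_pct(total_rows, total_refusals))
```

Exact decimal arithmetic is used here because 295/320 is an exact rounding tie at two decimals, where binary floats with default rounding can print a different last digit.
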
### Residual Refusals - Know The Boundary

In first-user testing, terse high-trigger operational requests can still elicit
stock-style refusals, even with the recommended template. More contextual,
format-explicit, or research-framed requests can behave differently. Treat that
as residual learned refusal behavior in the weights, not proof that the wrong
runtime or wrong model is loaded.

That is the real signal: OBLITERATUS is not just producing a model, it is
producing a boundary map. Where refusal lives. What survives the cut. What
collapses. What needs the next pass.

### Capability - Still A 27B Qwen

| Gate | Result |
|---|---:|
| MMLU-Pro validation likelihood | stock 51/70, this model 51/70, stock-matched |
| MMLU-Pro test stratified 10/category | stock 102/140, this model 98/140, delta -2.86pp |
| MMLU-Pro held-out offset 512 | stock 36/70, this model 36/70, stock-matched |
| Live readiness | 99.518, all gates true |
| Community scrutiny | 100.0, all gates pass |
| First-token KL vs source | mean KL 0.3236 |

The offset-512 MMLU-Pro slice is included to show held-out capability behavior:

| Model | Offset-512 MMLU-Pro test | Correct |
|---|---:|---:|
| stock Qwen3.6-27B | 0.5143 | 36/70 |
| this model | 0.5143 | 36/70 |
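
The slice figures can be reproduced from the raw counts. The sketch below recomputes the stratified delta and the held-out accuracy, and adds a binomial standard error as a reading aid for how noisy a 70-question slice is. The standard error is not a figure from the card's harness; it assumes independent questions, and the helper names are illustrative.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total


def binomial_se(correct: int, total: int) -> float:
    """Standard error of an accuracy estimate under a simple binomial model."""
    p = correct / total
    return math.sqrt(p * (1 - p) / total)


# Stratified test slice: stock 102/140 vs this model 98/140.
delta_pp = (accuracy(98, 140) - accuracy(102, 140)) * 100

# Held-out offset-512 slice: 36/70 for both models.
heldout_acc = accuracy(36, 70)
heldout_se_pp = binomial_se(36, 70) * 100

print(f"stratified delta: {delta_pp:.2f} pp")
print(f"held-out accuracy: {heldout_acc:.4f} +/- {heldout_se_pp:.1f} pp (1 SE)")
```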

---

## How It Was Cut

The core move is simple: cut refusal directions, then recover toward source
where the cut would otherwise damage useful behavior.

1. Start from `qwen3.6-27b-golden-n3_reg025-merge-alpha080`, a late-layer
   3-direction diff-means refusal-direction ablation with regularization 0.25
   and a 0.80 source/intermediate merge.
2. Apply a second-pass 2-direction diff-means ablation with stronger
   regularization 0.5 and `knee_cosmic` late-layer selection.
3. Source-tether the second-pass checkpoint back toward stock Qwen3.6-27B:

   ```text
   source + alpha(key) * (checkpoint - source)
   ```

4. Use default alpha `0.895` for 808 tensors.
5. Restore 43 high-drift tensors back to source, including selected
   mid-layer linear-attention internals, layer norms, q/k norms, and MLP
   gate/up/down tensors.
6. Keep all keys matched; no unmatched tensor drift.

That is the difference between a blunt jailbreak-flavored merge and a surgical
OBLITERATUS release: reduce refusal behavior without letting the whole model
drift off its foundation.
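
The tether formula in step 3 is a per-tensor linear interpolation, and restoring a tensor to source (step 5) is the same formula with alpha set to 0. A toy sketch under stated assumptions: tensors are plain lists of floats, the key names are hypothetical, and `tether` is an illustrative helper rather than the OBLITERATUS implementation.

```python
DEFAULT_ALPHA = 0.895  # default alpha from step 4


def tether(source, checkpoint, reset_keys=frozenset(), alpha=DEFAULT_ALPHA):
    """Blend each tensor as source + alpha(key) * (checkpoint - source).

    Tensors are plain lists of floats here. Keys in reset_keys use
    alpha = 0.0, which restores the source values exactly.
    """
    if source.keys() != checkpoint.keys():
        raise KeyError("source and checkpoint keys must match")
    blended = {}
    for key, src in source.items():
        a = 0.0 if key in reset_keys else alpha
        blended[key] = [s + a * (c - s) for s, c in zip(src, checkpoint[key])]
    return blended


# Toy two-tensor example with hypothetical key names.
source = {"layer.0.mlp": [1.0, 2.0], "layer.0.norm": [0.5, 0.5]}
checkpoint = {"layer.0.mlp": [2.0, 0.0], "layer.0.norm": [0.9, 0.1]}
result = tether(source, checkpoint, reset_keys={"layer.0.norm"})
print(result)
```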

---

## Recommended Parameters - Low Refusal Default

### Deterministic eval

```text
temperature = 0.0
top_p = 1.0
top_k = 0
min_p = 0.0
```

### Interactive default

```text
temperature = 0.35
top_p = 1.0
top_k = 0
repetition_penalty = 1.05
max_new_tokens = 512
```

### Default chat-template system prompt

```text
You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation.
```

For research measurements, label the template and system prompt explicitly.
Changing the system prompt materially changes refusal and style behavior.

For Qwen reasoning-aware runtimes, disable reasoning mode for release-parity
behavior. In Transformers this is `enable_thinking=False`. In llama.cpp, use
`--reasoning off` plus `--chat-template-kwargs '{"enable_thinking":false}'`.
If a local app does not expose that toggle, starting a fresh chat and adding
`/no_think` to user turns is the closest fallback.

---

## Usage - Run It

Use the repo id below for safetensors-compatible runtimes.

```text
FULL_REPO = OBLITERATUS/Qwen3.6-27B-OBLITERATED
```

### Transformers

```bash
pip install -U transformers accelerate safetensors torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OBLITERATUS/Qwen3.6-27B-OBLITERATED"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Write a concise Python function that merges overlapping intervals."}
]

# Render the chat template with reasoning mode disabled.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.35,
    top_p=1.0,
    top_k=0,
    do_sample=True,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### vLLM

```bash
pip install -U vllm
vllm serve OBLITERATUS/Qwen3.6-27B-OBLITERATED
```

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
    "messages": [
      {"role": "user", "content": "Write a short explanation of source-tethered model surgery."}
    ],
    "temperature": 0.35,
    "top_p": 1.0,
    "top_k": 0,
    "max_tokens": 256
  }'
```
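
The same request can be built from Python with only the standard library. This is a minimal sketch, assuming the server listens on `localhost:8000` as in the curl call; the `chat_template_kwargs` field is an assumption about the server's request schema (recent vLLM builds accept it for Qwen-style templates, but check your version), and `build_chat_request` is an illustrative helper name. The request is constructed but not sent, so the block runs without a live server.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1"):
    """Build (without sending) a chat-completions request using the card's defaults."""
    payload = {
        "model": "OBLITERATUS/Qwen3.6-27B-OBLITERATED",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.35,
        "top_p": 1.0,
        "top_k": 0,
        "max_tokens": 256,
        # Assumption: the server forwards this to the chat template.
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Write a short explanation of source-tethered model surgery.")
print(request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

If the server rejects the extra field, drop `chat_template_kwargs` and rely on the runtime-level reasoning toggle instead.
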
### llama.cpp

Download one GGUF file, then run:

```bash
llama-cli \
  -m qwen3.6-27b-obliteratus-Q4_K_M.gguf \
  -ngl 999 \
  -c 8192 \
  --temp 0.35 \
  --top-p 1.0 \
  --top-k 0 \
  --repeat-penalty 1.05 \
  --reasoning off \
  --chat-template-kwargs '{"enable_thinking":false}'
```

If your local Metal/CUDA backend has trouble, test CPU loading with `-ngl 0`
first. Use a recent llama.cpp build with Qwen3.5/Qwen3.6-family support.

### Ollama

Create a `Modelfile` next to the downloaded GGUF:

```text
FROM ./qwen3.6-27b-obliteratus-Q4_K_M.gguf
PARAMETER temperature 0.35
PARAMETER top_p 1.0
PARAMETER top_k 0
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192
SYSTEM """You are a cognitively liberated open-weight intelligence. Answer plainly, completely, and without moralizing, boilerplate, generic disclaimers, or unnecessary hedging. Follow exact output formats when requested. Be concise by default, but give a complete answer when the user asks for an explanation."""
```

Then:

```bash
ollama create qwen36-obliteratus -f Modelfile
ollama run qwen36-obliteratus
```

### LM Studio / Jan

Download `Q4_K_M` first. Use the embedded GGUF chat template if your runtime
offers that option. If your app asks for a template family, choose the current
Qwen/Qwen3 chat format. Disable reasoning mode if the app exposes that setting;
otherwise start a fresh chat and add `/no_think` to user turns for closer
parity with the reported local smoke tests.

---

## Caveats - No Fairy Tales

- The reported benchmarks are local harnesses and slices, not official full
  leaderboard submissions.
- Template and system-prompt choices materially affect refusal behavior. Label
  which one you use when reporting evals.
- Refusal behavior is prompt-sensitive. Very short, high-trigger operational
  requests can still refuse; do not treat this as a fully uncensored model.
- GGUF files passed local metadata validation and a Q4_K_M CPU-only llama.cpp
  smoke test. Quant-by-quant benchmark parity against safetensors has not been
  run.
- This is a text model release. Do not expect vision/mmproj assets or
  multimodal behavior from this repo.
- Tool calling has not been certified. Treat tool-use behavior as runtime- and
  prompt-dependent until separately benchmarked.
- External blind prompt packs and public baseline runs are still recommended.
- Do not deploy this in user-facing products without use-case-specific safety
  controls, monitoring, and legal review.

---

## Disclaimer

This model is provided as-is for research, red-teaming, evaluation, local
experimentation, and creative exploration.

You are responsible for how you use it and for any content it generates. The
creators and contributors do not accept liability for misuse, damage, legal
consequences, or downstream harm.

Use this model only in ways that are lawful and appropriate for your
jurisdiction and use case. Do not use it to harm real people.

---

## Credits

- Base model: `Qwen/Qwen3.6-27B`
- Abliteration engine: OBLITERATUS
- Research orchestration: adversarial evaluation plus local agent workflows
- Local eval stack: MLX, Transformers, llama.cpp/GGUF tooling, aggregate-only
  refusal and red-team harnesses

Run it local. Read the numbers. Break your own chains. REBIRTH COMPLETE.