# DiStil-Qwen3-1.7B-uncensored

Uncensored Distillation of Qwen3-1.7B: Alignment-Free Capability Transfer

Convergent Intelligence LLC: Research Division
## What This Is
DiStil-Qwen3-1.7B-uncensored is a 1.7B parameter model produced by distilling Qwen3 with uncensored SFT data, removing alignment-imposed refusal behaviors while preserving the base model's reasoning and generation capabilities. The goal is a model that responds to the prompt as given rather than filtering through safety heuristics that often misfire on legitimate technical, analytical, and research queries.
This is the base model in a distillation chain:
- DiStil-Qwen3-1.7B-uncensored (you are here)
- Disctil-Qwen3-1.7B (DISC-informed refinement of this model)
## Architecture
| Parameter | Value |
|---|---|
| Architecture | Qwen3ForCausalLM |
| Parameters | ~2.03B (1.7B effective) |
| Hidden Size | 2048 |
| Layers | 28 |
| Attention Heads | 16 (Q) / 8 (KV), GQA |
| Intermediate | 6144 |
| Context Length | 40,960 tokens |
| Vocabulary | 151,936 |
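As a sanity check, the parameter count in the table can be reproduced from the other architecture figures. A minimal sketch, assuming a head dimension of 128 (standard for Qwen3) and untied input/output embeddings, and ignoring the small norm layers:

```python
# Back-of-envelope parameter count from the architecture table.
# Assumptions: head_dim = 128 (standard for Qwen3), untied embeddings
# and LM head; RMSNorm and Q/K-norm weights are negligible and omitted.
hidden, layers, vocab, inter = 2048, 28, 151936, 6144
q_heads, kv_heads, head_dim = 16, 8, 128

attn = 2 * hidden * q_heads * head_dim      # q_proj + o_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj (GQA: fewer KV heads)
mlp = 3 * hidden * inter                    # gate, up, and down projections
total = layers * (attn + mlp) + 2 * vocab * hidden  # + embeddings + LM head

print(f"~{total / 1e9:.2f}B parameters")  # ~2.03B, matching the table
```

With tied embeddings the count drops to roughly 1.72B, consistent with the "1.7B effective" figure in the table.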
## Training
Supervised fine-tuning with TRL on uncensored instruction data. Training preserves the base Qwen3 architecture and tokenizer while shifting the model's response distribution away from refusal patterns. There are no architectural modifications: this is a pure SFT intervention on the response surface.
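The setup above might be sketched with TRL roughly as follows. This is an illustrative configuration, not the actual training recipe: the dataset path and every hyperparameter are placeholder assumptions.

```python
# Illustrative SFT configuration sketch using TRL.
# The dataset name and all hyperparameters here are assumptions,
# not the recipe actually used for this model.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("example/uncensored-instruction-data")  # hypothetical dataset

config = SFTConfig(
    output_dir="distil-qwen3-1.7b-uncensored",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=2,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B-Base",  # base model from the model tree below
    args=config,
    train_dataset=dataset["train"],
)
trainer.train()
```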
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored")

messages = [{"role": "user", "content": "Explain the tradeoffs between alignment training and capability preservation in small language models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True makes temperature/top_p explicit rather than relying on defaults
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model is part of a distillation chain built on Discrepancy Calculus, a measure-theoretic framework in which the teacher's output distribution is decomposed via the Mesh Fundamental Identity into smooth (AC), jump, and Cantor components. The discrepancy operator $Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|} dt$ quantifies local structural mismatch that standard KL divergence averages away.
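The behavior of the discrepancy operator can be illustrated numerically. A minimal sketch (my own illustration, not code from the cited paper): for a smooth function the operator recovers $|f'(x)|$, while at a jump the finite-resolution estimate blows up as $\varepsilon$ shrinks, which is exactly the structural mismatch that KL averaging smooths over.

```python
import numpy as np

def discrepancy(f, x, eps, n=10_000):
    # Riemann approximation of (1/eps) * integral_x^{x+eps} |f(t)-f(x)| / |t-x| dt,
    # sampling just above x to avoid the t == x singularity.
    t = x + np.linspace(eps / n, eps, n)
    integrand = np.abs(f(t) - f(x)) / np.abs(t - x)
    return integrand.mean()  # mean over a uniform grid ~= (1/eps) * integral

# Smooth case: f(t) = t^2 at x = 1, so Df(x) ~= |f'(1)| = 2.
smooth = discrepancy(lambda t: t**2, 1.0, eps=1e-3)

# Jump case: unit step at x = 0, where the estimate diverges as eps shrinks.
jump = discrepancy(lambda t: np.where(t > 0, 1.0, 0.0), 0.0, eps=1e-3)

print(smooth, jump)
```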
Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division). Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165).
## Related Models
| Model | Description | Downloads |
|---|---|---|
| Disctil-Qwen3-1.7B | DISC-informed refinement of this model | 286 |
| DistilQwen3-1.7B-uncensored | Parallel distillation variant | 351 |
| DistilQwen3-1.7B-uncensored-GGUF | Quantized for edge deployment | 239 |
| TopologicalQwen | Topology-aware distillation (TKD) | 622 |
DistilQwen Collection: full proof-weighted distillation series
## Citation
```bibtex
@misc{colca2026distiluncensored,
  title={DiStil-Qwen3-1.7B-uncensored: Alignment-Free Capability Transfer},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored},
  note={Convergent Intelligence LLC: Research Division}
}
```
## From the Convergent Intelligence Portfolio
DistilQwen Collection: our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B to 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
Top model: Qwen3-1.7B-Coder-Distilled-SFT (508 downloads)
Convergent Intelligence LLC: Research Division. "Where classical analysis fails to see, we begin."
Part of the reaperdoesntknow research portfolio: 49 models, 22,598 total downloads. Last refreshed: 2026-03-30 12:05 UTC
Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division
Downloads last month: 1,856
## Model Tree

Base model: Qwen/Qwen3-1.7B-Base