# DiStil-Qwen3-1.7B-uncensored

Uncensored Distillation of Qwen3-1.7B: Alignment-Free Capability Transfer

Convergent Intelligence LLC: Research Division
## What This Is
DiStil-Qwen3-1.7B-uncensored is a 1.7B parameter model produced by distilling Qwen3 with uncensored SFT data, removing alignment-imposed refusal behaviors while preserving the base model's reasoning and generation capabilities. The goal is a model that responds to the prompt as given rather than filtering through safety heuristics that often misfire on legitimate technical, analytical, and research queries.
This is the base model in a distillation chain:
- DiStil-Qwen3-1.7B-uncensored (you are here)
- Disctil-Qwen3-1.7B (DISC-informed refinement of this model)
## Architecture
| Parameter | Value |
|---|---|
| Architecture | Qwen3ForCausalLM |
| Parameters | ~2.03B (1.7B effective) |
| Hidden Size | 2048 |
| Layers | 28 |
| Attention Heads | 16 (Q) / 8 (KV), GQA |
| Intermediate | 6144 |
| Context Length | 40,960 tokens |
| Vocabulary | 151,936 |
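As a sanity check, the parameter count in the table can be reproduced from the other architecture figures. A minimal sketch, assuming a head dimension of 128 (standard for Qwen3) and untied input/output embeddings, and ignoring the small norm layers:

```python
# Back-of-envelope parameter count from the architecture table.
# Assumptions: head_dim = 128 (standard for Qwen3), untied embeddings
# and LM head; RMSNorm and Q/K-norm weights are negligible and omitted.
hidden, layers, vocab, inter = 2048, 28, 151936, 6144
q_heads, kv_heads, head_dim = 16, 8, 128

attn = 2 * hidden * q_heads * head_dim      # q_proj + o_proj
attn += 2 * hidden * kv_heads * head_dim    # k_proj + v_proj (GQA: fewer KV heads)
mlp = 3 * hidden * inter                    # gate, up, and down projections
total = layers * (attn + mlp) + 2 * vocab * hidden  # + embeddings + LM head

print(f"~{total / 1e9:.2f}B parameters")  # ~2.03B, matching the table
```

With tied embeddings the count drops to roughly 1.72B, consistent with the "1.7B effective" figure in the table.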
## Training
Supervised fine-tuning with TRL on uncensored instruction data. Training preserves the base Qwen3 architecture and tokenizer while shifting the model's response distribution away from refusal patterns. There are no architectural modifications: this is a pure SFT intervention on the response surface.
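The setup above might be sketched with TRL roughly as follows. This is an illustrative configuration, not the actual training recipe: the dataset path and every hyperparameter are placeholder assumptions.

```python
# Illustrative SFT configuration sketch using TRL.
# The dataset name and all hyperparameters here are assumptions,
# not the recipe actually used for this model.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("example/uncensored-instruction-data")  # hypothetical dataset

config = SFTConfig(
    output_dir="distil-qwen3-1.7b-uncensored",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=2,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B-Base",  # base model from the model tree below
    args=config,
    train_dataset=dataset["train"],
)
trainer.train()
```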
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored")

messages = [{"role": "user", "content": "Explain the tradeoffs between alignment training and capability preservation in small language models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True makes temperature/top_p explicit rather than relying on defaults
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model is part of a distillation chain built on Discrepancy Calculus, a measure-theoretic framework in which the teacher's output distribution is decomposed via the Mesh Fundamental Identity into smooth (AC), jump, and Cantor components. The discrepancy operator $Df(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \int_x^{x+\varepsilon} \frac{|f(t) - f(x)|}{|t - x|} dt$ quantifies local structural mismatch that standard KL divergence averages away.
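The behavior of the discrepancy operator can be illustrated numerically. A minimal sketch (my own illustration, not code from the cited paper): for a smooth function the operator recovers $|f'(x)|$, while at a jump the finite-resolution estimate blows up as $\varepsilon$ shrinks, which is exactly the structural mismatch that KL averaging smooths over.

```python
import numpy as np

def discrepancy(f, x, eps, n=10_000):
    # Riemann approximation of (1/eps) * integral_x^{x+eps} |f(t)-f(x)| / |t-x| dt,
    # sampling just above x to avoid the t == x singularity.
    t = x + np.linspace(eps / n, eps, n)
    integrand = np.abs(f(t) - f(x)) / np.abs(t - x)
    return integrand.mean()  # mean over a uniform grid ~= (1/eps) * integral

# Smooth case: f(t) = t^2 at x = 1, so Df(x) ~= |f'(1)| = 2.
smooth = discrepancy(lambda t: t**2, 1.0, eps=1e-3)

# Jump case: unit step at x = 0, where the estimate diverges as eps shrinks.
jump = discrepancy(lambda t: np.where(t > 0, 1.0, 0.0), 0.0, eps=1e-3)

print(smooth, jump)
```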
Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division). Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165).
## Related Models
| Model | Description | Downloads |
|---|---|---|
| Disctil-Qwen3-1.7B | DISC-informed refinement of this model | 286 |
| DistilQwen3-1.7B-uncensored | Parallel distillation variant | 351 |
| DistilQwen3-1.7B-uncensored-GGUF | Quantized for edge deployment | 239 |
| TopologicalQwen | Topology-aware distillation (TKD) | 622 |
DistilQwen Collection: full proof-weighted distillation series
## Citation
```bibtex
@misc{colca2026distiluncensored,
  title={DiStil-Qwen3-1.7B-uncensored: Alignment-Free Capability Transfer},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored},
  note={Convergent Intelligence LLC: Research Division}
}
```
## From the Convergent Intelligence Portfolio
DistilQwen Collection: our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B to 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
Top model: Qwen3-1.7B-Coder-Distilled-SFT (508 downloads)
Convergent Intelligence LLC: Research Division. "Where classical analysis fails to see, we begin."
Part of the reaperdoesntknow research portfolio: 49 models, 22,598 total downloads. Last refreshed: 2026-03-30 12:05 UTC
Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division
Downloads last month: 1,856
## Model Tree

Base model: Qwen/Qwen3-1.7B-Base