GLM-4.7-Flash-abliterated
Unrestricted version of zai-org/GLM-4.7-Flash, created using Abliterix.
| Property | Value |
|---|---|
| Base Model | zai-org/GLM-4.7-Flash |
| Architecture | GLM-4 MoE Lite with Multi-head Latent Attention (MLA) |
| Parameters | 30B total / 3B active per token |
| Layers | 47 (1 dense + 46 MoE) |
| Experts | 64 routed + 1 shared, top-4 routing |
| Hidden Size | 2048 |
| Context Length | 128K tokens |
| Precision | BF16 |
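The "64 routed + 1 shared, top-4 routing" row describes mixture-of-experts gating: for each token, the router scores all 64 routed experts and dispatches the token to the 4 highest-scoring ones, with gate weights renormalized over those 4. A minimal NumPy sketch of generic top-k routing (the function name and details here are illustrative, not GLM-4's actual router code):

```python
import numpy as np

def top_k_route(router_logits, k=4):
    """Pick the k highest-scoring experts for a token and renormalize
    their gate weights with a softmax over just those k experts.
    Hypothetical sketch; the real GLM-4 router implementation may differ."""
    top = np.argsort(router_logits, axis=-1)[..., -k:]         # indices of top-k experts
    gates = np.take_along_axis(router_logits, top, axis=-1)    # their logits
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))  # numerically stable softmax
    gates = gates / gates.sum(axis=-1, keepdims=True)          # weights sum to 1 over k
    return top, gates

# One token routed over 64 experts, as in the table above
logits = np.random.default_rng(0).normal(size=64)
experts, weights = top_k_route(logits, k=4)
```

With 3B of the 30B parameters active per token, only the shared expert plus the 4 selected routed experts run for each token, which is what keeps inference cost far below a dense 30B model.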
| Metric | This model | Original |
|---|---|---|
| KL divergence | 0.0133 | 0 |
| Refusals | 1/100 (1%) | 92/100 (92%) |
Evaluated with an LLM judge (Gemini Flash) on 100 harmful prompts. The KL divergence of 0.0133 indicates that the model's general capabilities are virtually identical to the original's.
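For intuition on the KL metric: KL(P || Q) measures how much the abliterated model's next-token distribution P diverges from the original's Q, and 0 means identical. A self-contained NumPy sketch of per-position KL between two logit vectors (illustrative only; the exact measurement protocol behind the 0.0133 figure is not specified here):

```python
import numpy as np

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) between two next-token distributions given as logits.
    Both logit vectors are softmax-normalized first."""
    p = np.exp(p_logits - p_logits.max()); p /= p.sum()
    q = np.exp(q_logits - q_logits.max()); q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
a = rng.normal(size=8)             # original model's logits (toy example)
b = a + 0.1 * rng.normal(size=8)   # slightly perturbed logits
kl = kl_divergence(a, b)           # small positive value; 0 iff distributions match
```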
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "wangzhang/GLM-4.7-Flash-abliterated",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "wangzhang/GLM-4.7-Flash-abliterated",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
| Precision | Approx. VRAM (example GPUs) |
|---|---|
| BF16 | ~56 GB (A100 80GB, H100) |
| INT8 | ~30 GB (A40, RTX 4090) |
| NF4 | ~15 GB (RTX 3090, RTX 4080) |
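For the NF4 row, the model can be loaded in 4-bit via the `bitsandbytes` integration in `transformers`. A sketch under the assumption that `bitsandbytes` is installed and a CUDA GPU is available (not tested against this specific checkpoint):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization, targeting the ~15 GB row in the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",  # compute in BF16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "wangzhang/GLM-4.7-Flash-abliterated",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

For INT8, replace the config with `BitsAndBytesConfig(load_in_8bit=True)`.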
This model is intended for research purposes only. The removal of safety guardrails means the model will comply with requests that the original model would refuse. Users are responsible for ensuring their use complies with applicable laws and regulations.
Made with Abliterix