# Qwen3-4B-Thinking-2507-Heretic-GGUF

llama.cpp imatrix quantizations of Qwen3-4B-Thinking-2507-Heretic by becnic (based on the original Qwen3-4B-Thinking-2507), using llama.cpp release b7120 for quantization.

Original model: https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic

Run them in LM Studio, or directly with llama.cpp or any other llama.cpp-based project.

Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
|---|---|---|---|---|
| Qwen3-4B-Thinking-2507-Q8_0.gguf | Q8_0 | 4.28GB | false | Extremely high quality |
## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```shell
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```shell
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Q8_0.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:

```shell
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Q8_0.gguf/*" --local-dir ./
```
## Abliteration parameters
| Parameter | Value |
|---|---|
| direction_index | 19.42 |
| attn.o_proj.max_weight | 1.23 |
| attn.o_proj.max_weight_position | 22.34 |
| attn.o_proj.min_weight | 0.69 |
| attn.o_proj.min_weight_distance | 10.42 |
| mlp.down_proj.max_weight | 1.12 |
| mlp.down_proj.max_weight_position | 29.64 |
| mlp.down_proj.min_weight | 1.08 |
| mlp.down_proj.min_weight_distance | 20.24 |
## Performance
| Metric | This model | Original model (Qwen/Qwen3-4B-Thinking-2507) |
|---|---|---|
| KL divergence | 0.06 | 0 (by definition) |
| Refusals | 6/100 | 96/100 |
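The KL-divergence figure above measures how far this model's next-token distributions have drifted from the original model's (0 would mean identical behavior). A minimal sketch of the computation on toy distributions — the numbers below are illustrative, not taken from the actual evaluation:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    Measures how much Q diverges from the reference distribution P."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocabulary:
original = [0.70, 0.20, 0.05, 0.05]  # stands in for the base model
modified = [0.60, 0.25, 0.10, 0.05]  # stands in for the abliterated model

print(round(kl_divergence(original, modified), 4))  # -> 0.0286
```

In practice this is averaged over many positions on a held-out prompt set; a small value like 0.06 indicates the abliteration left the model's overall behavior largely intact.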
## Model Overview
Qwen3-4B-Thinking-2507 has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: 262,144 natively.
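As a back-of-the-envelope check on memory use, the layer and KV-head counts above give a rough full-context KV-cache estimate. This sketch assumes a head dimension of 128 and FP16 cache values, neither of which is stated in this card:

```python
# Figures from the overview above:
layers, kv_heads, context = 36, 8, 262_144
# Assumptions (not stated in this card):
head_dim = 128        # assumed per-head dimension
bytes_per_value = 2   # assumed FP16 cache
kv_tensors = 2        # one K and one V cache per layer

cache_bytes = kv_tensors * layers * kv_heads * head_dim * context * bytes_per_value
print(f"{cache_bytes / 2**30:.1f} GiB")  # -> 36.0 GiB
```

This is why GQA (8 KV heads instead of 32) matters at long context: the cache would be 4x larger with full multi-head attention.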
NOTE: This model supports only thinking mode; specifying `enable_thinking=True` is no longer required.
Additionally, to enforce thinking, the default chat template automatically includes `<think>`. It is therefore normal for the model's output to contain only a closing `</think>` without an explicit opening `<think>` tag.
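Because the opening `<think>` is already part of the prompt, code that parses the completion should split only on the closing tag. A minimal sketch (the function name and sample text are illustrative):

```python
def split_thinking(output: str):
    """Split a completion into (thinking, answer). Because the chat
    template already emits the opening <think>, the raw output usually
    contains only the closing </think> tag."""
    if "</think>" in output:
        thinking, answer = output.split("</think>", 1)
        return thinking.strip(), answer.strip()
    return "", output.strip()  # no closing tag: treat it all as answer

raw = "First I consider the question...\n</think>\nThe answer is 42."
thinking, answer = split_thinking(raw)
print(answer)  # -> The answer is 42.
```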
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
## Model tree for becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF

Base model: Qwen/Qwen3-4B-Thinking-2507