# Qwopus3.5-27B-v3-TQ3_4S

TQ3_4S is a 3.5-bit Walsh-Hadamard-transform weight format with four per-8 scales per 32-weight block.

This release is a TQ3_4S GGUF quantization of Jackrong/Qwopus3.5-27B-v3, which is itself derived from the Qwen3.5-27B family.
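The stated 3.5 bits per weight is consistent with one plausible block layout: 3-bit quantized weights plus four 4-bit sub-block scales per 32-weight block. The 4-bit scale width is my assumption (it is not documented for TQ3_4S), but the arithmetic lines up:

```shell
# Hypothetical TQ3_4S block budget: 32 x 3-bit weights + 4 x 4-bit scales.
# The 4-bit scale width is an assumption that matches the stated 3.5 bpw.
awk 'BEGIN { bits = 32*3 + 4*4; printf "%d bits -> %.1f bpw\n", bits, bits/32 }'
```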
## Quantization Source

- HF source checkout: `Jackrong/Qwopus3.5-27B-v3`
- Upstream family: `Qwen/Qwen3.5-27B`
- F16 GGUF used as the quantization source: `Qwopus3.5-27B-v3-f16.gguf`
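If you want to regenerate the F16 source GGUF from the HF checkout, llama.cpp's converter script is the usual route. A sketch (script name and flags are upstream llama.cpp conventions; paths are placeholders):

```shell
# Convert the HF checkout to an F16 GGUF (paths are placeholders)
python convert_hf_to_gguf.py /path/to/Qwopus3.5-27B-v3 \
  --outtype f16 \
  --outfile /path/to/Qwopus3.5-27B-v3-f16.gguf
```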
Quantized with:

```shell
./build/bin/llama-quantize \
  /path/to/Qwopus3.5-27B-v3-f16.gguf \
  /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  TQ3_4S \
  8
```
## Quality

Full pass over `wiki.test.raw` at context length 2048:

- Final PPL = 6.3433 +/- 0.03999
- Median chunk PPL = 6.1953
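The perplexity numbers above should be reproducible with upstream's perplexity tool against the quantized GGUF. An assumed invocation (paths are placeholders):

```shell
# Assumed perplexity run matching the reported setup (c=2048)
./build/bin/llama-perplexity \
  -m /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  -f /path/to/wiki.test.raw \
  -c 2048
```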
## Runtime Validation

Validated on a clean public `llama.cpp-tq3` main branch:

- Runtime commit: `62eb27dce`
- Runtime requirement: `turbo-tan/llama.cpp-tq3`
- Strict chat smoke test:
  - Prompt: `Write ONLY the word ok.`
  - Response: `ok`
- Multimodal projector: `mmproj.gguf`
Validated server profile:

```shell
./build/bin/llama-server \
  -m /path/to/Qwopus3.5-27B-v3-TQ3_4S.gguf \
  --mmproj /path/to/mmproj.gguf \
  -a qwopus35-27b-v3-tq3_4s \
  --host 127.0.0.1 --port 8080 \
  -ngl 99 -c 8192 -np 1 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --no-warmup --jinja \
  --reasoning off --reasoning-budget 0 --reasoning-format deepseek \
  --cache-ram 0 --no-mmproj-offload
```
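With the server running, the chat smoke test above can be replayed over the OpenAI-compatible endpoint that llama-server exposes. A sketch, assuming the host/port from the profile above:

```shell
# Replay the strict chat smoke test; expected reply content is "ok"
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write ONLY the word ok."}],
    "temperature": 0
  }'
```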
## Recommended Chat Settings

For cleaner short-answer behavior on this reasoning-distilled model, the best local settings I found were:

```shell
--reasoning on --reasoning-budget 0 --temp 0.6 --top-k 20 --min-p 0 --repeat-penalty 1.0
```

On simple prompts this suppresses visible thinking-tag spill better than `--reasoning off` does.
## Vision / Image Input

The repo includes `mmproj.gguf` for multimodal use.

If your frontend says image input is unsupported, it is usually talking to an older server process that was started without `--mmproj`.
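To verify that the running server actually has the projector loaded, you can send an image directly over the OpenAI-style vision API. A sketch, assuming the host/port from the server profile and a placeholder image path (`base64 -w0` is GNU coreutils; on macOS drop the `-w0`):

```shell
# Hypothetical multimodal request: image passed inline as a base64 data URL
IMG_B64=$(base64 -w0 /path/to/image.png)
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image.\"},
        {\"type\": \"image_url\",
         \"image_url\": {\"url\": \"data:image/png;base64,${IMG_B64}\"}}
      ]
    }]
  }"
```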
## Notes

- This is a weight-quantization release for the Qwopus v3 model line.
- Running this GGUF requires the `TQ3_4S` runtime in `turbo-tan/llama.cpp-tq3`.
## Credits