How to use from llama.cpp

Install from WinGet (Windows)

```
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16

# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16
```
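Whichever install path you pick, llama-server exposes an OpenAI-compatible HTTP API. A minimal smoke test with curl, assuming the server is running on its default port 8080:

```
# Send a chat completion request to the local server:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```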
Use pre-built binary

```
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16

# Run inference directly in the terminal:
./llama-cli -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16
```
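The plain configure above produces a CPU-only build. llama.cpp selects optional GPU backends at configure time; as a sketch, a CUDA build (assuming the CUDA toolkit is installed) would be:

```
# Reconfigure with the CUDA backend enabled, then rebuild the same targets:
cmake -B build -DGGML_CUDA=ON
cmake --build build -j --target llama-server llama-cli
```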
Use Docker

```
docker model run hf.co/VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16
```

Quick Links
Qwen3VL-8B-Uncensored-HauhauCS-Aggressive
Join the Discord for updates, roadmaps, projects, or just to chat.
Qwen3VL-8B uncensored by HauhauCS.
About
No changes to datasets or capabilities. Fully functional: 100% of what the original authors intended, just without the refusals.
These are meant to be the best lossless uncensored models available.
Aggressive vs Balanced
This is the Aggressive variant with stronger uncensoring. Use this when the Balanced variant refuses too much.
For agentic coding and reliability-critical tasks, use the Balanced variant instead.
Downloads
| File | Quant | Size |
|---|---|---|
| Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-BF16.gguf | BF16 | 16 GB |
| Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-Q8_0.gguf | Q8_0 | 8.2 GB |
| Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-Q6_K.gguf | Q6_K | 6.3 GB |
| Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf | Q4_K_M | 4.7 GB |
| Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-mmproj-f16.gguf | mmproj | 1.1 GB |
Specs
- 8B parameters
- 256K context (see the note after this list on capping it at load time)
- Vision-language model (requires mmproj file for image input)
- Based on Qwen3-VL-8B
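The full 256K window needs a large KV cache, so you may want to cap the context when loading. A minimal sketch, assuming the Q4_K_M file from the table above (`-c` sets the context size in tokens):

```
# Serve with a 32K context window instead of the full 256K to save memory:
./llama-server -m Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf -c 32768
```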
Usage
Works with llama.cpp, LM Studio, koboldcpp, etc.
For vision capabilities, load both the main model and the mmproj file.
llama.cpp example:
```
./llama-cli -m Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf \
  --mmproj Qwen3VL-8B-Uncensored-HauhauCS-Aggressive-mmproj-f16.gguf \
  --image your_image.jpg \
  -p "Describe this image"
```
Install from brew

```
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16

# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/DeepSeek-R1-Distill-Qwen-7B:BF16
```