How to use with llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/Qwen3VL-8B-Balanced:
# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/Qwen3VL-8B-Balanced:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf VECTORVV1/Qwen3VL-8B-Balanced:
# Run inference directly in the terminal:
llama-cli -hf VECTORVV1/Qwen3VL-8B-Balanced:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf VECTORVV1/Qwen3VL-8B-Balanced:
# Run inference directly in the terminal:
./llama-cli -hf VECTORVV1/Qwen3VL-8B-Balanced:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf VECTORVV1/Qwen3VL-8B-Balanced:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf VECTORVV1/Qwen3VL-8B-Balanced:
Use Docker
docker model run hf.co/VECTORVV1/Qwen3VL-8B-Balanced:
Quick Links

Qwen3VL-8B-Uncensored-HauhauCS-Balanced

Join the Discord for updates, roadmaps, projects, or just to chat.

Qwen3VL-8B uncensored by HauhauCS.

About

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended - just without the refusals.

These are meant to be the best lossless uncensored models available.

Balanced vs Aggressive

This is the Balanced variant with moderate uncensoring. Best for agentic coding and reliability-critical tasks.

For stronger uncensoring when this variant refuses too much, use the Aggressive variant instead.

Downloads

File                                                     Quant    Size
Qwen3VL-8B-Uncensored-HauhauCS-Balanced-BF16.gguf        BF16     16 GB
Qwen3VL-8B-Uncensored-HauhauCS-Balanced-Q8_0.gguf        Q8_0     8.2 GB
Qwen3VL-8B-Uncensored-HauhauCS-Balanced-Q6_K.gguf        Q6_K     6.3 GB
Qwen3VL-8B-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf      Q4_K_M   4.7 GB
Qwen3VL-8B-Uncensored-HauhauCS-Balanced-mmproj-f16.gguf  mmproj   1.1 GB
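The file sizes above translate roughly to effective bits per weight (size_bytes × 8 / parameter_count), which is useful when estimating whether a quant fits your RAM or VRAM. This sketch is approximate: it ignores GGUF metadata overhead and treats "8B" as exactly 8 × 10⁹ parameters.

```python
PARAMS = 8e9  # nominal parameter count from the Specs section

def bits_per_weight(size_gb: float) -> float:
    # size in GB -> bytes -> bits, divided by the parameter count
    return size_gb * 1e9 * 8 / PARAMS

# Sizes taken from the downloads table above
for name, size_gb in [("BF16", 16.0), ("Q8_0", 8.2), ("Q6_K", 6.3), ("Q4_K_M", 4.7)]:
    print(f"{name}: ~{bits_per_weight(size_gb):.1f} bits/weight")
```

For an 8B model the bits-per-weight figure happens to equal the size in GB, so BF16 comes out at ~16.0 and Q4_K_M at ~4.7 bits per weight.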

Specs

  • 8B parameters
  • 256K context
  • Vision-language model (requires mmproj file for image input)
  • Based on Qwen3-VL-8B

Usage

Works with llama.cpp, LM Studio, koboldcpp, etc.

For vision capabilities, load both the main model and the mmproj file.

llama.cpp example:

./llama-cli -m Qwen3VL-8B-Uncensored-HauhauCS-Balanced-Q4_K_M.gguf \
  --mmproj Qwen3VL-8B-Uncensored-HauhauCS-Balanced-mmproj-f16.gguf \
  --image your_image.jpg \
  -p "Describe this image"