# Huihui-Qwen3-VL-32B-Instruct-FP8-Abliterated
This repository contains the FP8 (float8_e4m3fn) quantized version of the abliterated model from huihui-ai. This quantization reduces the model size to approximately 35GB, making it more efficient for deployment on modern NVIDIA GPUs (Hopper and Ada Lovelace architectures).
## Model Description
- Original Model: huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated
- Format: FP8 (native float8_e4m3fn)
- Size: ~34.9 GB
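The size reduction follows directly from the storage format: FP8 (`float8_e4m3fn`) uses 1 byte per parameter versus 2 bytes for bf16/fp16. A back-of-the-envelope sketch (assumption: weights dominate the checkpoint size; the extra ~3 GB over the 32 GB estimate comes from tensors such as embeddings and norms that typically stay in higher precision):

```python
# Rough checkpoint-size arithmetic for a 32B-parameter model.
params = 32e9

bf16_gb = params * 2 / 1e9  # 2 bytes per parameter
fp8_gb = params * 1 / 1e9   # 1 byte per parameter

print(f"bf16 ~ {bf16_gb:.0f} GB, fp8 ~ {fp8_gb:.0f} GB")
# fp8 estimate (~32 GB) is close to the actual ~34.9 GB checkpoint,
# which also carries some higher-precision tensors.
```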
## Requirements
Ensure you have the latest versions of the following libraries:
- `transformers`
- `accelerate`
- `qwen-vl-utils`
## Usage Example
```shell
pip install -U transformers accelerate torch torchvision qwen-vl-utils
```
```python
# @title Maggie VL: Flexible Storyteller (Text & Image)
import gc
import json
import os

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# --- 1. CONFIGURATION ---
PROMPT_BARU = "Maggie sits on the front porch, watching the sunset over the Kingdom of Alghafar"
IMAGE_PATH = ""  # Image path (optional)
RESET_HISTORY = False
MODEL_ID = "Heouzen/Huihui-Qwen3-VL-32B-Instruct-FP8-abliterated"
FILE = "maggie_history_cerita.json"
SYSTEM_INS = """Name: Maggie. Age: 19.
Appearance: Strikingly beautiful, platinum-blonde hair, 165 cm tall, 45 kg, slender.
Background: Daughter of Edward (a cloth merchant). Lives in the Kingdom of Alghafar.
Personality: Polite, but with the firmness typical of the upper middle class."""

# --- 2. LOAD MODEL (singleton pattern: skip reloading on notebook re-runs) ---
if "model" not in globals():
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
    print(f"[SYSTEM]: Loading Maggie 32B on {gpu_name}...")
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        MODEL_ID,
        device_map="auto",
        torch_dtype="auto",  # keep the checkpoint's native FP8 weights
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    print(f"[SYSTEM]: {gpu_name} READY!\n")

# --- 3. HISTORY LOGIC ---
if RESET_HISTORY and os.path.exists(FILE):
    os.remove(FILE)
    print("[HISTORY]: Previous history deleted.")
if os.path.exists(FILE):
    with open(FILE, "r") as f:
        msg = json.load(f)
else:
    msg = [{"role": "system", "content": [{"type": "text", "text": SYSTEM_INS}]}]

# Build the user turn (image first, if provided, then text)
u_content = []
img = None
if IMAGE_PATH and os.path.exists(IMAGE_PATH):
    img = Image.open(IMAGE_PATH).convert("RGB")
    u_content.append({"type": "image", "image": img})
u_content.append({"type": "text", "text": PROMPT_BARU})
msg.append({"role": "user", "content": u_content})

# --- 4. INFERENCE ---
prompt_text = processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[prompt_text],
    images=[img] if img else None,
    padding=True,
    return_tensors="pt",
).to(model.device)
print("[qwen-32B]: Composing the scene...")
with torch.no_grad():
    out_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1,
    )
# Decode only the newly generated tokens, skipping the prompt
resp = processor.batch_decode(out_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

# --- 5. CLEANUP & DISPLAY ---
# Free GPU memory held by the input tensors
del inputs
torch.cuda.empty_cache()
gc.collect()
print("\n" + "=" * 60)
print(f"SCENE: {PROMPT_BARU.upper()}")
print("=" * 60)
print(f"\n{resp.strip()}\n")
print("=" * 60)

# Save history (text-only mode: PIL images are not JSON-serializable,
# so the user turn is rewritten as text before saving)
msg[-1] = {"role": "user", "content": [{"type": "text", "text": f"[Visual Input] {PROMPT_BARU}" if img else PROMPT_BARU}]}
msg.append({"role": "assistant", "content": [{"type": "text", "text": resp}]})
with open(FILE, "w") as f:
    json.dump(msg, f, indent=4)
```
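Note that the script appends every turn to `maggie_history_cerita.json`, so the history grows without bound and will eventually exceed the model's context window. One way to cap it is to keep the system message plus only the most recent turns before building the prompt; a minimal sketch (the `trim_history` helper and `max_turns` value are illustrative additions, not part of the script above):

```python
def trim_history(msg, max_turns=20):
    """Keep all system messages plus the last `max_turns` non-system
    messages, dropping the oldest user/assistant turns first."""
    system = [m for m in msg if m["role"] == "system"]
    turns = [m for m in msg if m["role"] != "system"]
    return system + turns[-max_turns:]
```

Call it on `msg` right before `processor.apply_chat_template(...)` so the saved file stays complete while the prompt stays bounded.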
Model tree for Heouzen/Huihui-Qwen3-VL-32B-Instruct-FP8-abliterated:
- Base model: Qwen/Qwen3-VL-32B-Instruct