Huihui-Qwen3-VL-32B-Instruct-FP8-Abliterated

This repository contains the FP8 (float8_e4m3fn) quantized version of the abliterated model from huihui-ai. This quantization reduces the model size to approximately 35GB, making it more efficient for deployment on modern NVIDIA GPUs (Hopper and Ada Lovelace architectures).
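To make the float8_e4m3fn layout concrete: each weight is stored in one byte with 1 sign bit, 4 exponent bits, and 3 mantissa bits (exponent bias 7), which is why the checkpoint is roughly half the size of an FP16 one. A minimal pure-Python decoder sketch (the helper name is ours, not part of any library):

```python
def fp8_e4m3_decode(byte: int) -> float:
    """Decode one float8_e4m3fn byte: 1 sign, 4 exponent, 3 mantissa bits, bias 7."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F   # 4 exponent bits
    man = byte & 0x07          # 3 mantissa bits
    if exp == 0:               # subnormal: no implicit leading 1
        return sign * (man / 8) * 2 ** (-6)
    return sign * (1 + man / 8) * 2 ** (exp - 7)

print(fp8_e4m3_decode(0x38))  # 1.0
print(fp8_e4m3_decode(0x40))  # 2.0
print(fp8_e4m3_decode(0x7E))  # 448.0 (the e4m3fn maximum; the "fn" variant has no infinities)
```

The coarse 3-bit mantissa is why FP8 is used for weights (tolerant of quantization error) while activations and accumulation typically stay in higher precision.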

Requirements

Ensure you have the latest versions of the following libraries:

  • transformers
  • accelerate
  • qwen-vl-utils
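A quick stdlib-only sanity check of what is installed (the package names match the list above; no minimum versions are implied):

```python
import importlib.metadata as md

# Report the installed version of each required package, or flag it as missing.
for pkg in ("transformers", "accelerate", "qwen-vl-utils"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED - run `pip install -U {pkg}`")
```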

Usage Example

pip install -U transformers accelerate torch torchvision
# @title 🎭 Maggie VL: Flexible Storyteller (Text & Image)
import json, os, torch, gc
from PIL import Image
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# --- 1. CONFIGURATION ---
PROMPT_BARU = "Maggie sits on the porch, watching the sunset over the Kingdom of Alghafar"
IMAGE_PATH = ""  # Optional image path
RESET_HISTORY = False

MODEL_ID = "Heouzen/Huihui-Qwen3-VL-32B-Instruct-FP8-abliterated"
FILE = "maggie_history_cerita.json"

SYSTEM_INS = """Name: Maggie. Age: 19.
Appearance: Strikingly beautiful, platinum-blonde hair, 165 cm tall, 45 kg, slender.
Background: Daughter of Edward (a cloth merchant). Lives in the Kingdom of Alghafar.
Personality: Polite, with the assertiveness typical of the upper-middle class."""

# --- 2. LOAD MODEL (singleton pattern: skip reloading on notebook cell re-runs) ---
if 'model' not in globals():
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
    print(f"🚀 [SYSTEM]: Loading Maggie 32B on {gpu_name}...")
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto",  # "auto" keeps the FP8 checkpoint dtype
        trust_remote_code=True, low_cpu_mem_usage=True
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    print(f"✨ [SYSTEM]: {gpu_name} READY!\n")

# --- 3. HISTORY LOGIC ---
if RESET_HISTORY and os.path.exists(FILE):
    os.remove(FILE)
    print("🧹 [HISTORY]: Old history deleted.")

if os.path.exists(FILE):
    with open(FILE, "r") as f:
        msg = json.load(f)
else:
    msg = [{"role": "system", "content": [{"type": "text", "text": SYSTEM_INS}]}]

# Build the user message content
u_content = []
img = None
if IMAGE_PATH and os.path.exists(IMAGE_PATH):
    img = Image.open(IMAGE_PATH).convert("RGB")
    u_content.append({"type": "image", "image": img})
u_content.append({"type": "text", "text": PROMPT_BARU})
msg.append({"role": "user", "content": u_content})

# --- 4. INFERENCE ---
prompt_text = processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt_text], images=[img] if img else None, padding=True, return_tensors="pt").to(model.device)

print("✍️  [qwen-32B]: Composing the scene...")
with torch.no_grad():
    out_ids = model.generate(
        **inputs, 
        max_new_tokens=1024, 
        temperature=0.7, 
        top_p=0.9, 
        do_sample=True, 
        repetition_penalty=1.1
    )
    resp = processor.batch_decode(out_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

# --- 5. CLEANUP & DISPLAY ---
# Free the input tensors and reclaim GPU memory
del inputs; torch.cuda.empty_cache(); gc.collect()

print("\n" + "━"*60)
print(f"πŸ“– ADEGAN: {PROMPT_BARU.upper()}")
print("━"*60)
print(f"\n{resp.strip()}\n")
print("━"*60)

# Save history (text-only mode: any image is replaced with a text marker)
msg[-1] = {"role": "user", "content": [{"type": "text", "text": f"[Visual Input] {PROMPT_BARU}" if img else PROMPT_BARU}]}
msg.append({"role": "assistant", "content": [{"type": "text", "text": resp}]})
with open(FILE, "w") as f: json.dump(msg, f, indent=4)
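Note that the history file above grows without bound across runs, which will eventually overflow the model's context window. A hypothetical helper (not part of the original script) that keeps the system message plus the most recent turns could look like:

```python
def trim_history(msg, max_turns=10):
    """Keep the system message plus the last `max_turns` user/assistant exchanges."""
    system = [m for m in msg if m["role"] == "system"]
    rest = [m for m in msg if m["role"] != "system"]
    return system + rest[-max_turns * 2:]  # one turn = one user + one assistant message

# Example with a synthetic 30-message history:
history = [{"role": "system", "content": "persona"}] + [
    {"role": "user", "content": str(i)} for i in range(30)
]
trimmed = trim_history(history, max_turns=5)
print(len(trimmed))  # 11: the system message plus the last 10 messages
```

Calling this on `msg` just before `json.dump` would cap the file size while preserving the persona instructions.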
Model tree for Heouzen/Huihui-Qwen3-VL-32B-Instruct-FP8-abliterated