Qwen3VL-8B-4bit-GGUF-Jetson-Deployment

🌍 Disaster Recognition Model | 🚨 Emergency Response | 🗣️ Trilingual (EN/JA/ZH) | 🔧 Jetson-Ready


This repository provides a merged Hugging Face checkpoint and pre-quantized GGUF files for immediate deployment of a disaster-recognition vision-language model on edge devices such as the NVIDIA Jetson series.

The model was created by merging the QLoRA adapter WayBob/Qwen3VL-8B-QLora-4bit-xView2-Disaster-Recognition into the base model Qwen/Qwen3-VL-8B-Instruct, then converting the merged checkpoint into GGUF for efficient deployment with llama.cpp.

Why this repository exists

The original LoRA adapter repository is ideal for standard Hugging Face inference and further experimentation, but edge deployment on Jetson-class devices introduces additional constraints:

  • Runtime LoRA loading adds memory overhead.
  • JetPack 5.1.2 ships with CUDA 11.4, which is often awkward or unsupported for newer inference stacks without custom patching.
  • Edge deployment benefits from a single merged model artifact and a lightweight inference runtime.

To make deployment simpler, this repository provides:

  • Merged HF weights: the LoRA adapter is baked into the base model weights.
  • GGUF files: quantized artifacts for llama.cpp, enabling practical Jetson deployment with a runtime memory footprint of about 6.3 GB in the validated configuration.
  • A Jetson-focused deployment path: a reproducible setup using llama.cpp instead of more heavyweight serving stacks.

Model Overview

| Attribute | Detail |
|---|---|
| Base Model | Qwen/Qwen3-VL-8B-Instruct |
| LoRA Adapter | WayBob/Qwen3VL-8B-QLora-4bit-xView2-Disaster-Recognition |
| Training Data | WayBob/Disaster_Recognition_RemoteSense_EN_CN_JA |
| Training Samples | 55,008 trilingual samples |
| Fine-tuning Method | QLoRA (4-bit NF4, LoRA rank 8, alpha 16) |
| Languages | English, Japanese, Chinese |
| Target Disaster Classes | Fire, Flood, Hurricane/Wind, Earthquake, Tsunami, Volcano |
| Model Scale | ~8B parameters |
| Primary Task | Disaster type recognition from post-disaster satellite/aerial imagery |
| Primary Deployment Target | Jetson-class edge devices using llama.cpp |

Repository Structure

.
├── README.md                           # Model card and documentation
├── qwen3vl_8b_disaster_merged/         # Merged Hugging Face checkpoint (BF16)
│   ├── config.json
│   ├── generation_config.json
│   ├── model-00001-of-00004.safetensors
│   ├── model-00002-of-00004.safetensors
│   ├── model-00003-of-00004.safetensors
│   ├── model-00004-of-00004.safetensors
│   ├── model.safetensors.index.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── chat_template.jinja
│   ├── preprocessor_config.json
│   └── vocab.json
├── gguf_16bit_4bit/                    # GGUF files for llama.cpp
│   ├── disaster-8b-f16.gguf            # F16 GGUF
│   └── disaster-8b-q4km.gguf           # Q4_K_M GGUF (recommended for Jetson)
└── merge_lora/                         # Merge configuration
    └── qwen3_vl_8b_xview2lora.yaml     # LLaMA-Factory merge config

Available Formats

| File | Format | Size | Typical Use Case |
|---|---|---|---|
| qwen3vl_8b_disaster_merged/ | BF16 (safetensors) | ~16 GB | Reproducibility, inspection, further conversion, research |
| gguf_16bit_4bit/disaster-8b-f16.gguf | GGUF F16 | ~16.4 GB | High-accuracy reference |
| gguf_16bit_4bit/disaster-8b-q4km.gguf | GGUF Q4_K_M | ~4.8 GB | Recommended edge deployment |

If your goal is Jetson deployment, you typically only need the Q4_K_M GGUF file plus the corresponding Qwen3-VL mmproj file.

Quick Start

Option 1: Hugging Face Transformers (Merged Weights)

This path is useful for reproducibility and standard HF workflows. For Jetson inference, the GGUF path below is usually more practical.

import torch
from PIL import Image
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "WayBob/Qwen3VL-8B-4bit-GGUF-Jetson-Deployment",
    subfolder="qwen3vl_8b_disaster_merged",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "WayBob/Qwen3VL-8B-4bit-GGUF-Jetson-Deployment",
    subfolder="qwen3vl_8b_disaster_merged"
)

image = Image.open("disaster_image.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What type of disaster occurred in this image?"}
        ]
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False  # greedy decoding; generate() rejects temperature=0
)

generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(output_text[0])

Option 2: llama.cpp Server (Recommended for Deployment)

# Clone and build llama.cpp on Jetson
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)

# Download the quantized model from this repository
huggingface-cli download WayBob/Qwen3VL-8B-4bit-GGUF-Jetson-Deployment \
  --include "gguf_16bit_4bit/disaster-8b-q4km.gguf" \
  --local-dir models/

# Download the Qwen3-VL vision projector
huggingface-cli download Qwen/Qwen3-VL-8B-Instruct-GGUF \
  --include "*mmproj*Q8*" \
  --local-dir models/

# Launch the server
./build/bin/llama-server \
  -m models/gguf_16bit_4bit/disaster-8b-q4km.gguf \
  --mmproj models/mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf \
  -ngl 99 --fit off -c 8192 \
  -ctk q8_0 -ctv q8_0 \
  --host 0.0.0.0 --port 8080
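Once the server is running, its standard health and model-list endpoints give a quick way to confirm the model loaded before sending image requests:

```shell
# Health probe; returns a small JSON status once the model is ready
curl http://localhost:8080/health

# OpenAI-compatible model listing served by llama-server
curl http://localhost:8080/v1/models
```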

Option 3: OpenAI-Compatible API Call

import base64
import requests

with open("disaster_image.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {
                "role": "system",
                "content": (
                    "You are a disaster recognition expert. "
                    "When analyzing disaster images, first identify the disaster type, "
                    "then explain the key visual evidence supporting your classification. "
                    "Respond in the same language as the user."
                )
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}
                    },
                    {
                        "type": "text",
                        "text": "What type of disaster occurred in this image?"
                    }
                ]
            }
        ],
        "temperature": 0
    },
    timeout=300
)

print(response.json()["choices"][0]["message"]["content"])
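The same request can also go through the official `openai` Python client pointed at the local server. This is a sketch assuming the `openai` package is installed; the model name is an arbitrary label, since llama-server serves its single loaded model regardless of the name supplied:

```python
import base64

from openai import OpenAI  # pip install openai

# llama-server ignores the API key; any placeholder works
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

with open("disaster_image.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="disaster-8b-q4km",  # arbitrary label for a single-model server
    messages=[
        {"role": "system", "content": "You are a disaster recognition expert."},
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
                {"type": "text",
                 "text": "What type of disaster occurred in this image?"},
            ],
        },
    ],
    temperature=0,
)
print(resp.choices[0].message.content)
```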

Edge Deployment on NVIDIA Jetson

This model has been validated on NVIDIA Jetson Orin NX 16GB with llama.cpp.

Validation Hardware Environment

| Property | Value |
|---|---|
| Device Model | NVIDIA Orin NX Developer Kit |
| Module / Carrier Board | Jetson Orin NX 16GB (P3767-0000) / P3768-0000 |
| SoC / Platform | Tegra234 (Orin, tegra23x family) |
| JetPack / L4T | JetPack 5.1.2 / L4T 35.4.1 |
| OS / Kernel | Ubuntu 20.04.6 LTS / Linux 5.10.120-tegra |
| Python | 3.8.10 |
| Libraries | CUDA 11.4.315, cuDNN 8.6.0.166, TensorRT 8.5.2.2, VPI 2.3.9, Vulkan 1.3.204 |
| OpenCV | 4.5.4 (CUDA not enabled) |
| Power Mode | MAXN |
| Storage | 937 GB NVMe system drive |

Example Idle Snapshot on the Validation Device

The values below are a representative jtop snapshot from the validation device and are not fixed hardware limits.

| Sensor | Example Status |
|---|---|
| CPU | 8 cores, roughly 1-8% load at 729 MHz |
| GPU | 0% load at 306 MHz |
| Memory | 4.1 GB / 15.2 GB |
| Swap | 425 MB / 7.6 GB |
| EMC | 204 MHz (reported cap ~3.2 GHz) |
| Storage | 67.4 GB / 937 GB used |
| Cooling Fan | PWM 100%; jtop reported 0 RPM |
| Temperatures | CPU/GPU/SoC roughly 44-49°C |

Why llama.cpp on Jetson?

JetPack 5.1.2 ships with CUDA 11.4. On Jetson, many recent inference stacks either target newer CUDA/toolchain combinations or require significant patching to be practical. In contrast, llama.cpp offers a simpler deployment path:

  • native GGUF support
  • straightforward CUDA build on Jetson
  • OpenAI-compatible HTTP server
  • good memory efficiency for quantized deployment
  • practical support for vision-language inference via mmproj

Build and Deploy on Jetson

# Build llama.cpp on Jetson
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)

# Download model and vision projector
huggingface-cli download WayBob/Qwen3VL-8B-4bit-GGUF-Jetson-Deployment \
  --include "gguf_16bit_4bit/disaster-8b-q4km.gguf" \
  --local-dir models/

huggingface-cli download Qwen/Qwen3-VL-8B-Instruct-GGUF \
  --include "*mmproj*Q8*" \
  --local-dir models/

# Launch server
./build/bin/llama-server \
  -m models/gguf_16bit_4bit/disaster-8b-q4km.gguf \
  --mmproj models/mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf \
  -ngl 99 \
  --fit off \
  -c 8192 \
  -ctk q8_0 -ctv q8_0 \
  --host 0.0.0.0 \
  --port 8080

Key Flags

| Flag | Purpose |
|---|---|
| -ngl 99 | Offload all decoder layers to the GPU |
| --fit off | Skip auto memory fitting to reduce Jetson startup time |
| -c 8192 | Set the context length |
| -ctk q8_0 -ctv q8_0 | Quantize the KV cache to Q8_0 to save memory |

Representative Memory Footprint

Measured on the validated Jetson setup with Q4_K_M and Q8 KV cache.

| Component | Memory Footprint |
|---|---|
| LLM (Q4_K_M) | ~4.5 GB |
| Vision projector (mmproj) | ~0.7 GB |
| KV cache (Q8, c=8192) | ~0.6 GB |
| Compute buffers | ~0.4 GB |
| Total | ~6.3 GB |
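The KV-cache line can be sanity-checked with back-of-envelope arithmetic. The architecture numbers below (36 decoder layers, 8 KV heads, head dimension 128) are assumptions about the Qwen3 8B text backbone, not values stated in this card:

```python
def kv_cache_bytes(ctx, n_layers=36, n_kv_heads=8, head_dim=128, bytes_per_elem=1):
    """Approximate KV-cache size: one K and one V entry per layer, KV head,
    head dimension, and context position; Q8_0 is ~1 byte per element
    (ignoring per-block scale overhead)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

# Close to the ~0.6 GB figure reported above for c=8192
print(f"~{kv_cache_bytes(8192) / 1024**3:.2f} GiB")
```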

Representative Performance

These are deployment observations from the validated Jetson configuration. They should be treated as representative, not guaranteed.

| Input Resolution | Image Processing | TTFT | Text Generation |
|---|---|---|---|
| 1024×1024 | ~8.7 s | ~10.3 s | ~10-11 tok/s |
| 512×512 | ~0.9 s | ~1.9 s | ~10-11 tok/s |
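From these numbers, end-to-end response time is roughly TTFT plus output length divided by the decode rate. A hypothetical helper for planning latency budgets:

```python
def estimated_latency_s(ttft_s, n_output_tokens, tok_per_s=10.5):
    # Total latency ≈ prefill time (TTFT) + autoregressive decode time
    return ttft_s + n_output_tokens / tok_per_s

# A ~100-token classification answer on a 512x512 input:
print(f"{estimated_latency_s(1.9, 100):.1f} s")
```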

Notes on Image Resolution and Visual Tokens

Qwen3-VL uses 14-pixel image patches with a spatial merge size of 2, and its preprocessing aligns image dimensions to multiples of 28 through smart resizing.

In practice, this means:

  • visual token count depends on the processed image size after smart resize
  • token count is not always equal to a simple raw width × height formula
  • larger input resolution significantly increases prefilling time and TTFT

For disaster type classification, 512×512 is often a strong latency/quality trade-off on Jetson. Use 1024×1024 only when the extra detail is necessary.
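The points above can be illustrated with a simplified token estimate using the stated numbers (a 28-pixel effective grid from 14-pixel patches and merge size 2, with dimensions rounded to multiples of 28). The real preprocessing also enforces minimum and maximum pixel budgets, so treat this as a sketch:

```python
def approx_visual_tokens(height, width, factor=28):
    # Align each dimension to the nearest multiple of 28, then count one
    # token per 28x28 merged patch. A simplification of Qwen's smart resize.
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    return (h // factor) * (w // factor)

print(approx_visual_tokens(512, 512))    # 324 visual tokens
print(approx_visual_tokens(1024, 1024))  # 1369 -- roughly 4x the prefill work
```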

Merge and Quantization Pipeline

The deployment pipeline from the LoRA adapter to Jetson-ready GGUF is:

QLoRA Fine-tuning
    ↓
LoRA Adapter
    ↓  LLaMA-Factory export
Merged HF Model (BF16)
    ↓  llama.cpp convert_hf_to_gguf.py
GGUF F16
    ↓  llama-quantize
GGUF Q4_K_M
    ↓
Deployment on Jetson with llama.cpp

Merge Configuration (LLaMA-Factory)

model_name_or_path: Qwen/Qwen3-VL-8B-Instruct
adapter_name_or_path: WayBob/Qwen3VL-8B-QLora-4bit-xView2-Disaster-Recognition
template: qwen3_vl_nothink
trust_remote_code: true
export_dir: <replace-with-your-LLaMA-Factory-path>/output/qwen3vl_8b_disaster_merged
export_size: 5
export_device: cpu
export_legacy_format: false

Important: do not set quantization_bit during merge. Training used 4-bit quantization for efficiency, but the merge step should export merged weights first and quantize only afterward.
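With LLaMA-Factory installed, the merge is a single export call against the YAML above (shown with the path used in this repository):

```shell
llamafactory-cli export merge_lora/qwen3_vl_8b_xview2lora.yaml
```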

GGUF Conversion and Quantization

python -m pip install gguf

git clone https://github.com/ggml-org/llama.cpp <replace-with-your-llama.cpp-path>

# Convert merged HF checkpoint to GGUF
python3 <replace-with-your-llama.cpp-path>/convert_hf_to_gguf.py \
  <replace-with-your-LLaMA-Factory-path>/output/qwen3vl_8b_disaster_merged \
  --outfile <replace-with-your-LLaMA-Factory-path>/output/gguf_16bit_4bit/disaster-8b-f16.gguf

# Build quantization tool
cd <replace-with-your-llama.cpp-path>
cmake -B build
cmake --build build --target llama-quantize -j$(nproc)

# Quantize to Q4_K_M
./build/bin/llama-quantize \
  <replace-with-your-LLaMA-Factory-path>/output/gguf_16bit_4bit/disaster-8b-f16.gguf \
  <replace-with-your-LLaMA-Factory-path>/output/gguf_16bit_4bit/disaster-8b-q4km.gguf \
  Q4_K_M

Training Details

For the complete fine-tuning story, see the original LoRA adapter repository: WayBob/Qwen3VL-8B-QLora-4bit-xView2-Disaster-Recognition

Summary

| Attribute | Detail |
|---|---|
| Framework | LLaMA-Factory |
| Method | QLoRA (4-bit NF4 + LoRA rank 8, target all linear layers) |
| Dataset | WayBob/Disaster_Recognition_RemoteSense_EN_CN_JA |
| Training Samples | 55,008 |
| Test Samples | 5,598 |
| Languages | English, Japanese, Chinese |
| GPUs | 2× NVIDIA RTX 4090 (24 GB) |
| Training Time | ~6.4 hours |
| Final Training Loss | 0.0239 |

Multilingual Examples

English

Q: What type of disaster occurred in this image?
A: This is a fire disaster. Key visual evidence includes charred and blackened terrain,
   scorched vegetation, and widespread burn patterns across the landscape.

日本語

Q: この画像ではどのような種類の災害が発生しましたか?
A: 火災災害が発生しました。地表が黒く焼けており、植生や建物に焼失の痕跡が見られます。

中文

Q: 当前图片发生了什么灾害呢?
A: 当前图片发生了风灾灾害。可以看到大量树木倒伏、建筑受损,以及明显的风灾破坏痕迹。

Recommended System Prompt

For best results, use a system prompt such as:

You are a disaster recognition expert. When analyzing disaster images, first identify the disaster type, then explain the key visual evidence supporting your classification. Respond in the same language as the user.

Limitations

  • This model is specialized for post-disaster satellite and aerial imagery and may not perform well on ground-level photos.
  • The target label space is limited to six disaster classes: fire, flood, hurricane/wind, earthquake, tsunami, and volcano.
  • The training data format is relatively simple, so without a good system prompt the model may answer too briefly.
  • Geographic coverage is not uniform; performance may vary by region and disaster appearance.
  • Higher image resolution can improve fidelity, but it substantially increases TTFT on edge devices.
  • The model is primarily intended for English, Japanese, and Chinese.
  • This model is for assistance and triage, not fully autonomous decision-making in emergency response.

Intended Use

Recommended

  • post-disaster image triage
  • disaster type classification from satellite/aerial imagery
  • multilingual disaster-image QA
  • humanitarian and research workflows
  • edge deployment experiments on Jetson-class devices

Not Recommended

  • disaster prediction
  • ground-level scene understanding
  • legal, insurance, or policy decisions without human review
  • fine-grained damage severity assessment
  • use as the sole source of truth in emergency operations

License

This repository packages artifacts derived from multiple upstream sources: the Qwen/Qwen3-VL-8B-Instruct base model, the xView2-derived LoRA adapter, and the trilingual training dataset.

Because these upstream artifacts use different licenses, the metadata of this repository is set to license: other rather than claiming a single simple license for all distributed artifacts.

Before using, redistributing, fine-tuning, or commercializing this repository, please:

  • review all upstream licenses yourself
  • confirm that your intended use is compatible with all applicable terms
  • avoid assuming that this repository grants rights beyond those granted by upstream authors and applicable law

If you need a single definitive legal statement for production or commercial use, obtain legal review first.

Citation

@misc{wang2026qwen3vl_jetson_disaster,
  title={Qwen3VL-8B-4bit-GGUF-Jetson-Deployment: Merged and Quantized Vision-Language Model for Disaster Type Classification},
  author={WayBob},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/WayBob/Qwen3VL-8B-4bit-GGUF-Jetson-Deployment}
}

@misc{wang2026qwen3vl_disaster_lora,
  title={Qwen3VL-8B-QLora-4bit-xView2-Disaster-Recognition},
  author={WayBob},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/WayBob/Qwen3VL-8B-QLora-4bit-xView2-Disaster-Recognition}
}

@misc{waybob2026disaster_dataset,
  title={Disaster Recognition RemoteSense Dataset (EN/CN/JA)},
  author={WayBob},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/WayBob/Disaster_Recognition_RemoteSense_EN_CN_JA}
}

@inproceedings{xview2,
  title={xBD: A Dataset for Assessing Building Damage from Satellite Imagery},
  author={Gupta, Ritwik and Hosfelt, Richard and Sajeev, Sandra and Patel, Nirav and Goodman, Bryce and Doshi, Jigar and Heim, Eric and Choset, Howie and Gaston, Matthew},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year={2019}
}

Acknowledgements

  • Qwen Team for the Qwen3-VL base model
  • LLaMA-Factory for the fine-tuning workflow
  • llama.cpp for efficient GGUF inference on edge devices
  • xView2 / xBD and DIUx for the original disaster imagery benchmark
  • NVIDIA Jetson platform for edge deployment validation

Disclaimer

This model is intended for research, evaluation, and deployment experimentation.
Always verify model outputs with qualified human reviewers before making real-world decisions in disaster response workflows.
