JoyAI-Image-Edit (Safetensors)

This is a safetensors conversion of jdopensource/JoyAI-Image-Edit.

The original repo ships .pth (pickle) files for the transformer and VAE checkpoints. This repo replaces them with .safetensors files: no arbitrary code execution, faster loading, same weights.

What changed

File                          | Original              | This repo
transformer/transformer.pth   | pickle, 31 GB         | transformer/transformer.safetensors
vae/Wan2.1_VAE.pth            | pickle, 485 MB        | vae/Wan2.1_VAE.safetensors
JoyAI-Image-Und/              | already safetensors   | unchanged
infer_config.py               | references .pth       | updated to reference .safetensors

All tensors were verified for exact numerical equality after conversion.

Inference Tool

A Gradio UI, CLI, and REST API for running inference are available at SanDiegoDude/JoyAI-Image, a handy way to run the model until proper ComfyUI integration lands. Features include auto-download from Hugging Face, multiple memory modes (offloading, high-VRAM, load-on-demand), and optional quantization for lower VRAM usage.

Quick start

git clone https://github.com/SanDiegoDude/JoyAI-Image.git
cd JoyAI-Image
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Models auto-download on first run
# Default: FP8 transformer + 8-bit text encoder (offload mode)
python app.py

# Full precision (bf16 transformer + bf16 text encoder, needs ~48 GB+ VRAM)
python app.py --fullprecision --16bit-vlm --highvram

# Minimum VRAM (~13 GB active, fits RTX 4090)
python app.py --nf4-dit --4bit-vlm

# CLI inference
python inference.py \
  --prompt "Turn the plate blue" \
  --image test_images/test_1.jpg \
  --output result.png \
  --steps 18 --guidance-scale 4.0 --seed 42

# Headless REST API
python app.py --headless-api 7500 --nf4-dit --4bit-vlm

These bf16 safetensors are also the source weights for the --nf4-dit runtime NF4 quantization (~8 GB), which gives the best quality-to-size ratio among the 4-bit options because it quantizes directly from the full-precision weights.
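For intuition, here is a rough sketch of what NF4 quantization does, not the actual kernels used by --nf4-dit: each block of weights is scaled by its absolute maximum and snapped to the nearest of 16 fixed code values (the QLoRA NF4 codes, rounded here to four decimals).

```python
import torch

# The 16 NF4 code values (quantiles of a standard normal, normalized to
# [-1, 1]), rounded to four decimals for this illustration.
NF4_CODES = torch.tensor([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
])

def nf4_quantize(block: torch.Tensor):
    # Scale the block into [-1, 1] by its absmax, then map each value to
    # the index of the nearest NF4 code. Stores 4-bit indices + one scale.
    scale = block.abs().max()
    idx = (block.div(scale).unsqueeze(-1) - NF4_CODES).abs().argmin(dim=-1)
    return idx.to(torch.uint8), scale

def nf4_dequantize(idx: torch.Tensor, scale: torch.Tensor):
    # Look the codes back up and rescale.
    return NF4_CODES[idx.long()] * scale
```

Real NF4 implementations additionally quantize the per-block scales themselves ("double quantization"), which this sketch omits.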

Model details

JoyAI-Image-Edit is an instruction-guided image editing model from JD's JoyAI-Image family. Architecture:

  • Transformer: 16B-parameter MMDiT (40 double blocks, hidden size 4096)
  • Text encoder: Qwen3VL 8B MLLM
  • VAE: Wan2.1 causal 3D VAE (16 latent channels)
  • Scheduler: FlowMatch discrete
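For intuition on the FlowMatch discrete scheduler, a generic flow-matching Euler sampler can be sketched as below; this is a standard textbook formulation, not JoyAI's actual implementation. The model predicts a velocity field, and sampling integrates it from noise toward the image over a discrete sigma schedule.

```python
import torch

def flowmatch_euler_step(x, v, sigma, sigma_next):
    # One Euler step: follow the predicted velocity from noise level
    # sigma down to sigma_next.
    return x + (sigma_next - sigma) * v

# Toy check: under the linear-interpolation path x_t = (1 - t)*x0 + t*x1,
# the true velocity is the constant x1 - x0, so Euler integration from
# t = 1 (pure noise) down to t = 0 recovers x0 exactly.
x0 = torch.full((4,), 3.0)             # stand-in for the clean latent
x1 = torch.randn(4)                    # stand-in for the initial noise
sigmas = torch.linspace(1.0, 0.0, 19)  # 18 steps, as in the CLI example
x = x1.clone()
for s, s_next in zip(sigmas[:-1], sigmas[1:]):
    v = x1 - x0                        # stand-in for the model's prediction
    x = flowmatch_euler_step(x, v, s, s_next)
```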

See the original project page and paper for full details.

License

Apache 2.0, same as the original.
