JoyAI-Image-Edit (Safetensors)

This is a safetensors conversion of jdopensource/JoyAI-Image-Edit.

The original repo ships .pth (pickle) files for the transformer and VAE checkpoints. This repo replaces them with .safetensors files: no arbitrary code execution, faster loading, same weights.

What changed

File                          | Original              | This repo
transformer/transformer.pth   | pickle, 31 GB         | transformer/transformer.safetensors
vae/Wan2.1_VAE.pth            | pickle, 485 MB        | vae/Wan2.1_VAE.safetensors
JoyAI-Image-Und/              | already safetensors   | unchanged
infer_config.py               | references .pth       | updated to reference .safetensors

All tensors were verified for exact numerical equality after conversion.

Inference Tool

A Gradio UI, CLI, and REST API for running inference are available at SanDiegoDude/JoyAI-Image, a handy way to run the model until proper ComfyUI integration lands. Features include auto-download from Hugging Face, multiple memory modes (offloading, high-VRAM, load-on-demand), and optional quantization for lower VRAM usage.

Quick start

git clone https://github.com/SanDiegoDude/JoyAI-Image.git
cd JoyAI-Image
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Models auto-download on first run
# Default: FP8 transformer + 8-bit text encoder (offload mode)
python app.py

# Full precision (bf16 transformer + bf16 text encoder, needs ~48 GB+ VRAM)
python app.py --fullprecision --16bit-vlm --highvram

# Minimum VRAM (~13 GB active, fits RTX 4090)
python app.py --nf4-dit --4bit-vlm

# CLI inference
python inference.py \
  --prompt "Turn the plate blue" \
  --image test_images/test_1.jpg \
  --output result.png \
  --steps 18 --guidance-scale 4.0 --seed 42

# Headless REST API
python app.py --headless-api 7500 --nf4-dit --4bit-vlm

These bf16 safetensors are also the source weights for the --nf4-dit runtime NF4 quantization (~8 GB), which gives the best quality-to-size ratio among the 4-bit options because it quantizes directly from the full-precision weights.
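For intuition, here is a rough sketch of what NF4 quantization does, not the actual kernels used by --nf4-dit: each block of weights is scaled by its absolute maximum and snapped to the nearest of 16 fixed code values (the QLoRA NF4 codes, rounded here to four decimals).

```python
import torch

# The 16 NF4 code values (quantiles of a standard normal, normalized to
# [-1, 1]), rounded to four decimals for this illustration.
NF4_CODES = torch.tensor([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
])

def nf4_quantize(block: torch.Tensor):
    # Scale the block into [-1, 1] by its absmax, then map each value to
    # the index of the nearest NF4 code. Stores 4-bit indices + one scale.
    scale = block.abs().max()
    idx = (block.div(scale).unsqueeze(-1) - NF4_CODES).abs().argmin(dim=-1)
    return idx.to(torch.uint8), scale

def nf4_dequantize(idx: torch.Tensor, scale: torch.Tensor):
    # Look the codes back up and rescale.
    return NF4_CODES[idx.long()] * scale
```

Real NF4 implementations additionally quantize the per-block scales themselves ("double quantization"), which this sketch omits.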

Model details

JoyAI-Image-Edit is an instruction-guided image editing model from JD's JoyAI-Image family. Architecture:

  • Transformer: 16B-parameter MMDiT (40 double blocks, hidden size 4096)
  • Text encoder: Qwen3VL 8B MLLM
  • VAE: Wan2.1 causal 3D VAE (16 latent channels)
  • Scheduler: FlowMatch discrete
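For intuition on the FlowMatch discrete scheduler, a generic flow-matching Euler sampler can be sketched as below; this is a standard textbook formulation, not JoyAI's actual implementation. The model predicts a velocity field, and sampling integrates it from noise toward the image over a discrete sigma schedule.

```python
import torch

def flowmatch_euler_step(x, v, sigma, sigma_next):
    # One Euler step: follow the predicted velocity from noise level
    # sigma down to sigma_next.
    return x + (sigma_next - sigma) * v

# Toy check: under the linear-interpolation path x_t = (1 - t)*x0 + t*x1,
# the true velocity is the constant x1 - x0, so Euler integration from
# t = 1 (pure noise) down to t = 0 recovers x0 exactly.
x0 = torch.full((4,), 3.0)             # stand-in for the clean latent
x1 = torch.randn(4)                    # stand-in for the initial noise
sigmas = torch.linspace(1.0, 0.0, 19)  # 18 steps, as in the CLI example
x = x1.clone()
for s, s_next in zip(sigmas[:-1], sigmas[1:]):
    v = x1 - x0                        # stand-in for the model's prediction
    x = flowmatch_euler_step(x, v, s, s_next)
```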

See the original project page and paper for full details.

License

Apache 2.0, same as the original.
