# JoyAI-Image-Edit (Safetensors)
This is a safetensors conversion of jdopensource/JoyAI-Image-Edit.
The original repo ships `.pth` (pickle) files for the transformer and VAE checkpoints. This repo replaces them with `.safetensors` files: no arbitrary code execution, faster loading, same weights.
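The safety claim follows from the file format itself: a safetensors file is an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then a raw data buffer. Loading it only ever parses JSON and slices bytes, whereas unpickling a `.pth` can execute arbitrary code. A minimal stdlib-only sketch of that layout (illustrative, not the full spec; real files may also carry a `__metadata__` key):

```python
import json
import struct

def build_safetensors(tensors):
    """Build minimal safetensors bytes from {name: (dtype, shape, raw_bytes)}."""
    header, data, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, then JSON header, then raw buffer
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data

def read_header(blob):
    """Parse the header: pure JSON, nothing is ever executed."""
    (n,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + n].decode("utf-8"))

blob = build_safetensors({
    "weight": ("F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)),
})
print(read_header(blob))
# {'weight': {'dtype': 'F32', 'shape': [2, 2], 'data_offsets': [0, 16]}}
```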
## What changed
| File | Original | This repo |
|---|---|---|
| `transformer/transformer.pth` (31 GB) | pickle | `transformer/transformer.safetensors` |
| `vae/Wan2.1_VAE.pth` (485 MB) | pickle | `vae/Wan2.1_VAE.safetensors` |
| `JoyAI-Image-Und/` | already safetensors | unchanged |
| `infer_config.py` | references `.pth` | updated to reference `.safetensors` |
All tensors were verified for exact numerical equality after conversion.
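The equality check is exact, not approximate: every tensor must match key-by-key in dtype, shape, and value. A hypothetical helper illustrating that check on dummy NumPy state dicts (the actual conversion would load the real checkpoints, e.g. via `torch.load` and `safetensors.torch`):

```python
import numpy as np

def states_equal(a, b):
    """Exact (bitwise-value) equality between two {name: array} state dicts."""
    if a.keys() != b.keys():
        return False
    return all(
        a[k].dtype == b[k].dtype
        and a[k].shape == b[k].shape
        and np.array_equal(a[k], b[k])  # exact equality, not allclose
        for k in a
    )

original  = {"w": np.arange(6, dtype=np.float32).reshape(2, 3)}
converted = {"w": np.arange(6, dtype=np.float32).reshape(2, 3)}
print(states_equal(original, converted))  # True
```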
## Inference Tool
A Gradio UI, CLI, and REST API for running inference are available at SanDiegoDude/JoyAI-Image, a handy way to run the model until proper ComfyUI integration lands. Features include auto-download from HuggingFace, multiple memory modes (offloading, high-VRAM, load-on-demand), and optional quantization for lower VRAM usage.
### Quick start
```bash
git clone https://github.com/SanDiegoDude/JoyAI-Image.git
cd JoyAI-Image
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Models auto-download on first run
# Default: FP8 transformer + 8-bit text encoder (offload mode)
python app.py

# Full precision (bf16 transformer + bf16 text encoder, needs ~48 GB+ VRAM)
python app.py --fullprecision --16bit-vlm --highvram

# Minimum VRAM (~13 GB active, fits RTX 4090)
python app.py --nf4-dit --4bit-vlm

# CLI inference
python inference.py \
  --prompt "Turn the plate blue" \
  --image test_images/test_1.jpg \
  --output result.png \
  --steps 18 --guidance-scale 4.0 --seed 42

# Headless REST API
python app.py --headless-api 7500 --nf4-dit --4bit-vlm
```
These bf16 safetensors are also the source weights for the --nf4-dit runtime NF4 quantization (~8 GB), which provides the best quality-to-size ratio for 4-bit since it quantizes directly from full-precision weights.
### Also available

- FP8 quantized transformer (16 GB): good balance of quality and VRAM
## Model details
JoyAI-Image-Edit is an instruction-guided image editing model from JD's JoyAI-Image family. Architecture:
- Transformer: 16B-parameter MMDiT (40 double blocks, hidden size 4096)
- Text encoder: Qwen3VL 8B MLLM
- VAE: Wan2.1 causal 3D VAE (16 latent channels)
- Scheduler: FlowMatch discrete
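As a rough sanity check of the 16B figure, assuming standard transformer shapes per stream (attention projections around 4h² and a 4x-expansion MLP around 8h²) and ignoring embeddings, modulation layers, norms, and biases:

```python
h = 4096                 # hidden size
blocks = 40              # double blocks
attn = 4 * h * h         # Q, K, V, O projections
mlp = 8 * h * h          # 4x-expansion MLP (up + down)
per_stream = attn + mlp  # ~12 h^2
per_block = 2 * per_stream  # two streams per MMDiT double block
total = blocks * per_block
print(f"{total / 1e9:.1f}B")  # 16.1B
```

This back-of-envelope lands right around the stated 16B, which is consistent with the listed depth and width.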
See the original project page and paper for full details.
## License
Apache 2.0, same as the original.
## Model tree

Base model: `jdopensource/JoyAI-Image-Edit`