mlx-community/sapiens2-seg-0.4b-bf16

MLX port of facebook/sapiens2-seg-0.4b at bf16 precision (a straight cast, no quantization), converted with mlx-vlm.

Sapiens2 is a family of human-centric ViTs pretrained on 1B human images. This repo contains the seg head paired with the Sapiens2-0.4b backbone.

Install

pip install -U mlx-vlm

Usage: body-part segmentation (29 classes)

from pathlib import Path
from PIL import Image
import numpy as np
from mlx_vlm.utils import load_model
from mlx_vlm.models.sapiens2.processing_sapiens2 import Sapiens2Processor
from mlx_vlm.models.sapiens2.generate import Sapiens2Predictor

model = load_model(Path("mlx-community/sapiens2-seg-0.4b-bf16"))
processor = Sapiens2Processor.from_pretrained("mlx-community/sapiens2-seg-0.4b-bf16")
predictor = Sapiens2Predictor(model, processor)

result = predictor.predict(Image.open("person.jpg"))
# result.mask        (orig_h, orig_w) int32 class indices
# result.seg_logits  (29, H_out, W_out) raw logits

print("active classes:", np.unique(result.mask).tolist())
Image.fromarray(result.mask.astype(np.uint8)).save("mask.png")

Output: a dense 29-class body-part segmentation following the DOME 29-class scheme (face, hair, torso, arms/legs split left/right, etc.).
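A quick way to inspect a predicted mask is to reduce it to per-class pixel fractions. A minimal numpy sketch (the helper name `class_fractions` is illustrative, not part of mlx-vlm; the exact DOME label order is not restated in this card, so only raw class indices are reported):

```python
import numpy as np

def class_fractions(mask: np.ndarray, num_classes: int = 29) -> dict:
    """Return {class_index: fraction of pixels} for classes present in mask."""
    counts = np.bincount(mask.ravel(), minlength=num_classes)
    fractions = counts / mask.size
    return {i: float(f) for i, f in enumerate(fractions) if counts[i] > 0}

# On a toy 2x2 mask with background (0) and one body-part class (5):
# class_fractions(np.array([[0, 0], [5, 5]], dtype=np.int32)) -> {0: 0.5, 5: 0.5}
```

Running this on `result.mask` shows at a glance which body parts cover the image.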

Convert your own checkpoint

# 1. Stage a float32 MLX directory from the Facebook checkpoint
python -m mlx_vlm.models.sapiens2.convert \
    --hf-repo facebook/sapiens2-seg-0.4b \
    --out ./sapiens2-seg-0.4b-fp32-mlx \
    --dtype float32

# 2. Quantize + upload via the main mlx_vlm.convert CLI
python -m mlx_vlm.convert \
    --hf-path  ./sapiens2-seg-0.4b-fp32-mlx \
    --mlx-path ./sapiens2-seg-0.4b-bf16 \
    --dtype bfloat16 \
    --upload-repo mlx-community/sapiens2-seg-0.4b-bf16
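Before uploading, the converted weights can be sanity-checked by reading the safetensors JSON header with only the standard library (a hedged sketch; the shard filename `model.safetensors` in the output directory is an assumption):

```python
import json
import struct
from collections import Counter

def safetensors_dtypes(path: str) -> Counter:
    """Count tensors per dtype by reading only the safetensors JSON header."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # 8-byte LE header size
        header = json.loads(f.read(header_len))
    return Counter(v["dtype"] for k, v in header.items() if k != "__metadata__")

# safetensors_dtypes("./sapiens2-seg-0.4b-bf16/model.safetensors")
# should report BF16 for (nearly) all tensors after step 2
```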

Architecture

Sapiens2 backbone: 2-D RoPE ViT (RoPE computed in bf16), partial GQA (full MHA in the first and last 8 blocks, half the KV heads in the middle blocks), SwiGLU FFN, and a cls token plus 8 storage tokens. Default input: 1024 × 768 (H × W), patch size 16, ImageNet normalization on the [0, 255] scale.
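The numbers above fix the sequence length the backbone sees; a small arithmetic sketch (the cls + 8 storage tokens come straight from the description):

```python
# Patch grid for the default 1024 x 768 (H x W) input with patch size 16
h, w, patch = 1024, 768, 16
grid_h, grid_w = h // patch, w // patch      # 64 x 48 patch grid
num_patches = grid_h * grid_w                # 3072 image tokens
seq_len = num_patches + 1 + 8                # + cls token + 8 storage tokens
print(grid_h, grid_w, num_patches, seq_len)  # 64 48 3072 3081
```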

See the mlx-vlm sapiens2 port for implementation details.

License

Weights released under the Sapiens2 License; this MLX repackaging inherits that license.

Citation

@article{khirodkarsapiens2,
  title   = {Sapiens2},
  author  = {Khirodkar, Rawal and Wen, He and Martinez, Julieta and Dong, Yuan
             and Su, Zhaoen and Saito, Shunsuke},
  journal = {arXiv preprint arXiv:2604.21681},
  year    = {2026}
}