Use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-classification", model="deepcrayon/LibreHPS-4B-v1.1")
pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("deepcrayon/LibreHPS-4B-v1.1")
model = AutoModelForImageTextToText.from_pretrained("deepcrayon/LibreHPS-4B-v1.1")

LibreHPS-4B — v1.1

LibreHPS looks at an AI-generated picture (or short video) and a text prompt and tells you how good a match they are, the way a human would rate it. You give it a prompt like "a cat sitting on a windowsill" and an image, and it gives you back a score. You can also hand it two images and ask which one a person would prefer.

This is the kind of model you reach for when you want to automatically rank or filter the output of an image or video generator the way a human reviewer would — picking the best of N samples, training another generator with reinforcement learning from feedback, or running a benchmark that doesn't need human raters in the loop. It scores along five separate axes (overall quality, prompt alignment, visual coherence, style, and — for video — how natural the motion looks).

It's open source under Apache-2.0 and was trained only on freely-licensed preference data.

Size:            4B parameters (dense)
Backbone:        Qwen3_5ForConditionalGeneration (Qwen3.5-4B-Base, hybrid Mamba2 + full attention, MRoPE)
Reward axes:     overall, alignment, coherence, style, temporal (video only)
Weights licence: Apache-2.0 (LICENSE)
Dataset mix:     permissive-only (see DATA_PROVENANCE.md)
Container:       sharded safetensors, ~20.7 GB total

Install

pip install librehps

The reward heads (multi_axis_head, scalar_head) sit on top of the stock-transformers Qwen3.5 backbone and need the librehps Python package to load. Loading the safetensors with stock transformers alone will give you the backbone but silently drop the reward heads β€” you'll get a working LM, but no reward scores.
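If you do load the checkpoint with stock transformers, you can at least detect the drop rather than discover it later. A minimal sketch, assuming the reward-head tensors live under the multi_axis_head. / scalar_head. key prefixes named above (the exact checkpoint key names are an assumption):

```python
# Sketch: spot reward-head tensors that stock transformers would discard.
# The "multi_axis_head." / "scalar_head." key prefixes are an assumption.
HEAD_PREFIXES = ("multi_axis_head.", "scalar_head.")

def dropped_reward_heads(unexpected_keys):
    """Filter a loading report down to reward-head keys that were dropped."""
    return [k for k in unexpected_keys if k.startswith(HEAD_PREFIXES)]

# transformers can surface dropped keys via output_loading_info=True:
#   model, info = AutoModelForImageTextToText.from_pretrained(
#       "deepcrayon/LibreHPS-4B-v1.1", output_loading_info=True)
#   assert not dropped_reward_heads(info["unexpected_keys"])

example_keys = ["multi_axis_head.proj.weight", "scalar_head.bias", "lm_head.weight"]
print(dropped_reward_heads(example_keys))  # the two head keys, not lm_head
```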

Quickstart — score an image

from librehps import LibreHPS

scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
result = scorer.score_image(image="photo.png", prompt="a cat sitting on a windowsill")
print(result.overall.mean)

Quickstart — compare two images

from librehps import LibreHPS
from librehps.evaluate.calibration import load_platt_calibrator

scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
cal = load_platt_calibrator("calibration.json", benchmark=None).default

result = scorer.compare_images(
    image_a="left.png",
    image_b="right.png",
    prompt="a cat sitting on a windowsill",
    calibrator=cal,
)
print(result.winner, result.probability)

The calibrator= keyword is optional. With it, you get the calibrated win probability shipped in calibration.json (a 2-parameter logistic fit per benchmark, plus a <global> fit you can use for live / unknown-benchmark scoring). Without it, you get the uncalibrated μ-space probability.
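A 2-parameter logistic fit is plain Platt scaling: a calibrated probability sigmoid(a·Δμ + b) of the raw score difference. A sketch of the arithmetic only; the a/b parameter names and the use of the μ difference as input are assumptions about what calibration.json stores:

```python
import math

def platt_probability(delta_mu, a, b):
    """Two-parameter logistic (Platt) calibration of a raw score difference.

    delta_mu: uncalibrated score difference mu(image_a) - mu(image_b).
    a, b: slope and bias of the per-benchmark (or <global>) logistic fit.
    """
    return 1.0 / (1.0 + math.exp(-(a * delta_mu + b)))

# With zero bias, a tied pair calibrates to an even coin flip.
print(platt_probability(0.0, a=2.0, b=0.0))  # 0.5
```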

Hardware and attention backend

By default, LibreHPS.from_pretrained(...) looks for Flash Attention 4 (CUDA Blackwell, sm_100). If FA4 is available it's used as a fast path; if not, the loader logs a warning and falls back to stock Qwen3.5 attention (SDPA on modern PyTorch, eager on CPU). You can pin the backend explicitly:

LibreHPS.from_pretrained(path)                       # auto (default)
LibreHPS.from_pretrained(path, attn_impl="fa4")      # require FA4 + Blackwell
LibreHPS.from_pretrained(path, attn_impl="sdpa")     # skip FA4; CUDA / CPU / MPS all OK
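The auto behaviour can be mirrored in a few lines if you want to decide the backend yourself before loading. A sketch of the selection logic only, not librehps code; the (10, 0) capability threshold for Blackwell and the has_fa4 flag are assumptions, and the sketch folds the MPS case into "sdpa":

```python
def pick_attn_impl(cuda_capability=None, has_fa4=False):
    """Choose an attention backend the way the loader's auto mode is described.

    cuda_capability: (major, minor) compute capability, or None with no GPU.
    has_fa4: whether a Flash Attention 4 build is importable.
    """
    if has_fa4 and cuda_capability is not None and cuda_capability >= (10, 0):
        return "fa4"   # Blackwell (sm_100) fast path
    if cuda_capability is not None:
        return "sdpa"  # stock Qwen3.5 attention on modern PyTorch
    return "eager"     # CPU fallback

print(pick_attn_impl(cuda_capability=(10, 0), has_fa4=True))  # fa4
```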

The checkpoint is ~20.7 GB of sharded safetensors on disk and ~10 GB resident when loaded at bf16, so it fits on a single 24 GB GPU.

Limitations

  • Trained on permissively-licensed generator outputs only. Closed-model outputs (Midjourney, most OpenAI images) were intentionally excluded from training, so scores on those generators are out-of-distribution.
  • Not a content-moderation model. There's no toxicity / safety filter beyond the upstream dataset licence audits.
  • The temporal axis was trained on 5–8 subsampled frames; longer clips are extrapolation.
  • The model's per-axis σ output is uninformative on this checkpoint (median σ ≈ 0.05). Use the calibrated probability field for uncertainty, not ScoreAxis.sigma.

Evaluation

Evaluation uses a held-out 90 % split per benchmark, selected deterministically by hashing global_index. Symmetrised (live-harness-equivalent) numbers:

Benchmark   n        pair_acc   ECE (uncal)   ECE (cal)   Brier (uncal)   Brier (cal)
hpdv3       25 818   0.922      0.068         0.011       0.072           0.054
vrr         37 244   0.764      0.221         0.011       0.225           0.156
imgrew      11 460   0.647      0.333         0.008       0.338           0.216
pickscore   780      0.623      0.368         0.041       0.371           0.236

Aggregate (n-weighted): ECE 0.187 → 0.011 (−94 %); Brier 0.191 → 0.131 (−32 %); pair_acc unchanged. The calibrator is the difference between the "uncal" and "cal" columns.
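For reference, both calibration metrics in the table are cheap to compute from (probability, outcome) pairs. A minimal sketch using equal-width binning for ECE; the bin count is a free choice, not necessarily what the harness uses:

```python
def brier(probs, outcomes):
    """Mean squared error between predicted win probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def ece(probs, outcomes, n_bins=10):
    """Expected calibration error with equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total, err = len(probs), 0.0
    for members in bins:
        if not members:
            continue
        avg_p = sum(p for p, _ in members) / len(members)  # mean confidence
        avg_y = sum(y for _, y in members) / len(members)  # empirical accuracy
        err += len(members) / total * abs(avg_p - avg_y)
    return err

# A perfectly confident, perfectly correct predictor has zero error on both.
print(brier([1.0, 0.0], [1, 0]), ece([1.0, 0.0], [1, 0]))  # 0.0 0.0
```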

Acknowledgement

LibreHPS is inspired by and architecturally influenced by HPSv3 (Ma, Shui, Wu, Sun, Li — ICCV 2025). LibreHPS is a from-scratch reimplementation with a different backbone, training stack, and permissively-licensed training data mix.

License

  • Source code: MIT
  • Model weights: Apache-2.0
  • Training data: permissive union (MIT / Apache-2.0 / BSD-3-Clause / CDLA-Permissive-2.0). See DATA_PROVENANCE.md for the per-dataset audit and the per-image generator-redistribution audit applied to filter the training mix.

Copyright © 2026 Jeff Moe <moe@spacecruft.org>.

Loveland, Colorado, USA
