---
license: apache-2.0
license_name: apache-2.0
license_link: https://www.apache.org/licenses/LICENSE-2.0
library_name: transformers
pipeline_tag: image-classification
tags:
- image-text-to-text
- video-text-to-text
- hpsv3
- multimodal
- qwen3
language:
- en
---
# LibreHPS-4B v1.1
LibreHPS looks at an AI-generated picture (or short video) and a text
prompt and tells you how good a match they are, the way a human would
rate it. You give it a prompt like *"a cat sitting on a windowsill"*
and an image, and it gives you back a score. You can also hand it two
images and ask which one a person would prefer.

This is the kind of model you reach for when you want to automatically
rank or filter the output of an image or video generator the way a
human reviewer would: picking the best of N samples, training another
generator with reinforcement learning from feedback, or running a
benchmark that doesn't need human raters in the loop. It scores along
five separate axes (overall quality, prompt alignment, visual
coherence, style, and, for video, how natural the motion looks).
It's open source under Apache-2.0 and was trained only on
freely-licensed preference data.

| | |
|---|---|
| **Size** | 4B parameters (dense) |
| **Backbone** | `Qwen3_5ForConditionalGeneration` (Qwen3.5-4B-Base, hybrid Mamba2 + full attention, MRoPE) |
| **Reward axes** | `overall`, `alignment`, `coherence`, `style`, `temporal` (video only) |
| **Weights licence** | Apache-2.0 ([`LICENSE`](LICENSE)) |
| **Dataset mix** | permissive-only; see [`DATA_PROVENANCE.md`](DATA_PROVENANCE.md) |
| **Container** | sharded `safetensors`, ~20.7 GB total |
## Install
```bash
pip install librehps
```
The reward heads (`multi_axis_head`, `scalar_head`) sit on top of the
stock `transformers` Qwen3.5 backbone and need the `librehps` Python
package to load. Loading the safetensors with stock `transformers`
alone will give you the backbone but silently drop the reward heads:
you'll get a working LM, but no reward scores.
## Quickstart: score an image
```python
from librehps import LibreHPS
scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
result = scorer.score_image(image="photo.png", prompt="a cat sitting on a windowsill")
print(result.overall.mean)
```
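The same call extends naturally to best-of-N selection: score every sample
from your generator against the prompt and keep the argmax. A minimal sketch
of the selection logic; the numeric scores below are stand-ins for what
`scorer.score_image(image=path, prompt=prompt).overall.mean` would return, so
the snippet runs without downloading the model:

```python
# Best-of-N selection. In practice each value below would come from
# scorer.score_image(image=path, prompt=prompt).overall.mean; the
# numbers here are stand-ins so the ranking logic runs standalone.
scores = {
    "sample_0.png": 0.41,
    "sample_1.png": 0.87,
    "sample_2.png": 0.63,
}

best = max(scores, key=scores.get)  # highest `overall` mean wins
print(best)  # sample_1.png
```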
## Quickstart: compare two images
```python
from librehps import LibreHPS
from librehps.evaluate.calibration import load_platt_calibrator
scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
cal = load_platt_calibrator("calibration.json", benchmark=None).default
result = scorer.compare_images(
image_a="left.png",
image_b="right.png",
prompt="a cat sitting on a windowsill",
calibrator=cal,
)
print(result.winner, result.probability)
```
The `calibrator=` keyword is optional. With it, you get the calibrated
win probability shipped in [`calibration.json`](calibration.json) (a
2-parameter logistic fit per benchmark, plus a `<global>` fit you can
use for live / unknown-benchmark scoring). Without it, you get the
uncalibrated μ-space probability.
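A 2-parameter logistic (Platt) fit maps a raw score gap to a calibrated win
probability. The sketch below shows the shape of that mapping; the
coefficient values `a` and `b` are made up for illustration and are not the
fits shipped in `calibration.json`:

```python
import math

def platt(delta, a, b):
    """Calibrated P(A beats B) from the raw score gap delta = score_A - score_B."""
    return 1.0 / (1.0 + math.exp(-(a * delta + b)))

# Illustrative coefficients only; the real per-benchmark fits (and the
# <global> fallback) live in calibration.json.
a, b = 1.7, 0.0
print(platt(0.0, a, b))  # 0.5: equal scores, coin flip
print(platt(1.5, a, b))  # > 0.5: A clearly preferred
```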
## Hardware and attention backend
By default, `LibreHPS.from_pretrained(...)` looks for **Flash
Attention 4** (CUDA Blackwell, sm_100). If FA4 is available it's used
as a fast path; if not, the loader logs a warning and falls back to
stock Qwen3.5 attention (SDPA on modern PyTorch, eager on CPU). You
can pin the backend explicitly:
```python
LibreHPS.from_pretrained(path) # auto (default)
LibreHPS.from_pretrained(path, attn_impl="fa4") # require FA4 + Blackwell
LibreHPS.from_pretrained(path, attn_impl="sdpa") # skip FA4; CUDA / CPU / MPS all OK
```
The checkpoint ships `bfloat16` weights: ~20.7 GB on disk, ~10 GB resident
at bf16, so it fits on a single 24 GB GPU.
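As a back-of-envelope check (assuming the 4B count refers to the language
backbone alone), 4 billion parameters at 2 bytes each in bf16 come to about
7.5 GiB of raw weights; the gap up to the quoted ~10 GB resident figure is
presumably the vision tower, reward heads, and runtime buffers:

```python
# Back-of-envelope bf16 memory for the language backbone alone.
params = 4e9          # "4B parameters (dense)" from the table above
bytes_per_param = 2   # bf16
gib = params * bytes_per_param / 2**30
print(f"{gib:.1f} GiB")  # ~7.5 GiB of raw weights
```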
## Limitations
- Trained on permissively-licensed generator outputs only. Closed-model
outputs (Midjourney, most OpenAI images) were intentionally excluded
from training, so scores on those generators are out-of-distribution.
- Not a content-moderation model. There's no toxicity / safety filter
beyond the upstream dataset licence audits.
- The `temporal` axis was trained on 5–8 subsampled frames; longer
clips are extrapolation.
- The model's per-axis σ output is **uninformative** on this checkpoint
  (median σ ≈ 0.05). Use the calibrated `probability` field for
  uncertainty, not `ScoreAxis.sigma`.
## Evaluation
Evaluation uses a held-out 90 % split per benchmark, selected
deterministically by a `global_index` hash. Symmetrised
(live-harness-equivalent) numbers:

| Benchmark | n | pair_acc | ECE (uncal) | ECE (cal) | Brier (uncal) | Brier (cal) |
|---|---|---|---|---|---|---|
| `hpdv3` | 25 818 | 0.922 | 0.068 | **0.011** | 0.072 | 0.054 |
| `vrr` | 37 244 | 0.764 | 0.221 | **0.011** | 0.225 | 0.156 |
| `imgrew` | 11 460 | 0.647 | 0.333 | **0.008** | 0.338 | 0.216 |
| `pickscore` | 780 | 0.623 | 0.368 | **0.041** | 0.371 | 0.236 |
**Aggregate (n-weighted):** ECE `0.187 → 0.011` (−94 %); Brier `0.191
→ 0.131` (−32 %); pair_acc unchanged. The only difference between the
"uncal" and "cal" columns is the calibrator.
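The n-weighted aggregates follow directly from the table; a quick check of
the arithmetic:

```python
# (n, ECE_uncal, ECE_cal, Brier_uncal, Brier_cal) per benchmark, from the table above
rows = [
    (25818, 0.068, 0.011, 0.072, 0.054),  # hpdv3
    (37244, 0.221, 0.011, 0.225, 0.156),  # vrr
    (11460, 0.333, 0.008, 0.338, 0.216),  # imgrew
    (  780, 0.368, 0.041, 0.371, 0.236),  # pickscore
]
total = sum(n for n, *_ in rows)

def weighted(col):
    """n-weighted mean of metric column `col` (0..3) across benchmarks."""
    return sum(n * vals[col] for n, *vals in rows) / total

print(round(weighted(0), 3), round(weighted(1), 3))  # 0.187 0.011  (ECE)
print(round(weighted(2), 3), round(weighted(3), 3))  # 0.191 0.131  (Brier)
```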
## Acknowledgement
LibreHPS is inspired by and architecturally influenced by **HPSv3**
(Ma, Shui, Wu, Sun, Li; ICCV 2025). It is a from-scratch
reimplementation with a different backbone, training stack, and
permissively-licensed training data mix.
## License
- **Source code:** MIT
- **Model weights:** Apache-2.0
- **Training data:** permissive union (MIT / Apache-2.0 / BSD-3-Clause / CDLA-Permissive-2.0). See DATA_PROVENANCE.md for the per-dataset audit and the per-image generator-redistribution audit applied to filter the training mix.
*Copyright © 2026 Jeff Moe <moe@spacecruft.org>.*
Loveland, Colorado, USA