---
license: apache-2.0
license_name: apache-2.0
license_link: https://www.apache.org/licenses/LICENSE-2.0
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- image-text-to-text
- video-text-to-text
- hpsv3
- multimodal
- qwen3
language:
- en
---
# LibreHPS-4B v1.1
LibreHPS looks at an AI-generated picture (or short video) and a text
prompt and tells you how good a match they are, the way a human would
rate it. You give it a prompt like *"a cat sitting on a windowsill"*
and an image, and it gives you back a score. You can also hand it two
images and ask which one a person would prefer.

This is the kind of model you reach for when you want to automatically
rank or filter the output of an image or video generator the way a
human reviewer would: picking the best of N samples, training another
generator with reinforcement learning from feedback, or running a
benchmark that doesn't need human raters in the loop. It scores along
five separate axes (overall quality, prompt alignment, visual
coherence, style, and, for video, how natural the motion looks).
It's open source under Apache-2.0 and was trained only on
freely-licensed preference data.
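The best-of-N use case above amounts to scoring every candidate and
keeping the argmax. A minimal sketch with a stand-in score function
(in real use the score would come from something like
`score_image(...).overall.mean`; the file names and scores below are
purely illustrative):

```python
# Best-of-N reranking: score every candidate, keep the highest scorer.
def best_of_n(candidates, score_fn):
    """Return the candidate with the highest score under score_fn."""
    return max(candidates, key=score_fn)

# Toy usage with a dummy lookup standing in for a real reward model.
samples = ["img_a.png", "img_b.png", "img_c.png"]
scores = {"img_a.png": 0.41, "img_b.png": 0.87, "img_c.png": 0.63}
best = best_of_n(samples, scores.get)
print(best)  # img_b.png
```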
| | |
|---|---|
| **Size** | 4B parameters (dense) |
| **Backbone** | `Qwen3_5ForConditionalGeneration` (Qwen3.5-4B-Base, hybrid Mamba2 + full attention, MRoPE) |
| **Reward axes** | `overall`, `alignment`, `coherence`, `style`, `temporal` (video only) |
| **Weights licence** | Apache-2.0 ([`LICENSE`](LICENSE)) |
| **Dataset mix** | permissive-only; see [`DATA_PROVENANCE.md`](DATA_PROVENANCE.md) |
| **Container** | sharded `safetensors`, ~20.7 GB total |
## Install
```bash
pip install librehps
```
The reward heads (`multi_axis_head`, `scalar_head`) sit on top of the
stock-transformers Qwen3.5 backbone and need the `librehps` Python
package to load. Loading the safetensors with stock `transformers`
alone will give you the backbone but silently drop the reward heads:
you'll get a working LM, but no reward scores.
## Quickstart: score an image
```python
from librehps import LibreHPS
scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
result = scorer.score_image(image="photo.png", prompt="a cat sitting on a windowsill")
print(result.overall.mean)
```
## Quickstart: compare two images
```python
from librehps import LibreHPS
from librehps.evaluate.calibration import load_platt_calibrator
scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
cal = load_platt_calibrator("calibration.json", benchmark=None).default
result = scorer.compare_images(
    image_a="left.png",
    image_b="right.png",
    prompt="a cat sitting on a windowsill",
    calibrator=cal,
)
print(result.winner, result.probability)
```
The `calibrator=` keyword is optional. With it, you get the calibrated
win probability shipped in [`calibration.json`](calibration.json) (a
2-parameter logistic fit per benchmark, plus a `<global>` fit you can
use for live / unknown-benchmark scoring). Without it, you get the
uncalibrated μ-space probability.
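A two-parameter logistic fit is just a sigmoid over the raw score
difference. A minimal sketch of the math (the coefficients below are
illustrative placeholders, not the values shipped in
`calibration.json`):

```python
import math

def platt_probability(delta_mu: float, a: float, b: float) -> float:
    """Two-parameter logistic (Platt) map from a raw score
    difference to a calibrated win probability for image A."""
    return 1.0 / (1.0 + math.exp(-(a * delta_mu + b)))

# Illustrative coefficients -- NOT the shipped calibration values.
a, b = 1.7, 0.0
print(platt_probability(0.0, a, b))        # 0.5: equal scores, coin flip
print(platt_probability(1.0, a, b) > 0.5)  # True: image A preferred
```

With `b = 0` the fit is symmetric, so swapping the two images flips the probability around 0.5; a nonzero `b` absorbs any left/right bias in a benchmark's labels.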
## Hardware and attention backend
By default, `LibreHPS.from_pretrained(...)` looks for **Flash
Attention 4** (CUDA Blackwell, sm_100). If FA4 is available it's used
as a fast path; if not, the loader logs a warning and falls back to
stock Qwen3.5 attention (SDPA on modern PyTorch, eager on CPU). You
can pin the backend explicitly:
```python
LibreHPS.from_pretrained(path) # auto (default)
LibreHPS.from_pretrained(path, attn_impl="fa4") # require FA4 + Blackwell
LibreHPS.from_pretrained(path, attn_impl="sdpa") # skip FA4; CUDA / CPU / MPS all OK
```
Weights are ~20.7 GB on disk and ~10 GB resident in `bfloat16`;
inference fits on a single 24 GB GPU.
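The documented fallback order can be sketched as a small decision
function. This is a simplified stand-in for illustration, not the
library's actual dispatch code:

```python
def pick_attn_backend(has_fa4: bool, device: str) -> str:
    """Simplified stand-in for the documented fallback order:
    FA4 fast path on CUDA if available, else SDPA on GPU backends,
    eager attention on CPU."""
    if has_fa4 and device == "cuda":
        return "fa4"
    if device in ("cuda", "mps"):
        return "sdpa"
    return "eager"

print(pick_attn_backend(True, "cuda"))   # fa4
print(pick_attn_backend(False, "cuda"))  # sdpa
print(pick_attn_backend(False, "cpu"))   # eager
```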
## Limitations
- Trained on permissively-licensed generator outputs only. Closed-model
outputs (Midjourney, most OpenAI images) were intentionally excluded
from training, so scores on those generators are out-of-distribution.
- Not a content-moderation model. There's no toxicity / safety filter
beyond the upstream dataset licence audits.
- The `temporal` axis was trained on 5–8 subsampled frames; longer
clips are extrapolation.
- The model's per-axis σ output is **uninformative** on this checkpoint
  (median σ ≈ 0.05). Use the calibrated `probability` field for
  uncertainty, not `ScoreAxis.sigma`.
## Evaluation
Evaluation uses a held-out 90 % split per benchmark, selected
deterministically by `global_index` hash. Numbers are symmetrised
(live-harness-equivalent):
| Benchmark | n | pair_acc | ECE (uncal) | ECE (cal) | Brier (uncal) | Brier (cal) |
|---|---|---|---|---|---|---|
| `hpdv3` | 25 818 | 0.922 | 0.068 | **0.011** | 0.072 | 0.054 |
| `vrr` | 37 244 | 0.764 | 0.221 | **0.011** | 0.225 | 0.156 |
| `imgrew` | 11 460 | 0.647 | 0.333 | **0.008** | 0.338 | 0.216 |
| `pickscore` | 780 | 0.623 | 0.368 | **0.041** | 0.371 | 0.236 |
**Aggregate (n-weighted):** ECE `0.187 → 0.011` (−94 %); Brier `0.191
→ 0.131` (−32 %); pair_acc unchanged. The calibrator is the difference
between the "uncal" and "cal" columns.
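For reference, the two calibration metrics in the table follow their
standard definitions: Brier is the mean squared error between
predicted win probability and the 0/1 label, and ECE is the
bin-weighted gap between mean confidence and empirical accuracy. A
minimal self-contained sketch:

```python
def brier(probs, labels):
    """Mean squared error between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def ece(probs, labels, n_bins=10):
    """Binned expected calibration error: n-weighted
    |mean confidence - empirical accuracy| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    err = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        err += len(b) / len(probs) * abs(conf - acc)
    return err

# A perfectly calibrated coin: confidence matches accuracy exactly.
probs, labels = [0.5, 0.5], [1, 0]
print(round(brier(probs, labels), 2))  # 0.25
print(ece(probs, labels))              # 0.0
```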
## Acknowledgement
LibreHPS is inspired by and architecturally influenced by **HPSv3**
(Ma, Shui, Wu, Sun, Li; ICCV 2025). It is a from-scratch
reimplementation with a different backbone, training stack, and
permissively-licensed training data mix.
## License
- **Source code:** MIT
- **Model weights:** Apache-2.0
- **Training data:** permissive union (MIT / Apache-2.0 / BSD-3-Clause / CDLA-Permissive-2.0). See [`DATA_PROVENANCE.md`](DATA_PROVENANCE.md) for the per-dataset audit and the per-image generator-redistribution audit applied to filter the training mix.
*Copyright © 2026 Jeff Moe <moe@spacecruft.org>.*
Loveland, Colorado, USA