---
license: apache-2.0
license_name: apache-2.0
license_link: https://www.apache.org/licenses/LICENSE-2.0
library_name: transformers
pipeline_tag: image-classification
tags:
- image-text-to-text
- video-text-to-text
- hpsv3
- multimodal
- qwen3
language:
- en
---

# LibreHPS-4B v1.1

LibreHPS looks at an AI-generated picture (or short video) and a text
prompt and tells you how good a match they are, the way a human would
rate it. You give it a prompt like *"a cat sitting on a windowsill"*
and an image, and it gives you back a score. You can also hand it two
images and ask which one a person would prefer.

This is the kind of model you reach for when you want to automatically
rank or filter the output of an image or video generator the way a
human reviewer would: picking the best of N samples, training another
generator with reinforcement learning from feedback, or running a
benchmark that doesn't need human raters in the loop. It scores along
five separate axes (overall quality, prompt alignment, visual
coherence, style, and, for video, how natural the motion looks).
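
As a rough sketch of the best-of-N use case (plain Python, not the `librehps` API; the axis scores and weights below are made-up stand-ins for what the scorer would return):

```python
# Illustrative only: pick the best of N candidates by a weighted
# combination of four of the reward axes. The scores below are made
# up; in practice each dict would come from running the scorer.
candidates = [
    {"file": "sample_0.png", "overall": 6.1, "alignment": 5.8, "coherence": 6.4, "style": 5.9},
    {"file": "sample_1.png", "overall": 7.3, "alignment": 7.0, "coherence": 6.8, "style": 6.5},
    {"file": "sample_2.png", "overall": 6.9, "alignment": 7.4, "coherence": 6.2, "style": 6.0},
]

def combined(c, w_overall=0.5, w_alignment=0.3, w_coherence=0.1, w_style=0.1):
    # Weighted mean over axes; the weights are arbitrary and task-dependent.
    return (w_overall * c["overall"] + w_alignment * c["alignment"]
            + w_coherence * c["coherence"] + w_style * c["style"])

best = max(candidates, key=combined)
print(best["file"])  # sample_1.png under these made-up scores
```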

It's open source under Apache-2.0 and was trained only on
freely-licensed preference data.

| | |
|---|---|
| **Size** | 4B parameters (dense) |
| **Backbone** | `Qwen3_5ForConditionalGeneration` (Qwen3.5-4B-Base, hybrid Mamba2 + full attention, MRoPE) |
| **Reward axes** | `overall`, `alignment`, `coherence`, `style`, `temporal` (video only) |
| **Weights licence** | Apache-2.0 ([`LICENSE`](LICENSE)) |
| **Dataset mix** | permissive-only; see [`DATA_PROVENANCE.md`](DATA_PROVENANCE.md) |
| **Container** | sharded `safetensors`, ~20.7 GB total |

## Install

```bash
pip install librehps
```
The reward heads (`multi_axis_head`, `scalar_head`) sit on top of the
stock `transformers` Qwen3.5 backbone and need the `librehps` Python
package to load. Loading the safetensors with stock `transformers`
alone will give you the backbone but silently drop the reward heads:
you'll get a working LM, but no reward scores.

## Quickstart: score an image

```python
from librehps import LibreHPS

scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
result = scorer.score_image(image="photo.png", prompt="a cat sitting on a windowsill")
print(result.overall.mean)
```

## Quickstart: compare two images

```python
from librehps import LibreHPS
from librehps.evaluate.calibration import load_platt_calibrator

scorer = LibreHPS.from_pretrained("LibreHPS/LibreHPS-4B-v1.1")
cal = load_platt_calibrator("calibration.json", benchmark=None).default

result = scorer.compare_images(
    image_a="left.png",
    image_b="right.png",
    prompt="a cat sitting on a windowsill",
    calibrator=cal,
)
print(result.winner, result.probability)
```

The `calibrator=` keyword is optional. With it, you get the calibrated
win probability shipped in [`calibration.json`](calibration.json) (a
2-parameter logistic fit per benchmark, plus a `<global>` fit you can
use for live / unknown-benchmark scoring). Without it, you get the
uncalibrated μ-space probability.
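
The 2-parameter logistic fit can be sketched in a few lines (plain Python; the coefficients `a` and `b` here are hypothetical, while the real per-benchmark values live in `calibration.json`):

```python
import math

def platt(delta_mu, a, b):
    # Two-parameter logistic (Platt) calibration: maps a raw mu-space
    # score difference (image A minus image B) to a calibrated win
    # probability for image A.
    return 1.0 / (1.0 + math.exp(-(a * delta_mu + b)))

# Hypothetical coefficients; the shipped fits are per-benchmark,
# plus a <global> fallback.
a, b = 1.7, 0.0

print(platt(0.0, a, b))  # 0.5: equal scores, coin flip
print(platt(2.0, a, b))  # ~0.97: image A strongly preferred
```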

## Hardware and attention backend

By default, `LibreHPS.from_pretrained(...)` looks for **Flash
Attention 4** (CUDA Blackwell, sm_100). If FA4 is available it's used
as a fast path; if not, the loader logs a warning and falls back to
stock Qwen3.5 attention (SDPA on modern PyTorch, eager on CPU). You
can pin the backend explicitly:

```python
LibreHPS.from_pretrained(path)                    # auto (default)
LibreHPS.from_pretrained(path, attn_impl="fa4")   # require FA4 + Blackwell
LibreHPS.from_pretrained(path, attn_impl="sdpa")  # skip FA4; CUDA / CPU / MPS all OK
```

`bfloat16` weights, ~20.7 GB on disk, ~10 GB resident at bf16. Fits on
a single 24 GB GPU.

## Limitations

- Trained on permissively-licensed generator outputs only. Closed-model
  outputs (Midjourney, most OpenAI images) were intentionally excluded
  from training, so scores on those generators are out-of-distribution.
- Not a content-moderation model. There's no toxicity / safety filter
  beyond the upstream dataset licence audits.
- The `temporal` axis was trained on 5-8 subsampled frames; longer
  clips are extrapolation.
- The model's per-axis σ output is **uninformative** on this checkpoint
  (median σ ≈ 0.05). Use the calibrated `probability` field for
  uncertainty, not `ScoreAxis.sigma`.

## Evaluation

Evaluation uses a held-out 90 % split per benchmark, selected
deterministically by `global_index` hash. Symmetrised
(live-harness-equivalent) numbers:

| Benchmark | n | pair_acc | ECE (uncal) | ECE (cal) | Brier (uncal) | Brier (cal) |
|---|---|---|---|---|---|---|
| `hpdv3` | 25 818 | 0.922 | 0.068 | **0.011** | 0.072 | 0.054 |
| `vrr` | 37 244 | 0.764 | 0.221 | **0.011** | 0.225 | 0.156 |
| `imgrew` | 11 460 | 0.647 | 0.333 | **0.008** | 0.338 | 0.216 |
| `pickscore` | 780 | 0.623 | 0.368 | **0.041** | 0.371 | 0.236 |
**Aggregate (n-weighted):** ECE `0.187 → 0.011` (−94 %); Brier `0.191
→ 0.131` (−32 %); pair_acc unchanged. The calibrator is the difference
between the "uncal" and "cal" columns.
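
For reference, the two calibration metrics in the table can be computed as follows (a generic sketch on toy data, not the actual evaluation harness):

```python
def brier(probs, labels):
    # Mean squared error between predicted win probability and
    # the 0/1 outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def ece(probs, labels, n_bins=10):
    # Expected Calibration Error: bucket predictions by confidence,
    # then average |mean prediction - empirical win rate| per bucket,
    # weighted by bucket size.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    err = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)
        avg_y = sum(y for _, y in bucket) / len(bucket)
        err += abs(avg_p - avg_y) * len(bucket) / total
    return err

# Toy predictions and 0/1 preference outcomes.
probs = [0.9, 0.8, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0]
print(brier(probs, labels))  # ~0.038
print(ece(probs, labels))    # ~0.18
```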

## Acknowledgement

LibreHPS is inspired by and architecturally influenced by **HPSv3**
(Ma, Shui, Wu, Sun, Li; ICCV 2025). LibreHPS is a from-scratch
reimplementation with a different backbone, training stack, and
permissively-licensed training data mix.
## License

- **Source code:** MIT
- **Model weights:** Apache-2.0
- **Training data:** permissive union (MIT / Apache-2.0 / BSD-3-Clause / CDLA-Permissive-2.0). See [`DATA_PROVENANCE.md`](DATA_PROVENANCE.md) for the per-dataset audit and the per-image generator-redistribution audit applied to filter the training mix.

*Copyright © 2026 Jeff Moe <moe@spacecruft.org>.*

Loveland, Colorado, USA