# wmt22-comet-da-pruned-k4-refit
A compressed version of Unbabel/wmt22-comet-da produced by an internal pipeline that removes redundant encoder capacity and re-aligns the layer-mixing weights afterwards. Same disk footprint as the -xs variant, higher quality.
## Real-world benchmark
Evaluated on 1,200 WMT17 DA segments across 12 language pairs (RicardoRei/wmt-da-human-evaluation). This is the standard way to judge COMET variants: how well do the model's scores correlate with human quality judgments?
| Model | Disk | Pearson (human) | Drop vs. full |
|---|---|---|---|
| Original wmt22-comet-da | 2200 MB | 0.6415 | — |
| pruned-k4 (fp32, small cut) | 2122 MB | 0.6181 | −0.023 |
| this model (refit, 1.1 GB) | 1061 MB | 0.6004 | −0.041 |
| pruned-k4-xs (no refit) | 1061 MB | 0.5730 | −0.069 |
The refit variant recovers roughly 40% of the quality loss incurred by -xs at the same disk size.
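The 40% figure follows directly from the Pearson values in the table: at equal disk size, the refit step closes part of the gap between -xs and the full model.

```python
# Pearson values from the table above.
full, refit, xs = 0.6415, 0.6004, 0.5730

# Fraction of the -xs quality loss that the refit step recovers.
recovered = (refit - xs) / (full - xs)
print(f"{recovered:.0%}")  # → 40%
```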
## Usage
```shell
pip install "unbabel-comet" "setuptools<81" huggingface_hub
```
```python
from huggingface_hub import snapshot_download
import sys

# Fetch this repository (bundled load.py plus compression assets).
folder = snapshot_download(repo_id="solailabs/wmt22-comet-da-pruned-k4-refit")
sys.path.insert(0, folder)

from load import load_model

model = load_model()
scores = model.predict(
    [{"src": "Hello world.", "mt": "Bonjour le monde.", "ref": "Bonjour le monde."}],
    batch_size=8, gpus=0, num_workers=2,
)
print(scores["scores"])
```
The bundled load.py downloads the base model on first use, applies the internal compression, and returns a working COMET model. The first call is slow (the ~2.2 GB base model is downloaded and cached); subsequent calls are fast.
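The "slow first call, fast afterwards" behavior is the usual load-once pattern. A minimal sketch of the idea (the names and the caching mechanism here are illustrative, not the actual load.py internals):

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_model():
    # Expensive one-time work: download the base checkpoint,
    # apply the compression, and build the model object.
    print("building model...")
    return object()  # stand-in for the real COMET model

m1 = load_model()  # slow path: does the one-time work
m2 = load_model()  # fast path: returns the cached object
assert m1 is m2
```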
## Notes
- No fine-tuning was performed: only post-hoc structural changes plus a short (CPU-only) calibration on a small multilingual sample.
- Tested on Apple M-series (qnnpack quant engine) and x86 Linux (fbgemm).
- Behavior outside the languages listed above is not guaranteed.
## License & attribution
Apache-2.0 (inherited from the base model).
Base model: Unbabel/wmt22-comet-da by Unbabel.
```bibtex
@inproceedings{rei-etal-2022-comet,
  title     = {{COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task}},
  author    = {Rei, Ricardo and others},
  booktitle = {Proceedings of the Seventh Conference on Machine Translation},
  year      = {2022}
}
```