OracleRM ModernBERT Base v2

OracleRM ModernBERT Base v2 is a lightweight reward model for ranking written text. It scores a candidate using two heads:

  • style: how strongly the response matches the target literary/stylistic preference.
  • faith: how well the response preserves the meaning of the source prompt.

The model is built on top of answerdotai/ModernBERT-base with two scalar classification heads. It is intended for reranking multiple candidate rewrites, not for text generation.

Files expected in this repo

This repo should contain the exported files from the training zip:

config.json
metadata.json
model_state_dict.pt
tokenizer.json
tokenizer_config.json
special_tokens_map.json
any additional vocab or tokenizer model files, depending on the tokenizer export

The model is not saved as a standard AutoModelForSequenceClassification checkpoint. Load it by reconstructing the TwoHeadRM wrapper class shown under Example usage, then loading model_state_dict.pt into it.
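
Before loading, it can help to confirm that the download is complete. The following is a minimal sketch that assumes the file names listed above. The helper name missing_files and the REQUIRED_FILES tuple are illustrative and are not part of the released code.

```python
from pathlib import Path

# File names taken from the list above. Tokenizer vocab/model files vary by
# export, so they are deliberately not checked here.
REQUIRED_FILES = (
    "config.json",
    "metadata.json",
    "model_state_dict.pt",
    "tokenizer_config.json",
)


def missing_files(repo_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from repo_dir."""
    repo_dir = Path(repo_dir)
    return [name for name in required if not (repo_dir / name).is_file()]
```

One possible use is to call missing_files on the downloaded snapshot directory and stop early if the returned list is not empty.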

Input format

The model was trained with this text format:

### Source:
{prompt}

### Rewrite:
{response}

Use an empty source when scoring style only:

### Source:


### Rewrite:
{response}
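
Since the model was trained on this exact layout, it is probably safest to reproduce the blank lines exactly. The sketch below mirrors the format_input helper from the usage example and prints the style-only string with repr so the newlines are visible. The sample sentence is only an illustration.

```python
def format_input(prompt: str, response: str) -> str:
    """Build the training-time text layout for one (source, rewrite) pair."""
    return f"### Source:\n{prompt}\n\n### Rewrite:\n{response}"


# Style-only scoring: pass an empty source.
style_only = format_input("", "Time sank to the bottom like sediment.")
print(repr(style_only))
# '### Source:\n\n\n### Rewrite:\nTime sank to the bottom like sediment.'
```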

Raw scores

The model outputs two logits. Each is passed through a sigmoid to give a score between 0 and 1:

style = sigmoid(style_logit)
faith = sigmoid(faith_logit)

A simple score is:

score = style * faith

For practical reranking, raising faith to a power below 1 softens its influence on the ranking and often works better:

score = style * (faith ** 0.65)

You may also apply a control penalty for candidates that are too short, too verbose, or unnecessarily hard to read.
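
As a worked illustration of the two formulas above, the sketch below compares the plain product with the faith-softened score. The function name composite_score and the sample values 0.80 and 0.50 are assumptions made up for this example, not outputs of the model.

```python
def composite_score(style: float, faith: float, faith_exponent: float = 1.0) -> float:
    """Combine the two sigmoid outputs; faith_exponent < 1 softens the faith term."""
    if not (0.0 <= style <= 1.0 and 0.0 <= faith <= 1.0):
        raise ValueError("style and faith must be sigmoid outputs in [0, 1]")
    return style * (faith ** faith_exponent)


# Hypothetical sigmoid outputs for one candidate.
style, faith = 0.80, 0.50

simple = composite_score(style, faith)          # style * faith
softened = composite_score(style, faith, 0.65)  # style * faith ** 0.65

print(f"simple={simple:.3f} softened={softened:.3f}")
```

Because faith lies between 0 and 1, an exponent below 1 never lowers the faith term, so a moderately unfaithful candidate is penalized less than under the plain product.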

Example usage

import json
from pathlib import Path

import torch
import torch.nn as nn
from huggingface_hub import snapshot_download
from transformers import AutoConfig, AutoTokenizer, AutoModel

HF_REPO_ID = "YOUR_USERNAME/oracle-rm-modernbert-base-v2"

local_dir = Path(snapshot_download(HF_REPO_ID))

device = "cuda" if torch.cuda.is_available() else "cpu"

metadata = json.loads((local_dir / "metadata.json").read_text())
MAX_LENGTH = int(metadata.get("max_length", 1024))

tokenizer = AutoTokenizer.from_pretrained(local_dir)
config = AutoConfig.from_pretrained(local_dir)

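class TwoHeadRM(nn.Module):
    """ModernBERT backbone with two scalar heads (style and faith)."""

    def __init__(self, config):
        super().__init__()
        # Prefer SDPA attention; fall back if this transformers version
        # does not accept the attn_implementation argument.
        try:
            self.backbone = AutoModel.from_config(config, attn_implementation="sdpa")
        except TypeError:
            self.backbone = AutoModel.from_config(config)

        hidden = self.backbone.config.hidden_size
        self.dropout = nn.Dropout(0.1)
        self.style_head = nn.Linear(hidden, 1)
        self.faith_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Pool by taking the hidden state of the first token.
        pooled = out.last_hidden_state[:, 0]
        # Dropout is a no-op once model.eval() has been called.
        pooled = self.dropout(pooled)
        # Each head maps the pooled vector to one raw logit per example.
        style_logit = self.style_head(pooled).squeeze(-1)
        faith_logit = self.faith_head(pooled).squeeze(-1)
        return style_logit, faith_logit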

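def safe_torch_load(path):
    # weights_only=True restricts unpickling to tensors and plain containers.
    # PyTorch versions that lack the argument raise TypeError, so fall back.
    try:
        return torch.load(path, map_location="cpu", weights_only=True)
    except TypeError:
        return torch.load(path, map_location="cpu")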

model = TwoHeadRM(config)
model.load_state_dict(safe_torch_load(local_dir / "model_state_dict.pt"), strict=True)
model.to(device)
model.eval()

def format_input(prompt, response):
    return f"### Source:\n{prompt}\n\n### Rewrite:\n{response}"

@torch.inference_mode()
def score_one(prompt, response):
    text = format_input(prompt, response)

    enc = tokenizer(
        text,
        max_length=MAX_LENGTH,
        padding=True,
        truncation=True,
        return_tensors="pt",
    ).to(device)

    style_logit, faith_logit = model(enc["input_ids"], enc["attention_mask"])

    style = torch.sigmoid(style_logit.float()).item()
    faith = torch.sigmoid(faith_logit.float()).item()

    return {
        "style": style,
        "faith": faith,
        "score": style * (faith ** 0.65),
    }

prompt = "The room was silent and time seemed to move slowly."

candidate = "In that room, time sank to the bottom like sediment."

print(score_one(prompt, candidate))

Intended use

Use this model to:

  • rerank candidate rewrites;
  • select more literary or stylistically strong generations;
  • filter outputs that drift too far from the source meaning;
  • compare rewrite candidates during dataset curation.

This model is best used as a reranker after a generator has already produced several candidates.

Not intended for

This model should not be used as a general factuality verifier, safety classifier, plagiarism detector, or universal writing-quality metric. It reflects the style preferences and contrastive examples used during training.

Scoring guidance

For rewrite ranking, sort candidates by:

adjusted_score = style * (faith ** 0.65) * control_penalty

Where control_penalty may include:

  • prompt-relative length limits;
  • verbosity penalty only outside the allowed length band;
  • readability penalty for overly dense or hard-to-read prose.
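
The card does not define control_penalty, so the following is only a sketch of one way to build it. The function names, the 0.5 to 2.0 length band, the 35-word sentence limit, and the use of average sentence length as a readability proxy are all assumptions chosen for illustration.

```python
import re


def length_penalty(prompt: str, response: str, low: float = 0.5, high: float = 2.0) -> float:
    """Return 1.0 inside the allowed length band, shrinking toward 0 outside it."""
    prompt_words = max(len(prompt.split()), 1)
    ratio = len(response.split()) / prompt_words
    if ratio < low:
        return ratio / low   # too short: linear falloff
    if ratio > high:
        return high / ratio  # too verbose: inverse falloff
    return 1.0


def readability_penalty(response: str, max_words_per_sentence: float = 35.0) -> float:
    """Penalize very long sentences, a crude stand-in for dense prose."""
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    if not sentences:
        return 0.0
    avg_words = len(response.split()) / len(sentences)
    return min(1.0, max_words_per_sentence / avg_words)


def control_penalty(prompt: str, response: str) -> float:
    """Multiply the hypothetical length and readability penalties."""
    return length_penalty(prompt, response) * readability_penalty(response)
```

Each factor stays at 1.0 for candidates inside the allowed range, so the penalty only changes the ranking for outliers.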

For style-only ranking, use:

style

For meaning preservation checks, inspect:

faith

Do not rely only on the final composite score. For debugging, always print style, faith, and the final adjusted score separately.

Limitations

  • The model may over-reward ornate or elaborate prose if verbosity is not controlled.
  • Very short inputs may produce unstable style judgments.
  • Faithfulness is a learned similarity/preference signal, not a factual guarantee.
  • The model was trained for English prose-style rewriting and may not transfer well to other languages or technical domains.
  • Scores are relative and are most useful when comparing multiple candidates for the same prompt.

Suggested implementation pattern

Generate several rewrite candidates, score each one with this RM, then choose the candidate with the highest adjusted score. The snippet below ranks by the composite score without a control penalty.

candidates = [
    "The room was quiet and time felt slow.",
    "In that room, time sank to the bottom like sediment.",
]

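rows = []

# prompt and score_one come from the usage example above.
for response in candidates:
    scores = score_one(prompt, response)
    rows.append((scores["score"], scores["style"], scores["faith"], response))

# Sort by composite score only, highest first.
rows.sort(key=lambda row: row[0], reverse=True)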

for score, style, faith, response in rows:
    print(f"score={score:.3f} style={style:.3f} faith={faith:.3f} | {response}")
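
To rank by the adjusted score rather than the raw composite, the loop can be wrapped in a small helper. This is a sketch under stated assumptions: the name rerank, its parameters, and the row layout are hypothetical, score_fn is expected to return a dict with style and faith keys as score_one does, and penalty_fn defaults to no penalty.

```python
from typing import Callable, Dict, List


def rerank(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[str, str], Dict[str, float]],
    penalty_fn: Callable[[str, str], float] = lambda prompt, response: 1.0,
    faith_exponent: float = 0.65,
) -> List[Dict[str, object]]:
    """Score every candidate and return rows sorted by adjusted score, best first."""
    rows = []
    for response in candidates:
        scores = score_fn(prompt, response)
        penalty = penalty_fn(prompt, response)
        adjusted = scores["style"] * (scores["faith"] ** faith_exponent) * penalty
        # Keep the components so style, faith, and the penalty can be inspected.
        rows.append({
            "response": response,
            "style": scores["style"],
            "faith": scores["faith"],
            "penalty": penalty,
            "adjusted_score": adjusted,
        })
    rows.sort(key=lambda row: row["adjusted_score"], reverse=True)
    return rows
```

With the objects from the usage example, rerank(prompt, candidates, score_one) should give the same ordering as the loop above; passing a penalty_fn applies length or readability control.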