feed-classifier

A multilingual feed-value classifier, fine-tuned from intfloat/multilingual-e5-small with a classification head that scores Bluesky posts by feed worthiness.

Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Circularmachines/atproto_classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

texts = ["passage: some post text here"]  # e5 convention: prefix each input with "passage: "
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=-1)

score = probs[0][1].item()  # P(feed-worthy)
label = int(score > 0.5)
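To make the scoring step concrete: with two classes, the softmax over the logits reduces to a simple ratio, and `probs[0][1]` is P(feed-worthy). A minimal pure-Python sketch of that step (the helper name is illustrative, not part of the model's API):

```python
import math

def feed_worthy_probability(logit_not: float, logit_yes: float) -> float:
    """Softmax over the two class logits; returns P(feed-worthy).

    Subtracting the max logit first keeps the exponentials numerically stable.
    """
    m = max(logit_not, logit_yes)
    e0 = math.exp(logit_not - m)
    e1 = math.exp(logit_yes - m)
    return e1 / (e0 + e1)

# Equal logits give an even split; a higher class-1 logit pushes the score up.
print(feed_worthy_probability(0.0, 0.0))   # 0.5
print(feed_worthy_probability(-1.0, 2.0))  # ~0.95
```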

Training

  • Base model: intfloat/multilingual-e5-small
  • Architecture: BertForSequenceClassification (2 classes: not feed-worthy / feed-worthy)
  • Input prefix: passage: {text} (matches e5 training convention)
  • Training data: LLM-inferred labels via a DSPy-optimized Qwen classifier
  • Validation: Human-labeled Bluesky posts (held out, never used in training)
  • Labels: 0 = not feed-worthy, 1 = feed-worthy
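The conventions above (the `passage: ` prefix and the 0/1 labels) can be wrapped in two small helpers. These names and the 0.5 threshold are assumptions taken from the usage snippet, not part of the published model:

```python
def to_model_input(post_text: str) -> str:
    """Apply the e5 'passage: ' prefix the fine-tuned encoder expects."""
    return f"passage: {post_text.strip()}"

def to_label(p_feed_worthy: float, threshold: float = 0.5) -> int:
    """Map P(feed-worthy) to the 0/1 labels used in training."""
    return int(p_feed_worthy > threshold)

print(to_model_input("Great thread on ATProto internals"))
# passage: Great thread on ATProto internals
print(to_label(0.7))  # 1
print(to_label(0.3))  # 0
```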
  • Model size: 0.1B parameters (F32, Safetensors)