---
language:
- multilingual
base_model: intfloat/multilingual-e5-small
pipeline_tag: text-classification
---

# feed-classifier

A multilingual feed-value classifier, fine-tuned from `intfloat/multilingual-e5-small` with a classification head to score Bluesky posts by feed worthiness.

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Circularmachines/atproto_classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# e5 models expect the "passage: " prefix on inputs
texts = ["passage: some post text here"]
inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=-1)

score = probs[0][1].item()  # P(feed-worthy)
label = int(score > 0.5)
```

## Training

- **Base model**: `intfloat/multilingual-e5-small`
- **Architecture**: `BertForSequenceClassification` (2 classes: not feed-worthy / feed-worthy)
- **Input prefix**: `passage: {text}` (matches the e5 training convention)
- **Training data**: LLM-inferred labels via a DSPy-optimized Qwen classifier
- **Validation**: human-labeled Bluesky posts (held out, never used in training)
- **Labels**: 0 = not feed-worthy, 1 = feed-worthy
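
The prefixing and thresholding steps from the Usage section can be factored into small helpers when scoring posts in batches. A minimal sketch — the helper names and the default threshold are illustrative, not part of the model card:

```python
def prepare_inputs(texts, prefix="passage: "):
    """Apply the 'passage:' prefix the e5 base model expects."""
    return [prefix + t for t in texts]

def to_label(score, threshold=0.5):
    """Map P(feed-worthy) to a binary label: 1 = feed-worthy, 0 = not."""
    return int(score > threshold)

posts = ["some post text here", "another post"]
prefixed = prepare_inputs(posts)  # feed these to the tokenizer as above
```

The prefixed list can then be passed directly to `tokenizer(...)` in place of `texts` in the Usage snippet.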