🦅 Earlybird: Fast & Accurate AI Text Detection

Earlybird is a lightweight, high-speed AI text detection model designed to classify text as either Human-Written or AI-Generated.

Built on the efficient DistilRoBERTa architecture, it was fine-tuned on the W.O.R.M. (Wait, Original or Machine) dataset.

⚡ Model Stats

Architecture: DistilRoBERTa (82M parameters)
Primary Task: Binary Classification (Human vs. AI)
Context Window: 512 Tokens
Inference Speed: <50ms (CPU) / <5ms (GPU)
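
The 512-token context window means longer documents have to be truncated or split before classification. The card does not say how long inputs should be handled, so the following is only a minimal sketch of one common approach: overlapping windows over a list of token ids. The function name `split_into_windows`, the 32-token overlap, and the choice to reserve 2 positions (for the `<s>` and `</s>` tokens a RoBERTa-style tokenizer adds) are illustrative assumptions, not part of Earlybird.

```python
from typing import List, Sequence


def split_into_windows(
    token_ids: Sequence[int],
    max_tokens: int = 512,   # context window stated in Model Stats
    reserved: int = 2,       # assumption: room for <s> and </s>
    overlap: int = 32,       # assumption: arbitrary overlap between windows
) -> List[List[int]]:
    """Split token ids into overlapping windows that fit the context window."""
    body = max_tokens - reserved
    if body <= 0 or not 0 <= overlap < body:
        raise ValueError("overlap must be non-negative and smaller than the window body")
    if not token_ids:
        return []

    step = body - overlap
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + body]))
        if start + body >= len(token_ids):
            break  # the current window already reaches the end of the input
        start += step
    return windows


# Stand-in ids; in practice these would come from the model's tokenizer.
example_windows = split_into_windows(list(range(1000)))
print([len(w) for w in example_windows])
```

Each window would then be classified separately. How the per-window predictions are combined (averaging, majority vote, maximum) is a design choice the card does not address.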

🚀 Overview

Earlybird is designed for rapid, real-time detection. Unlike generative Large Language Models (LLMs), which are slow and resource-heavy, Earlybird uses a distilled encoder architecture. This lets it process text in milliseconds, making it well suited to high-volume applications such as content moderation, academic integrity checks, and spam filtering.

The model analyzes stylistic patterns, perplexity, and token transitions to determine if a text was written by a human or generated by models like GPT-4, Claude, Llama, or Mistral.
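
As a rough illustration of how such a classifier might be called, the sketch below uses the Hugging Face `transformers` text-classification pipeline with the repository name listed in the model tree. It assumes the checkpoint loads as a standard sequence-classification model. The helper names `load_detector` and `detect` are hypothetical, and the label strings returned depend on the model's configuration, which this card does not document.

```python
from typing import Any, Callable, Dict

MODEL_ID = "noumenon-labs/Earlybird-fast"  # repository name from the model tree


def load_detector() -> Callable[..., Any]:
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so that detect() can be exercised without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def detect(text: str, detector: Callable[..., Any]) -> Dict[str, Any]:
    """Classify one text, truncating to the 512-token context window."""
    result = detector(text, truncation=True, max_length=512)[0]
    # Label names (for example "LABEL_0"/"LABEL_1") come from the model config
    # and are an unknown here; inspect them before mapping to Human/AI.
    return {"label": result["label"], "score": float(result["score"])}


if __name__ == "__main__":
    detector = load_detector()
    print(detect("Paste a paragraph of at least 100 words here.", detector))
```

The `detect` helper accepts any callable with the pipeline's calling convention, which makes it easy to substitute a stub in unit tests.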

📚 Training Data

Earlybird was trained on Mega-WORM, a unified dataset curated from four major open-source collections. The training data was rigorously filtered to ensure high-quality prose, focusing on texts with sufficient context (essays, blog posts, articles).

📊 Performance Benchmarks

The model performs best on medium and long-form text (100 words or more). Accuracy drops substantially on texts under 100 words, as the breakdown below shows.

Detailed Length Breakdown

Text Category	Word Count	Accuracy	Performance
Short Text	<100 words	76.31%	⚠️ Weak
Medium Text	100 - 300 words	96.48%	✅ Excellent
Long Text	300+ words	95.01%	✅ Excellent

Overall Metrics

Metric	Score
Overall Accuracy	89.43%
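
The overall figure can be related to the per-length figures with a little arithmetic. The sketch below assumes, and the card does not confirm this, that the overall accuracy is a sample-weighted average of the three length buckets on the same evaluation set. Under that assumption it solves for the share of short texts that would produce 89.43%. The function name `short_share` is illustrative.

```python
# Accuracy figures (percent) copied from the tables above.
SHORT, MEDIUM, LONG, OVERALL = 76.31, 96.48, 95.01, 89.43


def short_share(rest_accuracy: float, short: float = SHORT, overall: float = OVERALL) -> float:
    """Solve overall = w * short + (1 - w) * rest_accuracy for the weight w."""
    return (rest_accuracy - overall) / (rest_accuracy - short)


# The medium/long mix is unknown, so the combined accuracy of the non-short
# texts lies somewhere between the two bucket accuracies.
low = short_share(min(MEDIUM, LONG))
high = short_share(max(MEDIUM, LONG))
print(f"Implied share of short texts: {low:.1%} to {high:.1%}")
```

If the assumption holds, short texts would make up roughly 30% to 35% of the evaluation samples, which explains why the overall score sits well below the medium and long scores.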

⚠️ Important Limitations

Short Text Instability: As shown in the benchmarks, the model's accuracy drops significantly (to ~76%) on texts under 100 words (e.g., tweets, single sentences). It is not recommended for use on short social media comments without human review.
Context Requirement: The model relies on analyzing sentence structure and paragraph flow. Without enough words, it lacks the context needed to make a high-confidence prediction.
False Positives: Highly formal, academic human writing can occasionally be flagged as AI due to its rigid structure.
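
One way to act on the short-text limitation is to check the word count before trusting a prediction. The sketch below buckets a text using the thresholds from the length breakdown and flags anything under 100 words for human review. The function names and the whitespace-based word count are assumptions made for illustration. Since the table lists 300 words in both the medium and long ranges, this sketch places it in the medium bucket.

```python
# Reported accuracy (percent) per length bucket, copied from the benchmark table.
REPORTED_ACCURACY = {"short": 76.31, "medium": 96.48, "long": 95.01}


def length_bucket(text: str) -> str:
    """Assign a text to a length bucket using a simple whitespace word count."""
    words = len(text.split())
    if words < 100:
        return "short"
    if words <= 300:
        return "medium"
    return "long"


def needs_human_review(text: str) -> bool:
    """Flag texts in the bucket where the model is reported to be weak."""
    return length_bucket(text) == "short"


sample = "This comment is far too short to classify reliably."
bucket = length_bucket(sample)
print(bucket, REPORTED_ACCURACY[bucket], needs_human_review(sample))
```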

Model size: 82.1M params (Safetensors)
Tensor type: F32

Model tree for noumenon-labs/Earlybird-fast

Base model: distilbert/distilroberta-base
Finetunes: 1 model
Quantizations: 2 models

Evaluation results (self-reported)

Accuracy on WORM (Wait, Original or Machine): 98.200
F1 on WORM (Wait, Original or Machine): 0.982
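
For readers who want to reproduce metrics of this kind on their own labeled data, the following is a minimal sketch of how accuracy and F1 are computed for a binary task. It treats label `1` as the positive (AI-generated) class, which is an assumption. The card does not state which class its F1 score refers to or whether it is averaged across classes.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float]:
    """Return (accuracy, F1 for the positive class) for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only; not data from the WORM benchmark.
acc, f1 = accuracy_and_f1([1, 1, 0, 0], [1, 0, 0, 0])
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```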