Pnyx Erscheinung - Authenticity Detection Model (v0.7)

Named after Hannah Arendt's Erscheinungsraum (space of appearance). Detects whether genuine human presence exists behind a text - whether there is a "who" to listen to.

Model

  • Base: microsoft/deberta-v3-small (141M params)
  • Format: ONNX, FP16, vocabulary pruned from 128K to 70K tokens; 126 MB
  • Inference: ONNX Runtime Web (WASM) for in-browser use

Architecture

DeBERTa CLS (768-dim) + features (8-dim)
  -> LayerNorm(776)
  -> Linear(776, 256) -> GELU -> Dropout(0.3)
  -> Linear(256, 128) -> GELU -> Dropout(0.2)
  -> Linear(128, 2)
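The head above can be read as a plain inference-time forward pass. The following is a minimal sketch in JavaScript, not the exported graph itself: the parameter names (lnGamma, w1, b1, and so on), the row-major weight layout, and the tanh approximation of GELU are assumptions made for illustration. Dropout is omitted because it is a no-op at inference.

```javascript
// Illustrative forward pass for the classification head (inference only).
// Parameter names and the [out][in] weight layout are assumptions.

function layerNorm(x, gamma, beta, eps = 1e-5) {
  const mean = x.reduce((s, v) => s + v, 0) / x.length;
  const variance = x.reduce((s, v) => s + (v - mean) ** 2, 0) / x.length;
  const denom = Math.sqrt(variance + eps);
  return x.map((v, i) => ((v - mean) / denom) * gamma[i] + beta[i]);
}

function linear(x, weight, bias) {
  // weight holds one row per output unit; each row has x.length entries.
  return weight.map((row, o) => row.reduce((s, w, i) => s + w * x[i], bias[o]));
}

function gelu(v) {
  // tanh approximation of GELU (assumption: the trained model may use the exact form).
  return 0.5 * v * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (v + 0.044715 * v ** 3)));
}

function headForward(cls, features, p) {
  if (cls.length !== 768 || features.length !== 8) {
    throw new Error('expected a 768-dim CLS vector and 8 features');
  }
  // Concatenate CLS embedding and hand-crafted features into a 776-dim vector.
  let h = layerNorm([...cls, ...features], p.lnGamma, p.lnBeta);
  h = linear(h, p.w1, p.b1).map(gelu); // 776 -> 256
  h = linear(h, p.w2, p.b2).map(gelu); // 256 -> 128
  return linear(h, p.w3, p.b3);        // 128 -> 2 raw scores
}
```

The two returned values are raw scores; a softmax over them would give the [human_prob, ai_prob] pair described under Output.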

Inputs

Name            Shape     Description
input_ids       [1, 128]  Tokenized text (max 128 tokens)
attention_mask  [1, 128]  Token mask
features        [1, 8]    Hand-crafted features (TTR, hapax rate, sentence variance, etc.)
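A minimal sketch of how the three inputs might be assembled in JavaScript. The int64 element type for input_ids and attention_mask, the padding ID of 0, and the helper name buildInputs are assumptions; check them against the exported model before relying on this.

```javascript
const MAX_LEN = 128;
const PAD_ID = 0; // assumption: padding ID in the pruned vocabulary

// Pads or truncates token IDs to MAX_LEN and builds the matching mask.
// Returns plain { data, dims } pairs so the sketch has no dependencies.
function buildInputs(tokenIds, features) {
  if (features.length !== 8) throw new Error('expected 8 features');
  const inputIds = new BigInt64Array(MAX_LEN).fill(BigInt(PAD_ID));
  const attentionMask = new BigInt64Array(MAX_LEN); // zero-initialised
  const n = Math.min(tokenIds.length, MAX_LEN);
  for (let i = 0; i < n; i++) {
    inputIds[i] = BigInt(tokenIds[i]);
    attentionMask[i] = 1n; // 1 marks a real token, 0 marks padding
  }
  return {
    input_ids: { data: inputIds, dims: [1, MAX_LEN] },
    attention_mask: { data: attentionMask, dims: [1, MAX_LEN] },
    features: { data: Float32Array.from(features), dims: [1, 8] },
  };
}
```

With ONNX Runtime Web, each entry could then be wrapped as a tensor, for example new ort.Tensor('int64', inputs.input_ids.data, inputs.input_ids.dims). Note that simple truncation drops any trailing special token on texts longer than 128 tokens; truncating in the tokenizer avoids that.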

Token remapping

This model uses a pruned vocabulary. After tokenization with the standard DeBERTa tokenizer, remap token IDs using token_remap.json:

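// ids: array of token IDs produced by the standard DeBERTa tokenizer
const remap = await fetch('token_remap.json').then(r => r.json());
// IDs with no entry in the pruned vocabulary fall back to 0
const remapped = ids.map(id => remap[id] ?? 0);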

Feature order (index 0-7)

Index  Feature                   Range
0      Type-Token Ratio          0-1
1      Hapax rate                0-1
2      Sentence length variance  0+
3      Average sentence length   0+
4      Bigram uniqueness         0-1
5      Stop word density         0-1
6      Contraction presence      0/1
7      Lowercase ratio           0-1
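The model card does not give exact definitions for these features, so the following is only a sketch of one plausible reading, returned in the index order above. The tokenization regex, the sentence splitter, the tiny stop-word list, the contraction pattern, and the choice to normalise hapax rate by total word count are all assumptions; features computed this way may not match what the model was trained on.

```javascript
// Illustrative stop-word list (assumption; the real list is not documented).
const STOP_WORDS = new Set(['the', 'a', 'an', 'and', 'or', 'but', 'of', 'to',
  'in', 'on', 'is', 'are', 'was', 'it', 'that', 'this']);

function extractFeatures(text) {
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  const sentences = text.split(/[.!?]+/).map(s => s.trim()).filter(Boolean);
  const total = words.length || 1; // avoid division by zero on empty text

  // Word frequency table for TTR and hapax rate.
  const counts = new Map();
  for (const w of words) counts.set(w, (counts.get(w) ?? 0) + 1);
  const ttr = counts.size / total;
  const hapaxRate = [...counts.values()].filter(c => c === 1).length / total;

  // Sentence lengths in whitespace-separated words; population variance.
  const lens = sentences.map(s => s.split(/\s+/).length);
  const avgLen = lens.length ? lens.reduce((a, b) => a + b, 0) / lens.length : 0;
  const variance = lens.length
    ? lens.reduce((s, l) => s + (l - avgLen) ** 2, 0) / lens.length
    : 0;

  // Share of distinct adjacent word pairs.
  const bigrams = words.slice(1).map((w, i) => `${words[i]} ${w}`);
  const bigramUniq = bigrams.length ? new Set(bigrams).size / bigrams.length : 0;

  const stopDensity = words.filter(w => STOP_WORDS.has(w)).length / total;
  const contraction = /\w'(t|re|ve|ll|d|m|s)\b/i.test(text) ? 1 : 0;

  // Lowercase letters as a share of all ASCII letters.
  const letters = (text.match(/[A-Za-z]/g) ?? []).length;
  const lower = (text.match(/[a-z]/g) ?? []).length;
  const lowercaseRatio = letters ? lower / letters : 0;

  return [ttr, hapaxRate, variance, avgLen, bigramUniq,
    stopDensity, contraction, lowercaseRatio];
}
```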

Output

Softmax probabilities: [human_prob, ai_prob]. An ai_prob of 0.5 or higher indicates AI-generated text.
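As a small sketch of reading the output, assuming the score in question is the second probability (ai_prob) and using a hypothetical helper name interpretOutput:

```javascript
const AI_THRESHOLD = 0.5;

// probs: the two-element model output, [human_prob, ai_prob].
function interpretOutput(probs) {
  const [humanProb, aiProb] = probs;
  return {
    humanProb,
    aiProb,
    label: aiProb >= AI_THRESHOLD ? 'ai' : 'human',
  };
}
```

For example, interpretOutput([0.3, 0.7]).label is 'ai', and a tie at 0.5 also counts as 'ai' because the comparison is inclusive.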

Dual-tier architecture

In practice, this model runs alongside an 85-signal heuristic tier. If heuristic confidence is high enough (score >= 4.0), ML inference is skipped entirely, saving ~500ms.
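The gating described above could look roughly like the sketch below. The function names detect, runHeuristics, and runModel are hypothetical placeholders for the heuristic tier and the ONNX inference call; only the 4.0 threshold comes from this card.

```javascript
const HEURISTIC_SKIP_THRESHOLD = 4.0;

// runHeuristics(text) -> number (heuristic score)
// runModel(text) -> Promise<number> (ai_prob from the ML tier)
// Both callbacks are hypothetical stand-ins for the real tiers.
async function detect(text, runHeuristics, runModel) {
  const heuristicScore = runHeuristics(text);
  if (heuristicScore >= HEURISTIC_SKIP_THRESHOLD) {
    // Heuristics are confident enough: skip ML inference entirely.
    return { tier: 'heuristic', heuristicScore };
  }
  const aiProb = await runModel(text);
  return { tier: 'ml', heuristicScore, aiProb };
}
```

Passing the tiers in as callbacks keeps the sketch independent of how either tier is implemented and makes the skip path easy to test.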

Part of Pnyx

This model powers the SEE layer (authenticity detection) of Pnyx, a listening infrastructure for public discourse built for the Agora Hackathon x TUM.ai E-Lab (April 2026).
