RAID Dual Neologism Tokens
This repo contains two learned token embeddings trained on matched RAID GPT-4 vs human text pairs.
Tokens:
<neo_detect_ai_900001><neo_detect_human_900002>
Artifacts:
concept_900001_token.pt: AI-style token checkpointconcept_900002_token.pt: human-style token checkpointraid_eval_summary.json: held-out RAID dual-token evaluationbeemo_eval_summary.json: cross-dataset transfer evaluation on Beemo model outputs
The checkpoints are small PyTorch payloads with keys:
concept_idnew_tokennew_token_idembedding
Loading
import torch
payload = torch.load('concept_900001_token.pt', map_location='cpu')
token = payload['new_token']
embedding = payload['embedding']
These are not full finetuned model weights. They are learned embedding rows intended to be inserted into the base tokenizer/model vocabulary used during training.
Repo: danielfein/raid-dual-neologism-tokens