title: LREC 2026 LLM-as-Annotator
emoji: ✒️
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Annotate historical and low-resource languages with LLMs
LREC 2026 — LLM-as-Annotator Workbench
A corpus-centered annotation app built around the LLM-as-annotator pipeline described in the LREC 2026 tutorial and the companion LoResLM 2026 paper. The text is the focal point; everything else (task schema, models, prompt, ICL pool, exports) lives in popups behind toolbar pills.
What it does
- Loads a corpus (paste, file, or sandbox example from the four historical languages of the paper).
- Annotates token by token with one or more LLMs (single inference or Mixture-of-Experts).
- Highlights MoE disagreements so the reviewer focuses on contested tokens first.
- Lets you correct any token in a focused popup with per-model votes, keyboard navigation, bulk operations, and a "re-ask one model" action.
- Bootstrap loop: corrected sentences feed back into the few-shot pool (filtered by (language, schema_hash) to avoid task contamination).
- Exports as TSV (PIE-baseline round-trip), JSON (schema-conformant), CoNLL-U (UD standard), or JSONL (fine-tune format).
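The (language, schema_hash) filter in the bootstrap loop can be pictured as a pool of few-shot examples bucketed by that pair. The following is a minimal sketch, not the app's actual ICLPool from prompts.py: the names ICLPoolSketch and schema_hash, the hashing choice, and the record layout are all assumptions made for illustration.

```python
import hashlib
import json
from collections import defaultdict


def schema_hash(schema: dict) -> str:
    """Stable short hash of a task schema (hypothetical helper)."""
    canonical = json.dumps(schema, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class ICLPoolSketch:
    """Few-shot examples bucketed by (language, schema_hash)."""

    def __init__(self) -> None:
        self._buckets = defaultdict(list)

    def add(self, language: str, schema: dict, sentence: list) -> None:
        # A corrected sentence is stored under the task it was annotated for.
        self._buckets[(language, schema_hash(schema))].append(sentence)

    def examples(self, language: str, schema: dict, k: int = 3) -> list:
        # Only examples with the same language AND the same schema are returned,
        # so a lemma+POS example never leaks into a different task or language.
        return self._buckets.get((language, schema_hash(schema)), [])[-k:]


pool = ICLPoolSketch()
pos_schema = {"fields": ["lemma", "pos"]}          # illustrative schema
pool.add("grc", pos_schema, [{"form": "tok1", "lemma": "tok1", "pos": "NOUN"}])
same_task = pool.examples("grc", pos_schema)       # one stored sentence
other_language = pool.examples("syr", pos_schema)  # empty list
```

Keying on a hash of the canonicalized schema means that any change to the tagset or field list starts a fresh bucket, which is one way to get the "no task contamination" property described above.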
Companion paper
Vidal-Gorène, C., Kindt, B., & Cafiero, F. (2026). Under-resourced studies of under-resourced languages: lemmatization and POS-tagging with LLM annotators for historical Armenian, Georgian, Greek and Syriac. LoResLM 2026. https://aclanthology.org/2026.loreslm-1.28/
Tutorial repo: floriancafiero/lrec2026-llm-as-annotator-tutorial
Stack
- Backend: FastAPI + httpx (async OpenRouter client).
- Frontend: single static HTML page + Alpine.js (15 KB, CDN) + Tailwind CSS (CDN). No build step.
Run locally
cd app
pip install -r requirements.txt
python app.py # or: uvicorn app:app --reload --port 7860
# open http://127.0.0.1:7860
The app expects the two sibling repos at:
LREC-tutorial/
├── code/
│ ├── EACL2026-historical-languages/ # sandbox corpora + tagsets
│ └── lrec2026-llm-as-annotator-tutorial/ # JSON schema + system prompts
└── app/ # this directory
Workflow
- Sidebar → quick start — click an example corpus (Ancient Greek, Old Armenian, Syriac). The toolbar updates the task, language, and models.
- Top bar → 🔑 OpenRouter — paste your API key (kept in this browser session only).
- Top bar → ▶ Annotate all — runs every model in parallel (Mixture-of-Experts if 2+ models). Tokens are colored by status: indigo = consensus, amber ⚠ = disagreement.
- Click any token → popup with editable fields, per-model votes, keyboard navigation, "adopt from " and "re-ask one model" shortcuts.
- 📥 to ICL on a sentence — pushes the corrected annotation into the few-shot pool. The next run re-injects it.
- Top bar → export — TSV / JSON / CoNLL-U / JSONL.
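The "Annotate all" step (parallel model calls followed by a consensus check) can be sketched as below. This is an illustration under stated assumptions: ask_model is a stand-in returning canned tags rather than a real OpenRouter request, the model names are placeholders, and only a plain majority vote is shown, whereas the app's moe.py also offers LCS, min, and priority strategies.

```python
import asyncio
from collections import Counter


async def ask_model(model: str, tokens: list) -> list:
    """Stand-in for one LLM call; returns one tag per token (canned data)."""
    canned = {
        "model-a": ["DET", "NOUN", "VERB"],
        "model-b": ["DET", "NOUN", "VERB"],
        "model-c": ["DET", "ADJ", "VERB"],
    }
    await asyncio.sleep(0)  # yield control, as a real network call would
    return canned[model]


def vote(tags: list) -> tuple:
    """Majority vote for one token; status is 'consensus' or 'disagreement'."""
    winner, _count = Counter(tags).most_common(1)[0]
    status = "consensus" if len(set(tags)) == 1 else "disagreement"
    return winner, status


async def annotate(tokens: list, models: list) -> list:
    # Query all models concurrently, then aggregate token by token.
    per_model = await asyncio.gather(*(ask_model(m, tokens) for m in models))
    return [vote(list(column)) for column in zip(*per_model)]


result = asyncio.run(
    annotate(["tok1", "tok2", "tok3"], ["model-a", "model-b", "model-c"])
)
```

With the canned data, the second token gets the majority tag but is flagged as a disagreement, which corresponds to the amber state in the UI.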
Keyboard shortcuts
| Key | Action |
|---|---|
| j / k | next / previous token |
| e or ↵ | edit focused token |
| 1–9 | (in editor) assign the i-th visible tag |
| x | toggle selection of focused token |
| r | re-annotate the focused sentence |
| ↵ | save edit & advance to next disagreement |
| Esc | close popup / clear selection |
| shift+click | multi-select tokens (then "Apply tag…") |
| right-click | per-token context menu |
Deploy on HuggingFace Spaces
This app/ directory is self-contained: the tagsets, schemas, system
prompts, cheatsheet and a slice of the four sandbox corpora are vendored under
data/ (≈ 900 KB). You do not need to push the parent repo or use git
submodules.
One-shot deploy
cd app
# Create a new Space (Docker SDK) at https://huggingface.co/new-space
# Then push this directory as the Space's root:
git init -b main && git add . && git commit -m "init"   # -b main (git 2.28+) so the push below finds a main branch
git remote add space https://huggingface.co/spaces/<your-user>/<space-name>
git push --force space main
The Space builds from Dockerfile, boots uvicorn on port 7860, and serves
the SPA at /.
⚠ Single-user demo
SESSION is module-global. The Space serves one user at a time — if two
people open it simultaneously, they share the same corpus, the same selected
models, and (briefly) the same API key. For the LREC tutorial we recommend:
🦆 Each attendee clicks the "⋮ → Duplicate this Space" button in the top-right of the Space page. They get a free private clone, isolated state, their own API key in their own browser.
This is the simplest way to fan out the tutorial. Document this prominently in the Space's README.
Optional: ship a default OpenRouter key
If you want attendees to start without entering a key (e.g., a shared demo
key with a rate limit), set a Space Secret named OPENROUTER_API_KEY.
The backend reads it at startup; users can still override it from the UI.
API keys entered through the UI are never persisted — they live only in
the in-memory SESSION dict and are forgotten on restart.
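The precedence described here (UI key first, Space Secret as fallback, nothing written to disk) could look roughly like the sketch below. The function name resolve_api_key and the "api_key" session field are assumptions for illustration; only SESSION and OPENROUTER_API_KEY come from the text above.

```python
import os
from typing import Mapping, Optional

SESSION: dict = {}  # in-memory only; contents are lost when the process restarts


def resolve_api_key(
    session: Mapping, env: Optional[Mapping] = None
) -> Optional[str]:
    """Return the UI-supplied key if present, else the Space Secret default."""
    env = os.environ if env is None else env
    # A key typed into the UI overrides the shared default for this session.
    return session.get("api_key") or env.get("OPENROUTER_API_KEY")


default_only = resolve_api_key({}, {"OPENROUTER_API_KEY": "sk-default"})
ui_override = resolve_api_key(
    {"api_key": "sk-ui"}, {"OPENROUTER_API_KEY": "sk-default"}
)
```

Because SESSION is module-global, a key stored this way is visible to every visitor of the same Space until it is overwritten or the process restarts, which is the reason for the duplication advice above.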
File map
| File | Role |
|---|---|
| app.py | FastAPI app: state + REST endpoints |
| static/index.html | SPA layout: toolbar, sidebar, corpus panel, modals |
| static/app.js | Alpine.js state + handlers + keyboard shortcuts |
| static/styles.css | Token chips, modals, polish |
| provider.py | OpenRouter async client (JSON-Schema response_format + retry) |
| moe.py | Pure aggregate() — vote / LCS / min / priority |
| schemas.py | AnnotationSchema + 8 presets |
| prompts.py | Templates from tutorial repo + ICLPool |
| io_utils.py | Tokenizer + TSV / JSON / CoNLL-U / JSONL I/O |
| tutorial.py | 3 guided examples prefilling the corpus |
| paths.py | Resolves sibling repos (read-only) |
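As an illustration of the export side handled by io_utils.py, here is a minimal sketch of a CoNLL-U serializer. It is not the app's code: the function name to_conllu and the token dict keys (form, lemma, pos) are assumptions. The ten tab-separated columns and the blank line closing each sentence follow the CoNLL-U format.

```python
def to_conllu(sentence: list) -> str:
    """Serialize one sentence to CoNLL-U; unknown columns are written as '_'."""
    lines = []
    for idx, tok in enumerate(sentence, start=1):
        columns = [
            str(idx),                # ID (1-based position in the sentence)
            tok["form"],             # FORM
            tok.get("lemma", "_"),   # LEMMA
            tok.get("pos", "_"),     # UPOS
            "_", "_", "_", "_", "_", "_",  # XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the sentence


conllu = to_conllu(
    [
        {"form": "tok1", "lemma": "lem1", "pos": "NOUN"},
        {"form": "tok2"},  # missing lemma and POS fall back to '_'
    ]
)
```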
License
MIT for this app code. Sandbox data and prompt templates remain under their
upstream licenses (see the two code/ repositories).