---
license: apache-2.0
language:
- multilingual
- en
- de
- fr
- es
- pt
- nl
base_model: distilbert-base-multilingual-cased
tags:
- token-classification
- semantic-parsing
- hypergraph
- nlp
pipeline_tag: token-classification
library_name: transformers
---
# Atom Classifier
A multilingual token classifier for **semantic hypergraph parsing**. It classifies each token in a sentence into one of 39 semantic atom types/subtypes, serving as the first stage (alpha) of the [Alpha-Beta semantic hypergraph parser](https://github.com/hyperquest-hq/hyperbase-parser-ab).
## Model Details
- **Architecture:** DistilBertForTokenClassification
- **Base model:** distilbert-base-multilingual-cased
- **Labels:** 39 semantic atom types and subtypes
- **Max sequence length:** 512 tokens
## Label Taxonomy
Atoms are typed according to the [Semantic Hyperedge (SH) notation system](https://hyperquest.ai/hyperbase/manual/notation/). The labels fall into 7 groups: six main atom types with their subtypes, plus a special label for excluded tokens.
### Concepts (C)
| Label | Description |
|-------|-------------|
| `C` | Generic concept |
| `Cc` | Common noun |
| `Cp` | Proper noun |
| `Ca` | Adjective (as concept) |
| `Ci` | Pronoun |
| `Cd` | Determiner (as concept) |
| `Cm` | Nominal modifier |
| `Cw` | Interrogative word |
| `C#` | Number |
### Predicates (P)
| Label | Description |
|-------|-------------|
| `P` | Generic predicate |
| `Pd` | Declarative predicate |
| `P!` | Imperative predicate |
### Modifiers (M)
| Label | Description |
|-------|-------------|
| `M` | Generic modifier |
| `Ma` | Adjective modifier |
| `Mc` | Conceptual modifier |
| `Md` | Determiner modifier |
| `Me` | Adverbial modifier |
| `Mi` | Infinitive particle |
| `Mj` | Conjunctional modifier |
| `Ml` | Particle |
| `Mm` | Modal (auxiliary verb) |
| `Mn` | Negation |
| `Mp` | Possessive modifier |
| `Ms` | Superlative modifier |
| `Mt` | Prepositional modifier |
| `Mv` | Verbal modifier |
| `Mw` | Specifier |
| `M#` | Number modifier |
| `M=` | Comparative modifier |
| `M^` | Degree modifier |
### Builders (B)
| Label | Description |
|-------|-------------|
| `B` | Generic builder |
| `Bp` | Possessive builder |
| `Br` | Relational builder (preposition) |
### Triggers (T)
| Label | Description |
|-------|-------------|
| `T` | Generic trigger |
| `Tt` | Temporal trigger |
| `Tv` | Verbal trigger |
### Conjunctions (J)
| Label | Description |
|-------|-------------|
| `J` | Generic conjunction |
| `Jr` | Relational conjunction |
### Special
| Label | Description |
|-------|-------------|
| `X` | Excluded token (punctuation, etc.) |
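
In the tables above, the first character of each label names the main type and any remaining character names the subtype. The following is a minimal sketch of splitting a label on that basis; `MAIN_TYPES` and `split_label` are hypothetical helper names used for illustration and are not part of the model or the parser.

```python
from typing import Tuple

# Main atom types as listed in the tables above; "X" marks excluded tokens.
MAIN_TYPES = {
    "C": "concept",
    "P": "predicate",
    "M": "modifier",
    "B": "builder",
    "T": "trigger",
    "J": "conjunction",
    "X": "excluded",
}


def split_label(label: str) -> Tuple[str, str, str]:
    """Split a label such as "Cp" into (main type, type name, subtype).

    The subtype is an empty string for generic labels such as "C" or "X".
    """
    if not label or label[0] not in MAIN_TYPES:
        raise ValueError(f"unknown atom label: {label!r}")
    main = label[0]
    return main, MAIN_TYPES[main], label[1:]


print(split_label("Cp"))  # ('C', 'concept', 'p')
print(split_label("M#"))  # ('M', 'modifier', '#')
print(split_label("X"))   # ('X', 'excluded', '')
```

Grouping by main type this way can be handy when only coarse categories matter, for example when counting concepts versus predicates in a sentence.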
## Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("hyperquest/atom-classifier")
model = AutoModelForTokenClassification.from_pretrained("hyperquest/atom-classifier")

sentence = "Berlin is the capital of Germany."

# Keep character offsets so each token can be mapped back to the sentence.
encoded = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
offset_mapping = encoded.pop("offset_mapping")  # not a model input

with torch.no_grad():
    outputs = model(**encoded)

# Highest-scoring label id for each token of the first (only) sequence.
predictions = outputs.logits.argmax(-1)[0].tolist()
word_ids = encoded.word_ids(0)

for idx, word_id in enumerate(word_ids):
    # Special tokens such as [CLS] and [SEP] have no word id; skip them.
    if word_id is not None:
        start, end = offset_mapping[0][idx].tolist()
        label = model.config.id2label[predictions[idx]]
        print(f"{sentence[start:end]:15s} -> {label}")
```
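
The loop above prints one line per subword token, so a word that the tokenizer splits into several pieces appears more than once. A minimal sketch of collapsing predictions to one label per word follows. Taking the label of the first subword is an assumption made here for illustration and may differ from what the parser does; `first_subword_labels` is a hypothetical helper, and the sample labels are hand-written rather than model output.

```python
from typing import List, Optional


def first_subword_labels(
    word_ids: List[Optional[int]], labels: List[str]
) -> List[str]:
    """Return one label per word, taken from the word's first subword token."""
    word_labels: List[str] = []
    last_word_id: Optional[int] = None
    for word_id, label in zip(word_ids, labels):
        if word_id is None:
            # Special tokens ([CLS], [SEP], padding) belong to no word.
            continue
        if word_id != last_word_id:
            # First subword of a new word: keep its label.
            word_labels.append(label)
        last_word_id = word_id
    return word_labels


# Illustrative input: word 1 is split into two subword tokens.
example_word_ids = [None, 0, 1, 1, 2, None]
example_labels = ["X", "Cp", "Pd", "Mv", "X", "X"]
print(first_subword_labels(example_word_ids, example_labels))
```

With the usage example above, `word_ids` can be passed in directly, and `labels` can be built as `[model.config.id2label[p] for p in predictions]`.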
## Intended Use
This model is designed to be used as the first stage of the Alpha-Beta semantic hypergraph parser (`hyperbase-parser-ab`). It assigns atom types to tokens, which the beta stage then combines into nested hypergraph structures using a rule-based grammar.
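
As a rough illustration of what the alpha stage hands over, the sketch below pairs each word with its predicted label and writes the pair in a `root/type` form. This is only a sketch under assumptions: the `root/type` spelling, the lowercasing, and the dropping of `X` tokens are choices made for this example, `to_atoms` is a hypothetical helper, and the labels shown are hand-written rather than model output. The actual hand-off is defined by `hyperbase-parser-ab`.

```python
from typing import List, Tuple


def to_atoms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Turn (word, label) pairs into "root/type" strings, skipping "X" tokens."""
    atoms: List[str] = []
    for word, label in tagged:
        if label == "X":
            # Excluded tokens (punctuation, etc.) produce no atom.
            continue
        atoms.append(f"{word.lower()}/{label}")
    return atoms


# Hand-written labels for illustration only (not model output).
tagged_sentence = [
    ("Berlin", "Cp"),
    ("is", "Pd"),
    ("the", "Md"),
    ("capital", "Cc"),
    ("of", "Br"),
    ("Germany", "Cp"),
    (".", "X"),
]
print(to_atoms(tagged_sentence))
```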
## Part of
- [hyperbase](https://github.com/hyperquest-hq/hyperbase) -- Semantic Hypergraph toolkit
- [hyperbase-parser-ab](https://github.com/hyperquest-hq/hyperbase-parser-ab) -- Alpha-Beta parser