---
license: apache-2.0
language:
- multilingual
- en
- de
- fr
- es
- pt
- nl
base_model: distilbert-base-multilingual-cased
tags:
- token-classification
- semantic-parsing
- hypergraph
- nlp
pipeline_tag: token-classification
library_name: transformers
---

# Atom Classifier

A multilingual token classifier for **semantic hypergraph parsing**. It assigns each token in a sentence one of 39 semantic atom labels (types and subtypes) and serves as the first (alpha) stage of the [Alpha-Beta semantic hypergraph parser](https://github.com/hyperquest-hq/hyperbase-parser-ab).


## Model Details

- **Architecture:** DistilBertForTokenClassification
- **Base model:** distilbert-base-multilingual-cased
- **Labels:** 39 semantic atom labels (types and subtypes)
- **Max sequence length:** 512 tokens

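
Inputs longer than the 512-token limit have to be truncated or split before classification. The following is a minimal sketch of one way to split a list of token ids into overlapping windows. The helper name `split_into_windows`, the `overlap` default, and the assumption that two positions per window are reserved for `[CLS]` and `[SEP]` are illustrative and not taken from the parser.

```python
def split_into_windows(token_ids, max_len=512, num_special=2, overlap=32):
    """Split token ids into overlapping windows that fit the model.

    Hypothetical helper: max_len matches the limit stated above, while
    num_special and overlap are assumptions made for this sketch.
    """
    budget = max_len - num_special  # room left after [CLS] and [SEP]
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be in [0, budget)")
    step = budget - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + budget])
        if start + budget >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows


ids = list(range(1000))          # stand-in for real token ids
windows = split_into_windows(ids)
print([len(w) for w in windows])
```

Hugging Face fast tokenizers can also do this directly through `truncation=True`, `max_length`, `stride`, and `return_overflowing_tokens=True`; the manual version only makes the windowing explicit.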

## Label Taxonomy

Atoms are typed according to the [Semantic Hyperedge (SH) notation system](https://hyperquest.ai/hyperbase/manual/notation/). The 39 labels fall into 7 groups: six atom types with their subtypes, plus one special label for excluded tokens.


### Concepts (C)

| Label | Description |
|-------|-------------|
| `C` | Generic concept |
| `Cc` | Common noun |
| `Cp` | Proper noun |
| `Ca` | Adjective (as concept) |
| `Ci` | Pronoun |
| `Cd` | Determiner (as concept) |
| `Cm` | Nominal modifier |
| `Cw` | Interrogative word |
| `C#` | Number |


### Predicates (P)

| Label | Description |
|-------|-------------|
| `P` | Generic predicate |
| `Pd` | Declarative predicate |
| `P!` | Imperative predicate |


### Modifiers (M)

| Label | Description |
|-------|-------------|
| `M` | Generic modifier |
| `Ma` | Adjective modifier |
| `Mc` | Conceptual modifier |
| `Md` | Determiner modifier |
| `Me` | Adverbial modifier |
| `Mi` | Infinitive particle |
| `Mj` | Conjunctional modifier |
| `Ml` | Particle |
| `Mm` | Modal (auxiliary verb) |
| `Mn` | Negation |
| `Mp` | Possessive modifier |
| `Ms` | Superlative modifier |
| `Mt` | Prepositional modifier |
| `Mv` | Verbal modifier |
| `Mw` | Specifier |
| `M#` | Number modifier |
| `M=` | Comparative modifier |
| `M^` | Degree modifier |


### Builders (B)

| Label | Description |
|-------|-------------|
| `B` | Generic builder |
| `Bp` | Possessive builder |
| `Br` | Relational builder (preposition) |


### Triggers (T)

| Label | Description |
|-------|-------------|
| `T` | Generic trigger |
| `Tt` | Temporal trigger |
| `Tv` | Verbal trigger |


### Conjunctions (J)

| Label | Description |
|-------|-------------|
| `J` | Generic conjunction |
| `Jr` | Relational conjunction |


### Special

| Label | Description |
|-------|-------------|
| `X` | Excluded token (punctuation, etc.) |

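
Each label in the tables above consists of a one-character main type followed by an optional one-character subtype. A small sketch of how downstream code might split a label into those two parts is shown here. The names `LABELS`, `MAIN_TYPES`, `split_label`, and `main_type_name` are hypothetical, and the order of `LABELS` follows the tables, not necessarily the model's `id2label` mapping.

```python
# Group names follow the section titles of this card.
MAIN_TYPES = {
    "C": "concept",
    "P": "predicate",
    "M": "modifier",
    "B": "builder",
    "T": "trigger",
    "J": "conjunction",
    "X": "excluded",
}

# Labels copied from the tables above (table order, not id order).
LABELS = [
    "C", "Cc", "Cp", "Ca", "Ci", "Cd", "Cm", "Cw", "C#",
    "P", "Pd", "P!",
    "M", "Ma", "Mc", "Md", "Me", "Mi", "Mj", "Ml", "Mm", "Mn",
    "Mp", "Ms", "Mt", "Mv", "Mw", "M#", "M=", "M^",
    "B", "Bp", "Br",
    "T", "Tt", "Tv",
    "J", "Jr",
    "X",
]


def split_label(label):
    """Return (main_type, subtype); the subtype is '' for generic labels."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return label[0], label[1:]


def main_type_name(label):
    """Return the group name of a label, e.g. 'Cp' -> 'concept'."""
    main, _ = split_label(label)
    return MAIN_TYPES[main]


print(len(LABELS), split_label("Cp"), main_type_name("M^"))
```

When working with the actual model, `model.config.id2label` remains the authoritative source for the label set and its ordering.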

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("hyperquest/atom-classifier")
model = AutoModelForTokenClassification.from_pretrained("hyperquest/atom-classifier")

sentence = "Berlin is the capital of Germany."
# Offsets map each sub-word token back to its character span in the sentence.
encoded = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
# The model does not accept offset_mapping, so remove it before the forward pass.
offset_mapping = encoded.pop("offset_mapping")

with torch.no_grad():
    outputs = model(**encoded)

# Highest-scoring label id for each token of the first (only) sequence.
predictions = outputs.logits.argmax(-1)[0].tolist()
word_ids = encoded.word_ids(0)

# Prints one line per sub-word token; special tokens have word_id None and are skipped.
for idx, word_id in enumerate(word_ids):
    if word_id is not None:
        start, end = offset_mapping[0][idx].tolist()
        label = model.config.id2label[predictions[idx]]
        print(f"{sentence[start:end]:15s} -> {label}")
```

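
The loop above prints one label per sub-word token, so a word that the tokenizer splits into several pieces appears more than once. One common way to get a single label per word is to keep the prediction of the first sub-word token. The sketch below assumes that strategy, which is not confirmed to be what the parser does; `first_subtoken_labels` is a hypothetical helper, and the toy `id2label` mapping is invented for illustration and is not the model's actual mapping.

```python
def first_subtoken_labels(word_ids, predictions, id2label):
    """Return one label per word, taken from the word's first sub-token."""
    labels = {}
    for word_id, pred in zip(word_ids, predictions):
        if word_id is None:        # special tokens such as [CLS] and [SEP]
            continue
        if word_id not in labels:  # later pieces of the same word are ignored
            labels[word_id] = id2label[pred]
    return [labels[i] for i in sorted(labels)]


# Toy inputs shaped like the variables in the usage example.
toy_word_ids = [None, 0, 1, 2, 2, 3, None]
toy_predictions = [0, 1, 2, 3, 0, 1, 0]
toy_id2label = {0: "X", 1: "Cp", 2: "Pd", 3: "Cc"}  # invented mapping

word_labels = first_subtoken_labels(toy_word_ids, toy_predictions, toy_id2label)
print(word_labels)
```

With the real model, the same function can be called as `first_subtoken_labels(word_ids, predictions, model.config.id2label)`.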

## Intended Use

This model is designed to be used as the first stage of the Alpha-Beta semantic hypergraph parser (`hyperbase-parser-ab`). It assigns atom types to tokens; the beta stage then combines the typed atoms into nested hypergraph structures using a rule-based grammar.

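
To illustrate the hand-off between the two stages, the sketch below pairs each word with its predicted label to form simple `root/type` atom strings and drops tokens labeled `X`. It is a simplification under stated assumptions: `to_atoms` is a hypothetical helper, the labels in the example are hand-written and not model output, and the atoms produced by the actual parser may carry additional information and apply different normalization.

```python
def to_atoms(words, labels):
    """Build simplified 'root/type' atom strings, skipping excluded tokens."""
    atoms = []
    for word, label in zip(words, labels):
        if label == "X":  # punctuation and other excluded tokens
            continue
        atoms.append(f"{word.lower()}/{label}")
    return atoms


# Hand-written labels for illustration only.
example_words = ["Berlin", "is", "."]
example_labels = ["Cp", "Pd", "X"]
atoms = to_atoms(example_words, example_labels)
print(atoms)
```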

## Part of

- [hyperbase](https://github.com/hyperquest-hq/hyperbase) -- Semantic Hypergraph toolkit
- [hyperbase-parser-ab](https://github.com/hyperquest-hq/hyperbase-parser-ab) -- Alpha-Beta parser