	---
	license: apache-2.0
	language:
	- multilingual
	- en
	- de
	- fr
	- es
	- pt
	- nl
	base_model: distilbert-base-multilingual-cased
	tags:
	- token-classification
	- semantic-parsing
	- hypergraph
	- nlp
	pipeline_tag: token-classification
	library_name: transformers
	---

	# Atom Classifier

	A multilingual token classifier for semantic hypergraph parsing. It classifies each token in a sentence into one of 39 semantic atom types/subtypes, serving as the first stage (alpha) of the [Alpha-Beta semantic hypergraph parser](https://github.com/hyperquest-hq/hyperbase-parser-ab).

	## Model Details

	- Architecture: DistilBertForTokenClassification
	- Base model: distilbert-base-multilingual-cased
	- Labels: 39 semantic atom types and subtypes
	- Max sequence length: 512 tokens

	## Label Taxonomy

	Atoms are typed according to the [Semantic Hyperedge (SH) notation system](https://hyperquest.ai/hyperbase/manual/notation/). The 7 main types and their subtypes:

	### Concepts (C)
	| Label | Description |
	|-------|-------------|
	| `C` | Generic concept |
	| `Cc` | Common noun |
	| `Cp` | Proper noun |
	| `Ca` | Adjective (as concept) |
	| `Ci` | Pronoun |
	| `Cd` | Determiner (as concept) |
	| `Cm` | Nominal modifier |
	| `Cw` | Interrogative word |
	| `C#` | Number |

	### Predicates (P)
	| Label | Description |
	|-------|-------------|
	| `P` | Generic predicate |
	| `Pd` | Declarative predicate |
	| `P!` | Imperative predicate |

	### Modifiers (M)
	| Label | Description |
	|-------|-------------|
	| `M` | Generic modifier |
	| `Ma` | Adjective modifier |
	| `Mc` | Conceptual modifier |
	| `Md` | Determiner modifier |
	| `Me` | Adverbial modifier |
	| `Mi` | Infinitive particle |
	| `Mj` | Conjunctional modifier |
	| `Ml` | Particle |
	| `Mm` | Modal (auxiliary verb) |
	| `Mn` | Negation |
	| `Mp` | Possessive modifier |
	| `Ms` | Superlative modifier |
	| `Mt` | Prepositional modifier |
	| `Mv` | Verbal modifier |
	| `Mw` | Specifier |
	| `M#` | Number modifier |
	| `M=` | Comparative modifier |
	| `M^` | Degree modifier |

	### Builders (B)
	| Label | Description |
	|-------|-------------|
	| `B` | Generic builder |
	| `Bp` | Possessive builder |
	| `Br` | Relational builder (preposition) |

	### Triggers (T)
	| Label | Description |
	|-------|-------------|
	| `T` | Generic trigger |
	| `Tt` | Temporal trigger |
	| `Tv` | Verbal trigger |

	### Conjunctions (J)
	| Label | Description |
	|-------|-------------|
	| `J` | Generic conjunction |
	| `Jr` | Relational conjunction |

	### Special
	| Label | Description |
	|-------|-------------|
	| `X` | Excluded token (punctuation, etc.) |
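
The label strings in the tables above follow a simple pattern: the first character names the main type, and any remaining characters name the subtype. As a minimal sketch, a predicted label can be decomposed as shown below. The helper `split_label` and the `MAIN_TYPES` mapping are illustrative names introduced here, not part of the model or of hyperbase.

```python
# Main-type letters are taken from the section headings above; "X" is the
# special excluded-token label. MAIN_TYPES and split_label are hypothetical
# helper names used only for this illustration.
MAIN_TYPES = {
    "C": "concept",
    "P": "predicate",
    "M": "modifier",
    "B": "builder",
    "T": "trigger",
    "J": "conjunction",
    "X": "excluded",
}


def split_label(label: str) -> tuple[str, str]:
    """Split a label such as 'Cp' into (main type letter, subtype suffix)."""
    if not label:
        raise ValueError("empty label")
    main, subtype = label[0], label[1:]
    if main not in MAIN_TYPES:
        raise ValueError(f"unknown main type in label {label!r}")
    return main, subtype


for label in ["Cp", "P!", "M#", "X"]:
    main, subtype = split_label(label)
    print(f"{label:3s} -> {MAIN_TYPES[main]}, subtype {subtype or '(none)'}")
```

Grouping by the first character is convenient when only the coarse type matters, for example when counting predicates in a sentence.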

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch

	tokenizer = AutoTokenizer.from_pretrained("hyperquest/atom-classifier")
	model = AutoModelForTokenClassification.from_pretrained("hyperquest/atom-classifier")

	sentence = "Berlin is the capital of Germany."
	# Offsets map each subword token back to its character span in the sentence.
	encoded = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
	# The model's forward pass does not accept offset_mapping, so remove it first.
	offset_mapping = encoded.pop("offset_mapping")

	with torch.no_grad():
	    outputs = model(**encoded)

	# Highest-scoring label id for each subword token of the single input sentence.
	predictions = outputs.logits.argmax(-1)[0].tolist()
	word_ids = encoded.word_ids(0)

	for idx, word_id in enumerate(word_ids):
	    # Special tokens such as [CLS] and [SEP] have no word id; skip them.
	    if word_id is not None:
	        start, end = offset_mapping[0][idx].tolist()
	        label = model.config.id2label[predictions[idx]]
	        print(f"{sentence[start:end]:15s} -> {label}")
	```
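
The loop above prints one line per subword token, so a word that the tokenizer splits into several pieces appears several times. One common convention, assumed here rather than taken from the parser's source, is to keep the label predicted for the first subword of each word. The sketch below shows that aggregation as a pure function. `first_subword_labels` is a hypothetical helper, and the sample inputs are made up for illustration; they are not real model output.

```python
from typing import Dict, List, Optional


def first_subword_labels(
    word_ids: List[Optional[int]],
    predictions: List[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Return one label per word, using the prediction at its first subword."""
    labels: List[str] = []
    previous: Optional[int] = None
    for word_id, pred in zip(word_ids, predictions):
        # None marks special tokens; a repeated id marks a continuation piece.
        if word_id is not None and word_id != previous:
            labels.append(id2label[pred])
        previous = word_id
    return labels


# Toy inputs (assumed): word 0 is split into two subwords, words 1 and 2 are whole.
toy_id2label = {0: "Cp", 1: "Pd", 2: "X"}
toy_word_ids = [None, 0, 0, 1, 2, None]
toy_predictions = [2, 0, 1, 1, 2, 2]
print(first_subword_labels(toy_word_ids, toy_predictions, toy_id2label))
```

With the variables from the usage example, the call would be `first_subword_labels(word_ids, predictions, model.config.id2label)`.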

	## Intended Use

	This model is designed as the first stage of the Alpha-Beta semantic hypergraph parser (`hyperbase-parser-ab`). It assigns an atom type to each token; the beta stage then combines the typed tokens into nested hypergraph structures using a rule-based grammar.
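
To make the hand-off between the two stages more concrete, the sketch below turns word-level labels into `word/Type` strings and drops tokens labelled `X`. Both choices are assumptions made for illustration: the lowercased `text/type` form loosely follows how atoms are written in SH notation, and the actual interface between the alpha and beta stages in `hyperbase-parser-ab` may differ. `to_typed_atoms` is a hypothetical helper, and the labels in the example are hand-written, not model output.

```python
from typing import List


def to_typed_atoms(words: List[str], labels: List[str]) -> List[str]:
    """Format (word, label) pairs as 'word/Type' strings, skipping 'X' tokens."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return [
        f"{word.lower()}/{label}"
        for word, label in zip(words, labels)
        if label != "X"  # assumed: excluded tokens are not passed on
    ]


# Hand-written labels for illustration only; real labels come from the model.
words = ["Berlin", "is", "the", "capital", "of", "Germany", "."]
labels = ["Cp", "Pd", "Md", "Cc", "Br", "Cp", "X"]
print(to_typed_atoms(words, labels))
```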

	## Part of

	- [hyperbase](https://github.com/hyperquest-hq/hyperbase) -- Semantic Hypergraph toolkit
	- [hyperbase-parser-ab](https://github.com/hyperquest-hq/hyperbase-parser-ab) -- Alpha-Beta parser
