attentiontypes-commonsense
This model is a custom Hugging Face Transformers export of the AttentionTypes model from a course project on commonsense reasoning. The task is to choose the correct answer (A, B, or C) for a false sentence by comparing three candidate repairs and selecting the most sensible one.
The underlying model is a grouped BBPE transformer with DeBERTa-inspired disentangled attention. It was trained as a multiple-choice scoring model for a Kaggle-style commonsense challenge and exported here as a custom Hugging Face model for inference and reproducibility.
Model Description
- Architecture: encoder-only transformer with disentangled attention
- Tokenization: grouped BBPE with `<cls> false sentence <sep> option <eos>`
- Objective: score each candidate option and select the highest-scoring repair
- Output: one logit per encoded `(false sentence, option)` pair
- Status: experimental research artifact from an in-progress student project
This is not a pretrained general-purpose language model. It is a task-specific classifier checkpoint.
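The grouped input layout above can be sketched as a simple string join. The special-token strings come from this card; the join logic itself is illustrative and may differ from the checkpoint's actual tokenizer code.

```python
def group_input(false_sent: str, option: str) -> str:
    """Assemble one grouped input in the <cls> ... <sep> ... <eos> layout."""
    return f"<cls> {false_sent} <sep> {option} <eos>"

print(group_input("The sun rises in the west.", "The sun rises in the east."))
# <cls> The sun rises in the west. <sep> The sun rises in the east. <eos>
```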
Task Summary
One example question looks like this:
- False sentence: The sun rises in the west.
- A: The sun rises in the east.
- B: The sun sets in the west.
- C: The sun shines at night.
The correct answer is A. The model learns to score which candidate best repairs the false statement.
Training Data
The model was trained for a course competition setup built around false sentences paired with three candidate corrections. Each training example is represented as three grouped inputs, one per answer option, and the final prediction is the argmax over the three option scores.
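The prediction rule described above can be sketched in a few lines: one score per option, argmax picks the repair. The scores below are dummy values, not real model outputs.

```python
# One logit per (false sentence, option) pair; dummy values for illustration
scores = [0.2, 1.3, -0.4]

# The final prediction is the argmax over the three option scores
pred = max(range(len(scores)), key=scores.__getitem__)
answer = "ABC"[pred]
print(answer)  # B for these dummy scores
```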
The data used for this project lives in the original repository as local CSV files rather than as a standalone Hugging Face dataset:
- `data/train_data.csv`
- `data/train_answers.csv`
- `data/test_data.csv`
- `data/sample.csv`
The training table contains a false sentence plus three candidate answer options. The labels indicate which option correctly repairs the false statement. The test split follows the same structure without released labels.
Because this dataset comes from a course competition setup, it should be treated as task-specific experimental data rather than a broadly curated benchmark.
The same CSV files are included in this Hugging Face repository under dataset/ for convenience.
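A minimal standard-library sketch of reading one of these CSVs into (false sentence, option) pairs. The column names (`FalseSent`, `OptionA`..`OptionC`) are assumptions based on the task description, not the repo's documented schema, and the inline CSV stands in for `data/train_data.csv`.

```python
import csv
import io

# Inline sample standing in for data/train_data.csv; column names are assumed
csv_text = (
    "id,FalseSent,OptionA,OptionB,OptionC\n"
    "0,The sun rises in the west.,The sun rises in the east.,"
    "The sun sets in the west.,The sun shines at night.\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
row = rows[0]

# One (false sentence, option) pair per answer option
pairs = [(row["FalseSent"], row["Option" + c]) for c in "ABC"]
print(len(pairs))  # 3
```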
Performance
In the project repository, this AttentionTypes variant was the strongest final experiment.
- Best validation accuracy: 0.7456
- Kaggle public score: 0.7350
These numbers come from the course-project evaluation setting and should not be treated as a broad benchmark outside that task.
Architecture
This model uses a shared encoder and linear scoring head for all answer options. For each question, the three candidate options are encoded separately, scored independently, and ranked.
The attention mechanism is a simplified DeBERTa-style disentangled attention with three terms:
- content-to-content
- content-to-position
- position-to-content
Relative position embeddings are used inside attention, and the pooled sequence representation is produced with mean pooling over non-padding tokens.
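A minimal NumPy sketch of the three attention terms and the masked mean pooling described above. The shapes, the distance-clipping scheme, and the 1/sqrt(3d) scale follow DeBERTa's simplified formulation and are illustrative, not the exact checkpoint code.

```python
import numpy as np

def disentangled_scores(Hq, Hk, rel_emb, max_rel=4):
    """Three-term attention logits: content-to-content, content-to-position,
    position-to-content. Hq/Hk are (seq, dim) query/key content projections;
    rel_emb is a (2*max_rel + 1, dim) table of relative position embeddings."""
    seq, dim = Hq.shape
    # Clipped relative distances, shifted into table indices [0, 2*max_rel]
    idx = np.arange(seq)
    rel = np.clip(idx[None, :] - idx[:, None], -max_rel, max_rel) + max_rel
    P = rel_emb[rel]                      # (seq, seq, dim)
    c2c = Hq @ Hk.T                       # content-to-content
    c2p = np.einsum("qd,qkd->qk", Hq, P)  # content-to-position
    p2c = np.einsum("kd,qkd->qk", Hk, P)  # position-to-content
    return (c2c + c2p + p2c) / np.sqrt(3 * dim)

def mean_pool(H, mask):
    """Mean over non-padding positions; mask is 1 for real tokens, 0 for pad."""
    m = mask[:, None].astype(H.dtype)
    return (H * m).sum(axis=0) / m.sum()

rng = np.random.default_rng(0)
seq, dim, max_rel = 6, 8, 4
scores = disentangled_scores(
    rng.standard_normal((seq, dim)),
    rng.standard_normal((seq, dim)),
    rng.standard_normal((2 * max_rel + 1, dim)),
    max_rel,
)
pooled = mean_pool(rng.standard_normal((seq, dim)), np.array([1, 1, 1, 1, 0, 0]))
print(scores.shape, pooled.shape)  # (6, 6) (8,)
```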
Training Setup
- Vocabulary size: 12000
- Model dim: 256
- Heads: 8
- Layers: 4
- FF multiplier: 4
- Dropout: 0.3
- Pooling: mean
- Max relative positions: 512
- Grouped input max length: 128
- BBPE min frequency: 2
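The hyperparameters above, collected into a plain dict for quick reference. The key names here are illustrative and are not the checkpoint's actual config keys.

```python
# Hyperparameters from the training setup; key names are illustrative
config = dict(
    vocab_size=12000,
    d_model=256,
    n_heads=8,
    n_layers=4,
    ff_mult=4,
    dropout=0.3,
    pooling="mean",
    max_relative_positions=512,
    max_length=128,
    bbpe_min_freq=2,
)

# The model dim must divide evenly across the attention heads
head_dim = config["d_model"] // config["n_heads"]
print(head_dim)  # 32
```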
Intended Use
This model is intended for:
- reproducing the course-project experiment
- inspecting a custom disentangled-attention classifier
- running inference on data shaped like the original false-sentence repair task
This model is not intended for:
- general-purpose text classification
- masked language modeling
- use as a drop-in replacement for standard BERT checkpoints
How To Use
For one question, encode the three (FalseSent, OptionX) pairs independently and choose the highest-scoring option. The exported Hugging Face model returns one logit per encoded pair, so the caller is responsible for ranking the three candidate options.
Example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "owenarink/attentiontypes-commonsense"

# trust_remote_code=True is required because this is a custom architecture
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(repo_id, trust_remote_code=True)

false_sentence = "The sun rises in the west."
options = [
    "The sun rises in the east.",
    "The sun sets in the west.",
    "The sun shines at night.",
]

# Encode the three (false sentence, option) pairs as one batch
encoded = tokenizer(
    [false_sentence] * len(options),
    options,
    padding=True,
    truncation=True,
    return_tensors="pt",
)

# One logit per pair; the highest-scoring option is the predicted repair
scores = model(**encoded).logits.squeeze(-1)
best_idx = int(scores.argmax().item())
print(best_idx)  # 0, 1, or 2, corresponding to options A, B, or C
```
Limitations
- Custom architecture requiring `trust_remote_code=True`
- Partial implementation from an in-progress project
- Not pretrained on a large general corpus
- Evaluated only on the course competition setup
- No broad safety, bias, or robustness evaluation
This upload should be treated as an experimental checkpoint for a specific academic task rather than a polished foundation model.