attentiontypes-commonsense
This model is a custom Hugging Face Transformers export of the AttentionTypes model from a course project on commonsense reasoning. The task is to choose the correct answer (A, B, or C) for a false sentence by comparing three candidate repairs and selecting the most sensible one.
The underlying model is a grouped BBPE transformer with DeBERTa-inspired disentangled attention. It was trained as a multiple-choice scoring model for a Kaggle-style commonsense challenge and exported here as a custom Hugging Face model for inference and reproducibility.
Model Description
- Architecture: encoder-only transformer with disentangled attention
- Tokenization: grouped BBPE with `<cls> false sentence <sep> option <eos>`
- Objective: score each candidate option and select the highest-scoring repair
- Output: one logit per encoded `(false sentence, option)` pair
- Status: experimental research artifact from an in-progress student project
This is not a pretrained general-purpose language model. It is a task-specific classifier checkpoint.
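The grouped input layout above can be sketched as a simple string join. The special-token strings come from this card; the join logic itself is illustrative and may differ from the checkpoint's actual tokenizer code.

```python
def group_input(false_sent: str, option: str) -> str:
    """Assemble one grouped input in the <cls> ... <sep> ... <eos> layout."""
    return f"<cls> {false_sent} <sep> {option} <eos>"

print(group_input("The sun rises in the west.", "The sun rises in the east."))
# <cls> The sun rises in the west. <sep> The sun rises in the east. <eos>
```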
Task Summary
One example question looks like this:
- False sentence: The sun rises in the west.
- A: The sun rises in the east.
- B: The sun sets in the west.
- C: The sun shines at night.
The correct answer is A. The model learns to score which candidate best repairs the false statement.
Training Data
The model was trained for a course competition setup built around false sentences paired with three candidate corrections. Each training example is represented as three grouped inputs, one per answer option, and the final prediction is the argmax over the three option scores.
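The prediction rule described above can be sketched in a few lines: one score per option, argmax picks the repair. The scores below are dummy values, not real model outputs.

```python
# One logit per (false sentence, option) pair; dummy values for illustration
scores = [0.2, 1.3, -0.4]

# The final prediction is the argmax over the three option scores
pred = max(range(len(scores)), key=scores.__getitem__)
answer = "ABC"[pred]
print(answer)  # B for these dummy scores
```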
The data used for this project lives in the original repository as local CSV files rather than as a standalone Hugging Face dataset:
- `data/train_data.csv`
- `data/train_answers.csv`
- `data/test_data.csv`
- `data/sample.csv`
The training table contains a false sentence plus three candidate answer options. The labels indicate which option correctly repairs the false statement. The test split follows the same structure without released labels.
Because this dataset comes from a course competition setup, it should be treated as task-specific experimental data rather than a broadly curated benchmark.
The same CSV files are included in this Hugging Face repository under dataset/ for convenience.
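A minimal standard-library sketch of reading one of these CSVs into (false sentence, option) pairs. The column names (`FalseSent`, `OptionA`..`OptionC`) are assumptions based on the task description, not the repo's documented schema, and the inline CSV stands in for `data/train_data.csv`.

```python
import csv
import io

# Inline sample standing in for data/train_data.csv; column names are assumed
csv_text = (
    "id,FalseSent,OptionA,OptionB,OptionC\n"
    "0,The sun rises in the west.,The sun rises in the east.,"
    "The sun sets in the west.,The sun shines at night.\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
row = rows[0]

# One (false sentence, option) pair per answer option
pairs = [(row["FalseSent"], row["Option" + c]) for c in "ABC"]
print(len(pairs))  # 3
```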
Performance
In the project repository, this AttentionTypes variant was the strongest final experiment.
- Best validation accuracy: 0.7456
- Kaggle public score: 0.7350
These numbers come from the course-project evaluation setting and should not be treated as a broad benchmark outside that task.
Architecture
This model uses a shared encoder and linear scoring head for all answer options. For each question, the three candidate options are encoded separately, scored independently, and ranked.
The attention mechanism is a simplified DeBERTa-style disentangled attention with three terms:
- content-to-content
- content-to-position
- position-to-content
Relative position embeddings are used inside attention, and the pooled sequence representation is produced with mean pooling over non-padding tokens.
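A minimal NumPy sketch of the three attention terms and the masked mean pooling described above. The shapes, the distance-clipping scheme, and the 1/sqrt(3d) scale follow DeBERTa's simplified formulation and are illustrative, not the exact checkpoint code.

```python
import numpy as np

def disentangled_scores(Hq, Hk, rel_emb, max_rel=4):
    """Three-term attention logits: content-to-content, content-to-position,
    position-to-content. Hq/Hk are (seq, dim) query/key content projections;
    rel_emb is a (2*max_rel + 1, dim) table of relative position embeddings."""
    seq, dim = Hq.shape
    # Clipped relative distances, shifted into table indices [0, 2*max_rel]
    idx = np.arange(seq)
    rel = np.clip(idx[None, :] - idx[:, None], -max_rel, max_rel) + max_rel
    P = rel_emb[rel]                      # (seq, seq, dim)
    c2c = Hq @ Hk.T                       # content-to-content
    c2p = np.einsum("qd,qkd->qk", Hq, P)  # content-to-position
    p2c = np.einsum("kd,qkd->qk", Hk, P)  # position-to-content
    return (c2c + c2p + p2c) / np.sqrt(3 * dim)

def mean_pool(H, mask):
    """Mean over non-padding positions; mask is 1 for real tokens, 0 for pad."""
    m = mask[:, None].astype(H.dtype)
    return (H * m).sum(axis=0) / m.sum()

rng = np.random.default_rng(0)
seq, dim, max_rel = 6, 8, 4
scores = disentangled_scores(
    rng.standard_normal((seq, dim)),
    rng.standard_normal((seq, dim)),
    rng.standard_normal((2 * max_rel + 1, dim)),
    max_rel,
)
pooled = mean_pool(rng.standard_normal((seq, dim)), np.array([1, 1, 1, 1, 0, 0]))
print(scores.shape, pooled.shape)  # (6, 6) (8,)
```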
Training Setup
- Vocabulary size: 12000
- Model dim: 256
- Heads: 8
- Layers: 4
- FF multiplier: 4
- Dropout: 0.3
- Pooling: mean
- Max relative positions: 512
- Grouped input max length: 128
- BBPE min frequency: 2
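The hyperparameters above, collected into a plain dict for quick reference. The key names here are illustrative and are not the checkpoint's actual config keys.

```python
# Hyperparameters from the training setup; key names are illustrative
config = dict(
    vocab_size=12000,
    d_model=256,
    n_heads=8,
    n_layers=4,
    ff_mult=4,
    dropout=0.3,
    pooling="mean",
    max_relative_positions=512,
    max_length=128,
    bbpe_min_freq=2,
)

# The model dim must divide evenly across the attention heads
head_dim = config["d_model"] // config["n_heads"]
print(head_dim)  # 32
```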
Intended Use
This model is intended for:
- reproducing the course-project experiment
- inspecting a custom disentangled-attention classifier
- running inference on data shaped like the original false-sentence repair task
This model is not intended for:
- general-purpose text classification
- masked language modeling
- use as a drop-in replacement for standard BERT checkpoints
How To Use
For one question, encode the three (FalseSent, OptionX) pairs independently and choose the highest-scoring option. The exported Hugging Face model returns one logit per encoded pair, so the caller is responsible for ranking the three candidate options.
Example
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "owenarink/attentiontypes-commonsense"

# trust_remote_code=True is required because this is a custom architecture
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(repo_id, trust_remote_code=True)

false_sentence = "The sun rises in the west."
options = [
    "The sun rises in the east.",
    "The sun sets in the west.",
    "The sun shines at night.",
]

# Encode the three (false sentence, option) pairs as one batch
encoded = tokenizer(
    [false_sentence] * len(options),
    options,
    padding=True,
    truncation=True,
    return_tensors="pt",
)

# One logit per pair; the highest-scoring option is the predicted repair
scores = model(**encoded).logits.squeeze(-1)
best_idx = int(scores.argmax().item())
print(best_idx)  # 0, 1, or 2, corresponding to options A, B, or C
```
Limitations
- Custom architecture requiring `trust_remote_code=True`
- Partial implementation from an in-progress project
- Not pretrained on a large general corpus
- Evaluated only on the course competition setup
- No broad safety, bias, or robustness evaluation
This upload should be treated as an experimental checkpoint for a specific academic task rather than a polished foundation model.