DistilGPT-2 Fine-Tuned on arXiv CS/AI Abstracts

A DistilGPT-2 (82M parameters) model fine-tuned on computer science and artificial intelligence paper abstracts from arXiv. The model generates text in the style of academic CS/AI research abstracts.

Model Details

Property               Value
Base Model             distilgpt2
Parameters             81.9M total, 14.2M trainable (17.3%)
Fine-tuning Strategy   Partial freeze: last 2 transformer blocks (4-5), final LayerNorm, and LM head
Training Data          ccdv/arxiv-classification (CS/AI subset)
Training Samples       2,000
Max Sequence Length    128 tokens
License                MIT
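
The parameter figures above can be reproduced with plain arithmetic. The sketch below assumes the standard DistilGPT-2 configuration (6 blocks, hidden size 768, vocabulary 50,257, 1,024 positions) rather than reading it from the checkpoint, and it assumes the LM head shares its weight matrix with the frozen token embedding, as in stock GPT-2, so it contributes no trainable parameters of its own. The helper names are illustrative only.

```python
# Parameter-count sketch for a GPT-2-style model with DistilGPT-2's
# published configuration (assumed here, not read from the checkpoint).
VOCAB, N_POS, D_MODEL, N_LAYER = 50257, 1024, 768, 6
D_FF = 4 * D_MODEL  # GPT-2 uses a 4x feed-forward expansion


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of one dense layer."""
    return n_in * n_out + n_out


def layer_norm(d: int) -> int:
    """Scale and shift vectors of one LayerNorm."""
    return 2 * d


def block_params() -> int:
    """Parameters in one transformer block."""
    attention = linear(D_MODEL, 3 * D_MODEL) + linear(D_MODEL, D_MODEL)
    mlp = linear(D_MODEL, D_FF) + linear(D_FF, D_MODEL)
    return attention + mlp + 2 * layer_norm(D_MODEL)


embeddings = VOCAB * D_MODEL + N_POS * D_MODEL  # token + position tables
final_norm = layer_norm(D_MODEL)
total = embeddings + N_LAYER * block_params() + final_norm

# Assumption: the LM head is tied to the frozen token embedding,
# so only blocks 4-5 and the final LayerNorm count as trainable.
trainable = 2 * block_params() + final_norm

print(f"total:     {total:,}")
print(f"trainable: {trainable:,} ({100 * trainable / total:.1f}%)")
```

Under these assumptions the script prints 81,912,576 total and 14,177,280 trainable parameters (17.3%), which matches the table. If the tied LM head were unfrozen together with the token embedding, the trainable count would be considerably higher.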

Training Details

Dataset

The model was fine-tuned on abstracts from the ccdv/arxiv-classification dataset, filtered for computer science and AI categories:

  • cs.CV (Computer Vision)
  • cs.AI (Artificial Intelligence)
  • cs.SY (Systems and Control)
  • cs.CE (Computational Engineering)
  • cs.PL (Programming Languages)
  • cs.IT (Information Theory)
  • cs.DS (Data Structures and Algorithms)
  • cs.NE (Neural and Evolutionary Computing)

Hyperparameters

Parameter          Value
Learning rate      3e-4
Batch size         8
Max steps          100
Warmup steps       10
Scheduler          Cosine
Weight decay       0.01
Frozen layers      Embedding + blocks 0-3
Trainable layers   Blocks 4-5 + LayerNorm + LM Head
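
To make the schedule concrete, here is a minimal sketch of the learning rate over the 100 steps. It assumes the common shape of linear warmup followed by a half-cosine decay to zero (the form used by Transformers' `get_cosine_schedule_with_warmup` with default settings). The card does not state which implementation was used, and `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 10
MAX_STEPS = 100


def lr_at(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, 55, 100):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With these values the rate climbs to 3e-4 at step 10, falls to half of that at step 55, and reaches zero at step 100.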

Metrics

Metric          Value
Training loss   3.82
Eval loss       3.48
Perplexity      32.59
Training time   ~7 minutes (CPU)
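
Perplexity is the exponential of the mean cross-entropy loss, so the two eval figures can be checked against each other. The short sketch below uses only the rounded values from the table. The small gap it shows is what one would expect if perplexity was computed from the unrounded loss, which is an assumption here.

```python
import math

eval_loss = 3.48      # as reported, rounded to two decimals
reported_ppl = 32.59  # as reported

# Perplexity implied by the rounded loss
print(f"exp({eval_loss}) = {math.exp(eval_loss):.2f}")

# Loss implied by the reported perplexity
print(f"ln({reported_ppl}) = {math.log(reported_ppl):.4f}")
```

The rounded loss gives a perplexity of about 32.46, and the reported perplexity corresponds to a loss of about 3.484, which rounds to the 3.48 shown in the table.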

Usage

from transformers import pipeline

generator = pipeline("text-generation", model="Arcoson/distilgpt2-arxiv-csai")

output = generator(
    "arXiv Abstract: We propose a novel approach to",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
print(output[0]["generated_text"])

Or load directly:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Arcoson/distilgpt2-arxiv-csai")
model = AutoModelForCausalLM.from_pretrained("Arcoson/distilgpt2-arxiv-csai")

# Tokenize the prompt and return PyTorch tensors
inputs = tokenizer("arXiv Abstract: In this paper, we study the problem of", return_tensors="pt")
# Reuse EOS as the padding id, since GPT-2 tokenizers define no pad token by default
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.8, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Sample Generations

Prompt: "arXiv Abstract: We propose a novel approach to"

Fine-tuned output:

We propose a novel approach to the problem of time-time in the computation model, i.e., and prediction models for (i) continuous intervals which are nonlinear because they cannot be solved with finite random variables or large number functions...

Base DistilGPT-2 output:

We propose a novel approach to the question of whether we can solve this problem by using real-time neural networks in an MRI technique...

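In this sample, the fine-tuned output stays within CS/AI phrasing (computation model, prediction models, random variables), while the base model drifts toward a medical-imaging scenario. This is a single prompt, so it illustrates the shift in style rather than measuring it.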

Limitations

  • Short context: Trained on 128-token sequences, so it works best for short abstracts or completions.
  • Limited training: Only 100 steps on 2,000 samples, which makes this a proof-of-concept fine-tune. At batch size 8, 100 steps cover at most 800 sequences unless gradient accumulation was used, so the model may not have seen every sample once. More training data and steps would likely improve quality.
  • Narrow domain: Outputs are biased toward CS/AI academic language, so the model is not suitable for general-purpose text generation.
  • No factual grounding: The model generates plausible-sounding research text that is not necessarily correct.

Training Infrastructure

  • Hardware: CPU (2 vCPUs, 8GB RAM)
  • Framework: Hugging Face Transformers 5.5.0, PyTorch 2.11
  • Strategy: Partial layer freezing to enable CPU-feasible training
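
The partial-freeze strategy can be sketched as a filter over parameter names. This is an illustration, not the training script used for this model: `apply_partial_freeze` and `TRAINABLE_PREFIXES` are hypothetical names, and the prefixes assume the parameter naming of the Transformers GPT-2 classes (`transformer.h.<i>.` for blocks, `transformer.ln_f.` for the final LayerNorm). Stand-in objects replace real tensors so the example runs without downloading a model.

```python
from types import SimpleNamespace

# Name prefixes left trainable: blocks 4-5 and the final LayerNorm.
# Assumed to follow the parameter naming of Transformers' GPT-2 classes.
TRAINABLE_PREFIXES = ("transformer.h.4.", "transformer.h.5.", "transformer.ln_f.")


def apply_partial_freeze(named_parameters, trainable_prefixes=TRAINABLE_PREFIXES):
    """Set requires_grad on each (name, param) pair; return the trainable names."""
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = name.startswith(trainable_prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters so the sketch runs without PyTorch or a checkpoint.
demo = [
    (name, SimpleNamespace(requires_grad=True))
    for name in (
        "transformer.wte.weight",
        "transformer.wpe.weight",
        "transformer.h.0.attn.c_attn.weight",
        "transformer.h.3.mlp.c_fc.weight",
        "transformer.h.4.attn.c_attn.weight",
        "transformer.h.5.mlp.c_proj.bias",
        "transformer.ln_f.weight",
    )
]
kept = apply_partial_freeze(demo)
print(kept)
```

With a loaded PyTorch model, the same function could be called as `apply_partial_freeze(model.named_parameters())`. In stock GPT-2 the LM head is tied to the token embedding, so keeping the embedding frozen also keeps the head's weights fixed. Whether the original run untied or unfroze it is not stated in this card.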

Citation

If you use this model, please cite the base model and dataset:

@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}