DistilGPT-2 Fine-Tuned on arXiv CS/AI Abstracts

A DistilGPT-2 model (82M parameters) fine-tuned on computer science and artificial intelligence paper abstracts from arXiv. It generates text in the style of academic CS/AI research abstracts.

Model Details

Property	Value
Base Model	distilgpt2
Parameters	81.9M total, 14.2M trainable (17.3%)
Fine-tuning Strategy	Partial freeze — last 2 transformer blocks + LM head
Training Data	ccdv/arxiv-classification (CS/AI subset)
Training Samples	2,000
Max Sequence Length	128 tokens
License	MIT
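The parameter counts in the table can be reproduced from the GPT-2 architecture alone. The sketch below assumes DistilGPT-2's standard configuration (6 blocks, hidden size 768, vocabulary of 50,257 tokens, 1,024 positions) and Hugging Face's `GPT2LMHeadModel` parameter naming. The prefix tuple `TRAINABLE_PREFIXES` and the helper names are hypothetical. The code enumerates every distinct weight tensor, marks the ones that match the trainable prefixes, and counts both groups.

```python
from math import prod

# Assumed DistilGPT-2 configuration (standard GPT-2 sizes, 6 blocks).
VOCAB, POSITIONS, D, LAYERS = 50257, 1024, 768, 6
FF = 4 * D  # GPT-2 MLP inner width

def gpt2_param_shapes():
    """Yield (name, shape) for each distinct parameter tensor (hypothetical helper)."""
    yield "transformer.wte.weight", (VOCAB, D)      # token embedding (shared with lm_head)
    yield "transformer.wpe.weight", (POSITIONS, D)  # position embedding
    for i in range(LAYERS):
        p = f"transformer.h.{i}."
        yield p + "ln_1.weight", (D,)
        yield p + "ln_1.bias", (D,)
        yield p + "attn.c_attn.weight", (D, 3 * D)  # fused Q, K, V projection
        yield p + "attn.c_attn.bias", (3 * D,)
        yield p + "attn.c_proj.weight", (D, D)
        yield p + "attn.c_proj.bias", (D,)
        yield p + "ln_2.weight", (D,)
        yield p + "ln_2.bias", (D,)
        yield p + "mlp.c_fc.weight", (D, FF)
        yield p + "mlp.c_fc.bias", (FF,)
        yield p + "mlp.c_proj.weight", (FF, D)
        yield p + "mlp.c_proj.bias", (D,)
    yield "transformer.ln_f.weight", (D,)  # final LayerNorm
    yield "transformer.ln_f.bias", (D,)
    # lm_head.weight is not listed: it is tied to transformer.wte.weight.

# Hypothetical freeze rule: only the last two blocks and the final LayerNorm train.
TRAINABLE_PREFIXES = ("transformer.h.4.", "transformer.h.5.", "transformer.ln_f.")

def count_params(prefixes=TRAINABLE_PREFIXES):
    """Return (total, trainable) parameter counts under the given freeze rule."""
    total = trainable = 0
    for name, shape in gpt2_param_shapes():
        n = prod(shape)
        total += n
        if name.startswith(prefixes):
            trainable += n
    return total, trainable

total, trainable = count_params()
print(f"{total:,} total, {trainable:,} trainable ({trainable / total:.1%})")
```

Under these assumptions it prints `81,912,576 total, 14,177,280 trainable (17.3%)`, matching the table. The reported 14.2M is consistent with blocks 4-5 plus the final LayerNorm only: GPT-2 ties the LM head weight to the token embedding, so unfreezing the head would also unfreeze the embedding and add 38,597,376 trainable parameters. On a loaded model, the same rule could be applied by looping over `model.named_parameters()` and setting `param.requires_grad = name.startswith(TRAINABLE_PREFIXES)`.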

Training Details

Dataset

The model was fine-tuned on abstracts from the ccdv/arxiv-classification dataset, filtered for computer science and AI categories:

cs.CV (Computer Vision)
cs.AI (Artificial Intelligence)
cs.SY (Systems and Control)
cs.CE (Computational Engineering)
cs.PL (Programming Languages)
cs.IT (Information Theory)
cs.DS (Data Structures and Algorithms)
cs.NE (Neural and Evolutionary Computing)
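The following is a minimal sketch of the filtering step. It assumes each record is a dict with hypothetical `category` and `abstract` keys; the real dataset's column names and label encoding may differ, so check its dataset card. It also assumes that training texts carried the `arXiv Abstract: ` prefix seen in the Usage prompts, which this card does not state explicitly. The code keeps the eight categories listed above, drops empty abstracts, and draws a reproducible random sample of up to 2,000 texts.

```python
import random

# The eight CS/AI categories listed in this card.
CS_AI_CATEGORIES = {"cs.CV", "cs.AI", "cs.SY", "cs.CE", "cs.PL", "cs.IT", "cs.DS", "cs.NE"}
PROMPT_PREFIX = "arXiv Abstract: "  # assumption: inferred from the Usage prompts

def build_training_texts(records, n_samples=2000, seed=0):
    """Filter to CS/AI categories, prefix each abstract, and sample up to n_samples.

    `records` is an iterable of dicts with hypothetical keys "category"
    (an arXiv category string) and "abstract" (plain text).
    """
    texts = [
        PROMPT_PREFIX + r["abstract"].strip()
        for r in records
        if r["category"] in CS_AI_CATEGORIES and r["abstract"].strip()
    ]
    if len(texts) > n_samples:
        texts = random.Random(seed).sample(texts, n_samples)  # reproducible subset
    return texts

demo = [
    {"category": "cs.AI", "abstract": "We study planning ..."},
    {"category": "math.ST", "abstract": "We prove ..."},  # dropped: not a CS/AI category
]
print(build_training_texts(demo))  # ['arXiv Abstract: We study planning ...']
```

Truncation to the 128-token training length would then happen at tokenization time, for example with `tokenizer(texts, truncation=True, max_length=128)`.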

Hyperparameters

Parameter	Value
Learning rate	3e-4
Batch size	8
Max steps	100
Warmup steps	10
Scheduler	Cosine
Weight decay	0.01
Frozen layers	Embedding + blocks 0-3
Trainable layers	Blocks 4-5 + final LayerNorm + LM Head
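The table's schedule combines 10 linear warmup steps with cosine decay over the remaining 90 steps. The function below sketches that shape. It assumes "Cosine" means a single half-cosine decay to zero, as in Transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. The name `lr_at` is hypothetical.

```python
import math

def lr_at(step, base_lr=3e-4, warmup_steps=10, total_steps=100):
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for s in (0, 5, 10, 55, 100):
    print(s, f"{lr_at(s):.2e}")
```

With these settings, the rate ramps from 0 to its peak of 3e-4 at step 10, falls to half of that (1.5e-4) at step 55, and reaches 0 at step 100.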

Metrics

Metric	Value
Training loss	3.82
Eval loss	3.48
Perplexity	32.59
Training time	~7 minutes (CPU)
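Under the standard definition, perplexity is the exponential of the mean per-token cross-entropy (in nats) on the evaluation set. The table's eval loss and perplexity agree under that definition once you account for rounding. The helper names below are illustrative.

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity from mean per-token cross-entropy loss (natural log)."""
    return math.exp(mean_nll)

def loss_from_perplexity(ppl: float) -> float:
    """Inverse mapping: the mean cross-entropy implied by a perplexity value."""
    return math.log(ppl)

print(f"{perplexity(3.48):.2f}")             # 32.46, from the rounded eval loss
print(f"{loss_from_perplexity(32.59):.4f}")  # 3.4840, loss implied by the reported perplexity
```

The reported perplexity of 32.59 therefore corresponds to an eval loss of about 3.484, which the table rounds to 3.48.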

Usage

from transformers import pipeline

generator = pipeline("text-generation", model="Arcoson/distilgpt2-arxiv-csai")

output = generator(
    "arXiv Abstract: We propose a novel approach to",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
print(output[0]["generated_text"])

Or load directly:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Arcoson/distilgpt2-arxiv-csai")
model = AutoModelForCausalLM.from_pretrained("Arcoson/distilgpt2-arxiv-csai")

inputs = tokenizer("arXiv Abstract: In this paper, we study the problem of", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS to avoid a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Sample Generations

Prompt: "arXiv Abstract: We propose a novel approach to"

Fine-tuned output:

We propose a novel approach to the problem of time-time in the computation model, i.e., and prediction models for (i) continuous intervals which are nonlinear because they cannot be solved with finite random variables or large number functions...

Base DistilGPT-2 output:

We propose a novel approach to the question of whether we can solve this problem by using real-time neural networks in an MRI technique...

In this sample, the fine-tuned model uses more CS/AI-style terminology than the base model (e.g., "computation model", "nonlinear", "random variables"), although its text is not always coherent.

Limitations

Short context: Trained on 128-token sequences, so it works best for generating short abstracts or completions.
Limited training: Only 100 steps at batch size 8, i.e. 800 sequences or roughly 40% of one pass over the 2,000 samples (assuming no gradient accumulation). This is a proof-of-concept fine-tune; more training data and steps would likely improve quality.
Narrow domain: Outputs are biased toward CS/AI academic language, so the model is not suitable for general-purpose text generation.
No factual grounding: The model generates plausible-sounding research text that is not necessarily factually correct.

Training Infrastructure

Hardware: CPU (2 vCPUs, 8GB RAM)
Framework: Hugging Face Transformers 5.5.0, PyTorch 2.11
Strategy: Partial layer freezing to make training feasible on CPU

Citation

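If you use this model, please cite the base model and the training dataset (ccdv/arxiv-classification). The base model, DistilGPT-2, was trained with a distillation procedure similar to DistilBERT's, and its reference is: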

@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}


Model Files

Format: Safetensors
Model size: 81.9M params
Tensor type: F32
