DistilGPT-2 Fine-Tuned on arXiv CS/AI Abstracts
A DistilGPT-2 (82M parameters) model fine-tuned on computer science and artificial intelligence paper abstracts from arXiv. The model generates text in the style of academic CS/AI research abstracts.
Model Details
| Property | Value |
|---|---|
| Base Model | distilgpt2 |
| Parameters | 81.9M total, 14.2M trainable (17.3%) |
| Fine-tuning Strategy | Partial freeze — last 2 transformer blocks + LM head |
| Training Data | ccdv/arxiv-classification (CS/AI subset) |
| Training Samples | 2,000 |
| Max Sequence Length | 128 tokens |
| License | MIT |
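The parameter figures in the table can be reproduced by hand. The sketch below assumes the standard DistilGPT-2 dimensions (6 blocks, hidden size 768, vocabulary 50257, 1024 positions) and assumes the LM head shares its weight matrix with the token embedding, as GPT-2 models do by default. Under those assumptions the 14.2M trainable figure corresponds to the last two blocks plus the final LayerNorm, with no separately counted LM head matrix.

```python
# Parameter arithmetic for a GPT-2-style model.
# Assumed dimensions (standard DistilGPT-2): 6 blocks, hidden size 768,
# vocabulary 50257, 1024 positions.
N_LAYER, D, VOCAB, N_POS = 6, 768, 50257, 1024


def block_params(d: int) -> int:
    """Weights and biases in one transformer block."""
    ln = 2 * d                      # one LayerNorm: weight + bias
    attn_qkv = d * 3 * d + 3 * d    # fused query/key/value projection
    attn_out = d * d + d            # attention output projection
    mlp_in = d * 4 * d + 4 * d      # MLP expansion
    mlp_out = 4 * d * d + d         # MLP contraction
    return 2 * ln + attn_qkv + attn_out + mlp_in + mlp_out


embeddings = VOCAB * D + N_POS * D  # token + position embeddings
final_ln = 2 * D                    # final LayerNorm

# The LM head is assumed tied to the token embedding, so it is not counted twice.
total = embeddings + N_LAYER * block_params(D) + final_ln
trainable = 2 * block_params(D) + final_ln  # blocks 4-5 + final LayerNorm

print(f"total:     {total / 1e6:.1f}M")
print(f"trainable: {trainable / 1e6:.1f}M ({100 * trainable / total:.1f}%)")
```

If the LM head were untied and trained as its own matrix, the trainable count would be far larger than 14.2M, so the table's figure is consistent with the tied-weight reading.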
Training Details
Dataset
The model was fine-tuned on abstracts from the ccdv/arxiv-classification dataset, filtered for computer science and AI categories:
- cs.CV (Computer Vision)
- cs.AI (Artificial Intelligence)
- cs.SY (Systems and Control)
- cs.CE (Computational Engineering)
- cs.PL (Programming Languages)
- cs.IT (Information Theory)
- cs.DS (Data Structures and Algorithms)
- cs.NE (Neural and Evolutionary Computing)
Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 3e-4 |
| Batch size | 8 |
| Max steps | 100 |
| Warmup steps | 10 |
| Scheduler | Cosine |
| Weight decay | 0.01 |
| Frozen layers | Embedding + blocks 0-3 |
| Trainable layers | Blocks 4-5 + final LayerNorm + LM head |
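To make the learning-rate settings concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule using the values from the table. The exact shape is an assumption: the table only says "Cosine" with 10 warmup steps, and the function name `lr_at` is hypothetical. The formula follows the common half-cosine decay to zero.

```python
import math


def lr_at(step: int, base_lr: float = 3e-4,
          warmup_steps: int = 10, max_steps: int = 100) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 5, 10, 55, 100):
    print(f"step {s:3d}: lr = {lr_at(s):.2e}")
```

Under this assumed shape the rate peaks at 3e-4 at step 10, is half of that at step 55, and reaches zero at step 100.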
Metrics
| Metric | Value |
|---|---|
| Training loss | 3.82 |
| Eval loss | 3.48 |
| Perplexity | 32.59 |
| Training time | ~7 minutes (CPU) |
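Perplexity and eval loss are two views of the same number: perplexity is the exponential of the mean per-token cross-entropy. The check below assumes the eval loss is reported in nats, which is the usual convention for causal language model losses. It shows that the two table values agree once rounding is taken into account.

```python
import math

eval_loss = 3.48      # rounded value from the metrics table
reported_ppl = 32.59  # value from the metrics table

ppl_from_loss = math.exp(eval_loss)      # perplexity implied by the rounded loss
loss_from_ppl = math.log(reported_ppl)   # loss implied by the reported perplexity

print(f"exp({eval_loss}) = {ppl_from_loss:.2f}")
print(f"log({reported_ppl}) = {loss_from_ppl:.4f}")
```

The loss implied by 32.59 rounds to 3.48, so the small gap between exp(3.48) and the reported perplexity comes from rounding the loss to two decimals.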
Usage
from transformers import pipeline
generator = pipeline("text-generation", model="Arcoson/distilgpt2-arxiv-csai")
output = generator(
    "arXiv Abstract: We propose a novel approach to",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
)
print(output[0]["generated_text"])
Or load directly:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Arcoson/distilgpt2-arxiv-csai")
model = AutoModelForCausalLM.from_pretrained("Arcoson/distilgpt2-arxiv-csai")
inputs = tokenizer("arXiv Abstract: In this paper, we study the problem of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Sample Generations
Prompt: "arXiv Abstract: We propose a novel approach to"
Fine-tuned output:
We propose a novel approach to the problem of time-time in the computation model, i.e., and prediction models for (i) continuous intervals which are nonlinear because they cannot be solved with finite random variables or large number functions...
Base DistilGPT-2 output:
We propose a novel approach to the question of whether we can solve this problem by using real-time neural networks in an MRI technique...
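In this sample, the fine-tuned model leans toward CS/AI vocabulary (computation models, prediction models, random variables), while the base model drifts toward an unrelated medical-imaging setting. A single pair of generations is illustrative only, not a systematic comparison.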
Limitations
- Short context: Trained on 128-token sequences, so it works best for generating short abstracts or completions.
- Limited training: Only 100 steps on 2,000 samples, so this is a proof-of-concept fine-tune. More training data and steps would likely improve quality.
- Narrow domain: Outputs are biased toward CS/AI academic language. Not suitable for general-purpose text generation.
- No factual guarantees: The model generates plausible-sounding research text that is not necessarily correct.
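The "limited training" point can be quantified with simple arithmetic. The sketch below assumes the batch size of 8 is the effective batch size (no gradient accumulation, single device); neither is stated in this card. Under that assumption, the run sees fewer examples than the dataset contains.

```python
# Values from the tables in this card.
max_steps = 100
batch_size = 8          # assumed to be the effective batch size
num_samples = 2_000
max_seq_len = 128

examples_seen = max_steps * batch_size         # examples consumed during training
epochs = examples_seen / num_samples           # fraction of one pass over the data
max_tokens_seen = examples_seen * max_seq_len  # upper bound; short examples use fewer

print(f"examples seen: {examples_seen}")
print(f"epochs:        {epochs:.2f}")
print(f"tokens seen:   at most {max_tokens_seen:,}")
```

Under these assumptions the model trains on 800 examples, or 0.4 of one epoch, and at most 102,400 tokens.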
Training Infrastructure
- Hardware: CPU (2 vCPUs, 8GB RAM)
- Framework: HuggingFace Transformers 5.5.0, PyTorch 2.11
- Strategy: Partial layer freezing to enable CPU-feasible training
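The partial-freeze strategy can be sketched as follows. This is not the author's training script: the helper names are hypothetical, and the parameter-name prefixes assume the GPT-2 layout in Transformers (`transformer.h.<i>.` for blocks, `transformer.ln_f.` for the final LayerNorm). In that layout the LM head weight is tied to the token embedding by default, so it stays frozen together with the embedding in this sketch.

```python
# Parameter-name prefixes that stay trainable (assumed GPT-2 naming).
TRAINABLE_PREFIXES = ("transformer.h.4.", "transformer.h.5.", "transformer.ln_f.")


def is_trainable(param_name: str) -> bool:
    """True for parameters in blocks 4-5 or the final LayerNorm."""
    return param_name.startswith(TRAINABLE_PREFIXES)


def apply_partial_freeze(model):
    """Set requires_grad on every parameter; return (trainable, total) counts."""
    trainable = total = 0
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name)
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


def load_and_freeze(model_id: str = "distilgpt2"):
    """Load the base model and freeze everything outside the trainable prefixes."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, apply_partial_freeze(model)
```

Calling `load_and_freeze()` returns the model along with the trainable and total parameter counts, which can be compared against the figures in the Model Details table.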
Citation
If you use this model, please cite the base model and the training dataset. The reference for the base model is:
@article{sanh2019distilbert,
  title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
  journal={arXiv preprint arXiv:1910.01108},
  year={2019}
}