Adaptive-RETRO-GPT-1B

Adaptive-RETRO-GPT-1B is a RETRO-inspired, retrieval-pretrained, decoder-only language model. Unlike a standard RAG system, which only injects retrieved text at inference time, this model is trained with retrieved chunks available during next-token language modeling.
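The card does not spell out the retrieval architecture beyond the bullets below, but the general pattern it names (cross-attention into encoded retrieved chunks, scaled by a learned adaptive gate) can be sketched as follows. All module and argument names, the per-token sigmoid gate, and the near-closed initialization are illustrative assumptions, not this repo's actual code.

```python
import torch
import torch.nn as nn

class GatedRetrievalCrossAttention(nn.Module):
    """Illustrative retrieval block: decoder hidden states cross-attend into
    encoded retrieved chunks, and a learned per-token gate decides how much
    of the retrieved signal enters the residual stream."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate biased strongly negative so training starts close to the dense
        # (no-retrieval) model and only opens where retrieval lowers the loss.
        self.gate_proj = nn.Linear(d_model, 1)
        nn.init.constant_(self.gate_proj.bias, -10.0)

    def forward(self, hidden: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        # hidden:    (batch, seq_len, d_model)            decoder states
        # retrieved: (batch, k * retrieval_len, d_model)  encoded retrieved chunks
        attn_out, _ = self.cross_attn(self.norm(hidden), retrieved, retrieved)
        gate = torch.sigmoid(self.gate_proj(hidden))      # (batch, seq_len, 1)
        # The mean of `gate` is the kind of quantity reported as gate_mean below.
        return hidden + gate * attn_out
```

A near-closed gate at initialization would be consistent with the very small gate_mean values in the metrics below, but the actual gate parameterization used in this run is not documented in the card.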

Training Setup

  • Objective: next-token language modeling
  • Backbone: decoder-only GPT
  • Retrieval: external chunk datastore, top-k 2, retrieval sequence length 512
  • Retrieval mechanism: cross-attention layers plus learned adaptive retrieval gate
  • Retrieval regularization: retrieval budget loss 0.001
  • Retrieval robustness: no-retrieval probability 0.1, random-retrieval probability 0.1 (see the training-step sketch after this list)
  • Retrieval layers: 5, 11, 17
  • Pretraining dataset: HuggingFaceFW/fineweb-edu / sample-10BT
  • Datastore dataset: wikimedia/wikipedia / 20231101.en
  • Sequence length: 2048
  • Parameters: 1,172,146,179
  • Weight format: BF16 (Safetensors)
  • Checkpoint step: 20000
  • Related corpus repo: kyLELEng/adaptive-retro-gpt-1b-corpus
  • Related datastore repo: kyLELEng/adaptive-retro-gpt-1b-datastore
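As referenced in the robustness bullet above, the sketch below shows one way the no-retrieval probability, random-retrieval probability, and retrieval budget loss could enter a training step. `model`, `datastore`, and their methods are hypothetical stand-ins, not this repo's API; only the three constants come from the list above.

```python
import random

NO_RETRIEVAL_P = 0.1      # no-retrieval probability
RANDOM_RETRIEVAL_P = 0.1  # random-retrieval probability
BUDGET_WEIGHT = 0.001     # retrieval budget loss (assumed to be a loss coefficient)

def training_step(model, batch, datastore, optimizer):
    # Mix retrieval conditions so the model stays usable with retrieval off
    # and does not learn to trust irrelevant chunks.
    r = random.random()
    if r < NO_RETRIEVAL_P:
        retrieved = None                                             # retrieval off
    elif r < NO_RETRIEVAL_P + RANDOM_RETRIEVAL_P:
        retrieved = datastore.sample_random_chunks(k=2, length=512)  # noisy retrieval
    else:
        retrieved = datastore.retrieve(batch["input_ids"], k=2, length=512)

    out = model(batch["input_ids"], retrieved_chunks=retrieved)

    # Next-token LM loss plus a small penalty on mean gate openness, nudging
    # the model to rely on retrieval only where it actually helps.
    loss = out.lm_loss + BUDGET_WEIGHT * out.gate_mean
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```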

Latest Metrics

```json
{
  "step": 20000,
  "retrieval_on": {
    "loss": 1.7580267190933228,
    "lm_loss": 1.7580267190933228,
    "ppl": 5.800979131574639,
    "gate_mean": 1.749867806211114e-06
  },
  "retrieval_off": {
    "loss": 1.7650717496871948,
    "lm_loss": 1.7650717496871948,
    "ppl": 5.841991504112031,
    "gate_mean": 0.0
  },
  "random_retrieval": {
    "loss": 1.7536429166793823,
    "lm_loss": 1.7536429166793823,
    "ppl": 5.775604444698179,
    "gate_mean": 1.7668644431978464e-06
  },
  "delta_lm_loss_off_minus_on": 0.00704503059387207,
  "delta_lm_loss_random_minus_on": -0.00438380241394043
}
```

The evaluation compares retrieval-on, retrieval-off, and random-retrieval modes. This is the main ablation for checking whether the trained model uses retrieved context productively and whether it is robust to noisy retrieval.
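The deltas in the metrics block follow directly from the per-mode lm_loss values, and the reported perplexities are consistent with ppl = exp(lm_loss); a quick check:

```python
import math

# Reported lm_loss values at step 20000.
on, off, rnd = 1.7580267190933228, 1.7650717496871948, 1.7536429166793823

print(math.exp(on))  # 5.8009... matches the retrieval_on ppl
print(off - on)      # 0.00704... = delta_lm_loss_off_minus_on
print(rnd - on)      # -0.00438... = delta_lm_loss_random_minus_on
```

At this checkpoint the three modes differ by less than 0.01 in LM loss and gate_mean is on the order of 1e-6, which suggests the retrieval gate is still essentially closed at step 20000.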

Research Use

This is an experimental RETRO-style pretraining run for comparing retrieval-pretrained GPT models against dense GPT baselines at similar training budgets. It is not instruction tuned and should not be used as a factual assistant without further evaluation.
