Adaptive-RETRO-GPT-1B
Adaptive-RETRO-GPT-1B is a RETRO-inspired retrieval-pretrained decoder-only language model. Unlike a standard RAG system that only adds retrieved text at inference time, this model is trained with retrieved chunks available during next-token language modeling.
Training Setup
- Objective: next-token language modeling
- Backbone: decoder-only GPT
- Retrieval: external chunk datastore, top-k 2, retrieval sequence length 512
- Retrieval mechanism: cross-attention layers plus learned adaptive retrieval gate
- Retrieval regularization: retrieval budget loss 0.001
- Retrieval robustness: no-retrieval probability 0.1, random-retrieval probability 0.1
- Retrieval layers: 5,11,17
- Pretraining dataset: HuggingFaceFW/fineweb-edu/sample-10BT
- Datastore dataset: wikimedia/wikipedia/20231101.en
- Sequence length: 2048
- Parameters: 1,172,146,179
- Checkpoint step: 20000
- Related corpus repo: kyLELEng/adaptive-retro-gpt-1b-corpus
- Related datastore repo: kyLELEng/adaptive-retro-gpt-1b-datastore
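The retrieval gate, budget loss, and robustness probabilities listed above can be illustrated with a minimal pure-Python sketch. This is not the model's training code. The sigmoid gate, the residual mixing rule, the form of the budget penalty (weight times mean gate value), and all function names (`gated_residual`, `total_loss`, `choose_retrieval_mode`) are assumptions made for illustration; only the numeric settings (0.001, 0.1, 0.1) come from the list above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_residual(hidden, retrieved, gate_logit):
    """Add retrieved features to the hidden state, scaled by a gate in (0, 1).

    Assumption: the gate is a sigmoid of a learned logit. The model card only
    says the gate is "learned" and "adaptive".
    """
    gate = sigmoid(gate_logit)
    mixed = [h + gate * r for h, r in zip(hidden, retrieved)]
    return mixed, gate


def total_loss(lm_loss, gates, budget_weight=0.001):
    """LM loss plus a retrieval budget penalty on the mean gate value.

    Assumption: the penalty is budget_weight * mean(gate), which pushes the
    gate toward zero unless retrieval lowers the LM loss.
    """
    gate_mean = sum(gates) / len(gates)
    return lm_loss + budget_weight * gate_mean, gate_mean


def choose_retrieval_mode(rng, p_none=0.1, p_random=0.1):
    """Pick the retrieval mode for one training example.

    With probability p_none retrieval is dropped, with probability p_random
    the retrieved chunks are replaced by random ones, otherwise real
    retrieval is used.
    """
    u = rng.random()
    if u < p_none:
        return "off"
    if u < p_none + p_random:
        return "random"
    return "on"
```

Under these assumptions, a call such as `choose_retrieval_mode(random.Random(0))` returns one of `"on"`, `"off"`, or `"random"`, and about 80% of training examples would see real retrieved chunks.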
Latest Metrics
{
  "step": 20000,
  "retrieval_on": {
    "loss": 1.7580267190933228,
    "lm_loss": 1.7580267190933228,
    "ppl": 5.800979131574639,
    "gate_mean": 1.749867806211114e-06
  },
  "retrieval_off": {
    "loss": 1.7650717496871948,
    "lm_loss": 1.7650717496871948,
    "ppl": 5.841991504112031,
    "gate_mean": 0.0
  },
  "random_retrieval": {
    "loss": 1.7536429166793823,
    "lm_loss": 1.7536429166793823,
    "ppl": 5.775604444698179,
    "gate_mean": 1.7668644431978464e-06
  },
  "delta_lm_loss_off_minus_on": 0.00704503059387207,
  "delta_lm_loss_random_minus_on": -0.00438380241394043
}
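The evaluation compares three modes: retrieval on, retrieval off, and random retrieval. This is the main ablation for whether the trained model uses retrieved context productively and whether it is robust to noisy retrieval. At step 20000 the gaps are small: turning retrieval off raises LM loss by about 0.007, random retrieval scores about 0.004 lower than real retrieval, and the mean gate value is about 1.7e-6, which suggests the retrieval gate is nearly closed at this checkpoint.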
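As a sanity check on the reported numbers, the perplexities and deltas can be recomputed from the three LM losses. The sketch below assumes perplexity is `exp(lm_loss)` and that each delta is a plain difference of LM losses, which is what the key names suggest. The variable names are illustrative.

```python
import math

# LM losses copied from the metrics at step 20000.
lm_loss = {
    "retrieval_on": 1.7580267190933228,
    "retrieval_off": 1.7650717496871948,
    "random_retrieval": 1.7536429166793823,
}

# Perplexity as exp(loss), assuming the loss is mean cross-entropy in nats.
ppl = {mode: math.exp(loss) for mode, loss in lm_loss.items()}

# Positive: removing retrieval hurts. Negative: random chunks score slightly
# better than real retrieved chunks.
delta_off_minus_on = lm_loss["retrieval_off"] - lm_loss["retrieval_on"]
delta_random_minus_on = lm_loss["random_retrieval"] - lm_loss["retrieval_on"]

for mode in lm_loss:
    print(f"{mode:>17}: loss={lm_loss[mode]:.4f} ppl={ppl[mode]:.4f}")
print(f"off - on:    {delta_off_minus_on:+.5f}")
print(f"random - on: {delta_random_minus_on:+.5f}")
```

A productive retrieval mechanism would normally show a clearly positive off-minus-on delta and a positive random-minus-on delta, so the sign of the second value is the one to watch across checkpoints.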
Research Use
This is an experimental RETRO-style pretraining run for comparing retrieval-pretrained GPT models against dense GPT baselines at similar training budgets. It is not instruction tuned and should not be used as a factual assistant without further evaluation.