kyLELEng/adaptive-retro-gpt-1b

	---
	library_name: pytorch
	tags:
	- text-generation
	- causal-lm
	- retrieval-augmented
	- retro
	- pretraining
	- adaptive-retrieval
	datasets:
	- HuggingFaceFW/fineweb-edu
	- wikimedia/wikipedia
	---

	# Adaptive-RETRO-GPT-1B

	Adaptive-RETRO-GPT-1B is a decoder-only language model pretrained with retrieval, in the style of RETRO. Unlike a standard RAG system, which adds retrieved text only at inference time, this model has retrieved chunks available during next-token training.

	## Training Setup

	- Objective: next-token language modeling
	- Backbone: decoder-only GPT
	- Retrieval: external chunk datastore, top-k `2`, retrieval sequence length `512`
	- Retrieval mechanism: cross-attention layers plus a learned adaptive retrieval gate
	- Retrieval regularization: retrieval budget loss `0.001`
	- Retrieval robustness: no-retrieval probability `0.1`, random-retrieval probability `0.1`
	- Retrieval layers: `5,11,17`
	- Pretraining dataset: `HuggingFaceFW/fineweb-edu` / `sample-10BT`
	- Datastore dataset: `wikimedia/wikipedia` / `20231101.en`
	- Sequence length: `2048`
	- Parameters: `1,172,146,179`
	- Checkpoint step: `20000`
	- Related corpus repo: [`kyLELEng/adaptive-retro-gpt-1b-corpus`](https://huggingface.co/datasets/kyLELEng/adaptive-retro-gpt-1b-corpus)
	- Related datastore repo: [`kyLELEng/adaptive-retro-gpt-1b-datastore`](https://huggingface.co/datasets/kyLELEng/adaptive-retro-gpt-1b-datastore)
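
The retrieval settings above can be read as a small training-time recipe. The following is a minimal sketch, not the training code from this repository: it assumes that `0.001` is a weight on the mean gate value, that the no-retrieval and random-retrieval probabilities are sampled per example, and that the gate is a sigmoid scaling the cross-attention output before it is added to the hidden state. All class, field, and function names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    # Values copied from the Training Setup list; field names are hypothetical.
    top_k: int = 2
    retrieval_seq_len: int = 512
    retrieval_layers: tuple = (5, 11, 17)
    budget_loss_weight: float = 0.001  # assumed to be a weight, not a target
    p_no_retrieval: float = 0.1
    p_random_retrieval: float = 0.1


def sample_retrieval_mode(rng: random.Random, cfg: RetrievalConfig) -> str:
    """Pick how neighbors are supplied for one training example (assumed scheme)."""
    u = rng.random()
    if u < cfg.p_no_retrieval:
        return "off"     # drop retrieved chunks entirely
    if u < cfg.p_no_retrieval + cfg.p_random_retrieval:
        return "random"  # substitute chunks unrelated to the input
    return "on"          # use the actual top-k neighbors


def gated_residual(hidden, cross_attn_out, gate_logit):
    """Add the cross-attention output to the hidden state, scaled by a sigmoid gate."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    mixed = [h + gate * c for h, c in zip(hidden, cross_attn_out)]
    return mixed, gate


def total_loss(lm_loss: float, gate_mean: float, cfg: RetrievalConfig) -> float:
    """Next-token loss plus a penalty that grows with how open the gates are."""
    return lm_loss + cfg.budget_loss_weight * gate_mean
```

Under these assumptions, roughly 80% of examples see real neighbors, and a gate value near zero leaves the hidden state almost unchanged, so the layer behaves like a plain GPT block.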

	## Latest Metrics

	```json
	{
	  "step": 20000,
	  "retrieval_on": {
	    "loss": 1.7580267190933228,
	    "lm_loss": 1.7580267190933228,
	    "ppl": 5.800979131574639,
	    "gate_mean": 1.749867806211114e-06
	  },
	  "retrieval_off": {
	    "loss": 1.7650717496871948,
	    "lm_loss": 1.7650717496871948,
	    "ppl": 5.841991504112031,
	    "gate_mean": 0.0
	  },
	  "random_retrieval": {
	    "loss": 1.7536429166793823,
	    "lm_loss": 1.7536429166793823,
	    "ppl": 5.775604444698179,
	    "gate_mean": 1.7668644431978464e-06
	  },
	  "delta_lm_loss_off_minus_on": 0.00704503059387207,
	  "delta_lm_loss_random_minus_on": -0.00438380241394043
	}
	```

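	The evaluation runs the same checkpoint in three modes: retrieval-on, retrieval-off, and random-retrieval. Comparing retrieval-off against retrieval-on tests whether the model uses retrieved context productively; comparing random-retrieval against retrieval-on tests robustness to noisy retrieval. At step 20000 the gaps are small: turning retrieval off raises LM loss by about 0.007, random retrieval scores about 0.004 lower than retrieval-on, and the mean gate value is on the order of 1e-6, so these numbers do not yet show a clear benefit from retrieved content.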
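
As a sanity check on the metrics above, the reported perplexities and deltas can be re-derived from the LM losses. This sketch assumes perplexity is `exp(lm_loss)` with a natural-log loss and that each delta is a plain difference of losses. The function names `perplexity` and `summarize` are illustrative and not part of this repository.

```python
import math

# LM losses copied from the "Latest Metrics" JSON.
LM_LOSS = {
    "retrieval_on": 1.7580267190933228,
    "retrieval_off": 1.7650717496871948,
    "random_retrieval": 1.7536429166793823,
}


def perplexity(lm_loss: float) -> float:
    """Perplexity of a natural-log cross-entropy loss (assumed convention)."""
    return math.exp(lm_loss)


def summarize(lm_loss: dict) -> dict:
    """Recompute per-mode perplexity and the two ablation deltas."""
    on = lm_loss["retrieval_on"]
    return {
        "ppl": {mode: perplexity(loss) for mode, loss in lm_loss.items()},
        # Positive: removing retrieval hurts, so retrieval helps.
        "delta_lm_loss_off_minus_on": lm_loss["retrieval_off"] - on,
        # Negative: random chunks score better than retrieved ones.
        "delta_lm_loss_random_minus_on": lm_loss["random_retrieval"] - on,
    }


summary = summarize(LM_LOSS)
for mode, ppl in summary["ppl"].items():
    print(f"{mode}: ppl = {ppl:.4f}")
print(f"off - on:    {summary['delta_lm_loss_off_minus_on']:+.5f}")
print(f"random - on: {summary['delta_lm_loss_random_minus_on']:+.5f}")
```

A positive off-minus-on delta together with a negative random-minus-on delta means the improvement from retrieval is smaller than the difference between real and random neighbors, which is worth keeping in mind when comparing against dense baselines.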

	## Research Use

	This is an experimental RETRO-style pretraining run for comparing retrieval-pretrained GPT models against dense GPT baselines at similar training budgets. It is not instruction-tuned and should not be used as a factual assistant without further evaluation.