---
license: other
library_name: transformers
tags:
- reasoning
- extrapolation
- synthetic-data
- transformers
---

# Interplay-LM Extrapolation RL Models

This repository is organized by experiment setting. Each top-level directory corresponds to one pretraining mixture used in the extrapolation experiments.

Within each setting:

- `base/` stores the base model used to initialize RL.
- `rl/` stores the final RL checkpoints for each experiment variant.

Only inference-relevant Hugging Face files are included.

## Included settings

- `id2-10_0.2easy_0.3medium_0.5hard`
- `id2-10_0.5easy_0.3medium_0.2hard`
- `id2-10_0.4995easy_0.4995medium_0.001hard`
- `id2-10_0.475easy_0.475medium_0.05hard`

## Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Interplay-LM-Reasoning/extrapolation_rl"
subdir = "id2-10_0.5easy_0.3medium_0.2hard/rl/op11-14_uniform"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
```

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```
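
## Selecting a checkpoint

The directory layout described above (one top-level directory per setting, with `base/` and `rl/` inside it) can be turned into a `subfolder` string with a small helper. The following is a minimal sketch, not part of the repository: the function name `checkpoint_subfolder` and the `SETTINGS` tuple are hypothetical, and it assumes that the base model files sit directly under `<setting>/base` and that each RL variant is a directory under `<setting>/rl/`. The only variant name confirmed by this README is `op11-14_uniform`; check the repository file listing for the others.

```python
# Settings listed in this README under "Included settings".
SETTINGS = (
    "id2-10_0.2easy_0.3medium_0.5hard",
    "id2-10_0.5easy_0.3medium_0.2hard",
    "id2-10_0.4995easy_0.4995medium_0.001hard",
    "id2-10_0.475easy_0.475medium_0.05hard",
)


def checkpoint_subfolder(setting, variant=None):
    """Build the `subfolder` argument for `from_pretrained` (hypothetical helper).

    With `variant=None` the path points at the base model; otherwise it
    points at the RL checkpoint for that variant. The assumed layout is
    `<setting>/base` and `<setting>/rl/<variant>`.
    """
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    if variant is None:
        return f"{setting}/base"
    return f"{setting}/rl/{variant}"


# Reproduces the path used in the Load example.
subdir = checkpoint_subfolder("id2-10_0.5easy_0.3medium_0.2hard", "op11-14_uniform")
```

The resulting string can be passed as `subfolder=subdir` to `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained`, as in the Load section. Validating the setting name up front gives an immediate local error for a misspelled directory, before any download is attempted.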