OLMo-3 Recurrent Adapter - CoT SFT (rec=2, coda=1, untied)

This is a Recurrent Adapter Model fine-tuned on MetaMathQA for mathematical reasoning, built on top of OLMo-3-1025-7B.

Model Details

  • Base Model: allenai/OLMo-3-1025-7B
  • Architecture: Recurrent Adapter (2 recurrent layers + 1 coda layer)
  • LM Head: untied
  • Training Format: Chain-of-Thought (CoT)
  • Mean Recurrence: 32 iterations
  • Final Training Loss: 0.1183
  • Validation Loss: 0.2337

Training Details

  • Dataset: MetaMathQA
  • Learning Rate: 1e-4
  • Training Steps: 50,000
  • Sequence Length: 4096 (CoT)
  • Gradient Accumulation: 8 steps
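
The 8-step gradient accumulation above can be sketched as follows. This is a minimal illustration of the pattern, not the repository's training code; the model, optimizer, and loss function are placeholders. Each micro-batch loss is divided by the number of micro-batches so that the summed gradients match a single full-batch update.

```python
import torch

# Hedged sketch of 8-step gradient accumulation: gradients from several
# micro-batches are summed in .grad before one optimizer step, giving a
# proportionally larger effective batch size.
def accumulated_step(model, optimizer, micro_batches, loss_fn):
    """One optimizer update over len(micro_batches) micro-batches."""
    optimizer.zero_grad()
    for inputs, targets in micro_batches:
        # Scale each loss so the summed gradient equals the full-batch mean.
        loss = loss_fn(model(inputs), targets) / len(micro_batches)
        loss.backward()  # gradients accumulate in .grad across calls
    optimizer.step()
```

With equally sized micro-batches, one accumulated update is numerically identical to one update on the concatenated batch.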

Recurrent Adapter Architecture

The model uses a recurrent adapter architecture where:

  • The frozen base model extracts initial representations
  • A recurrent block (2 layers) refines these representations over multiple iterations (mean = 32 during training)
  • A coda block (1 layer) maps the refined representations to the final output
  • Only the adapter layers are trained
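
The control flow above can be sketched in PyTorch. This is an illustrative sketch only: the repository's actual module names, layer internals, and hidden size may differ. It shows the path the bullets describe: frozen base features pass through a 2-layer recurrent block whose weights are reused every iteration, then through a 1-layer coda block once.

```python
import torch
import torch.nn as nn

# Hedged sketch of the adapter; TransformerEncoderLayer stands in for
# whatever layer type the real implementation uses.
class RecurrentAdapter(nn.Module):
    def __init__(self, hidden_size=4096, num_heads=32, num_recurrence_steps=32):
        super().__init__()
        self.num_recurrence_steps = num_recurrence_steps
        # Recurrent block: 2 layers whose weights are shared across iterations.
        self.recurrent = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)
            for _ in range(2)
        )
        # Coda block: 1 layer applied once after the recurrence loop.
        self.coda = nn.TransformerEncoderLayer(hidden_size, num_heads, batch_first=True)

    def forward(self, base_hidden_states, num_steps=None):
        # base_hidden_states: (batch, seq_len, hidden_size) from the frozen base model.
        h = base_hidden_states
        for _ in range(num_steps or self.num_recurrence_steps):
            for layer in self.recurrent:
                h = layer(h)  # same weights reused each iteration
        return self.coda(h)
```

Only these adapter parameters would receive gradients; the base model stays frozen throughout.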

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "hanseungwook/olmo3-recurrent-adapter-sft-cot-rec2-coda1-untied",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# The adapter reuses the frozen base model's tokenizer
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-3-1025-7B")

prompt = "What is 25 * 37?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        num_recurrence_steps=32,  # model-specific argument exposed via trust_remote_code
        do_sample=True,           # required for temperature to take effect
        temperature=0.7,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

License

Apache 2.0
