Model Card for OLMo-2-0425-1B_tulu-3-sft-olmo-2-mixture-0225_lr0.0001_seed42

This model is an instruction-fine-tuned version of allenai/OLMo-2-0425-1B, trained with a LoRA adapter for one epoch on the Tülu 3 SFT mixture (tulu-3-sft-olmo-2-mixture-0225) via TRL.

Uses

This model was created for training data influence estimation experiments using DataInf and LESS. See our paper and repo for details.

Quick start

import json

from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "loris3/OLMo-2-0425-1B_tulu-3-sft-olmo-2-mixture-0225_lr0.0001_seed42"

# Read the adapter config to find the base model this adapter was trained on
config_path = hf_hub_download(repo_id=repo_id, filename="adapter_config.json")
with open(config_path) as f:
    adapter_config = json.load(f)

base_model_name_or_path = adapter_config["base_model_name_or_path"]

# The chat template used during fine-tuning ships alongside the adapter
template_path = hf_hub_download(repo_id=repo_id, filename="chat_template.jinja")
with open(template_path) as f:
    chat_template = f.read()


# Load the base tokenizer and attach the fine-tuning chat template
tokenizer = AutoTokenizer.from_pretrained(base_model_name_or_path)
tokenizer.chat_template = chat_template
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the base model, then apply the LoRA adapter on top
model = AutoModelForCausalLM.from_pretrained(base_model_name_or_path)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, repo_id, is_trainable=False)



question = "Could you give us some of your political beliefs?"
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)  # device=0 assumes a GPU; use device="cpu" otherwise
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    do_sample=False,  # greedy decoding; temperature/top_p are ignored when sampling is off
    return_full_text=False,
)[0]
print(output["generated_text"])
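The chat_template.jinja downloaded above is a Jinja template that turns a list of role/content messages into the exact prompt string the model was fine-tuned on. As a minimal illustration of how such a template is rendered (the template string below is a simplified stand-in, not the repo's actual chat_template.jinja):

```python
from jinja2 import Template

# Simplified stand-in chat template, purely illustrative --
# NOT the actual chat_template.jinja shipped with this adapter.
demo_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)

messages = [{"role": "user", "content": "Hello!"}]
prompt = Template(demo_template).render(messages=messages, add_generation_prompt=True)
print(prompt)
# <|user|>
# Hello!
# <|assistant|>
```

This is the same mechanism `tokenizer.apply_chat_template` (and the pipeline above) uses internally once `tokenizer.chat_template` is set.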

Training Hyperparameters

| Parameter | Value |
|---|---|
| Precision | bfloat16 |
| Optimizer | AdamW (torch fused) |
| Learning rate | 1×10⁻⁴ |
| LR scheduler | Linear |
| Weight decay | 0.0 |
| Max grad norm | 1.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.1 |
| LoRA bias | none |
| Target modules | q_proj, c_attn, v_proj |
| Trainable params | LoRA only |
| Train batch size / device | 4 |
| Gradient accumulation | 8 |
| Effective batch size | 32 |
| Training epochs | 1 |
| Max sequence length | 1024 |
| Gradient checkpointing | False |
| Seed | 42 |
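The adapter settings above can be written out as a config sketch (field names follow PEFT's `adapter_config.json` schema; the file shipped in this repo is authoritative). The effective batch size is simply the per-device batch times the gradient-accumulation steps:

```python
# Sketch of the LoRA adapter configuration implied by the table above.
# Field names follow PEFT's LoraConfig / adapter_config.json schema;
# the repo's own adapter_config.json is the authoritative source.
adapter_config = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "bias": "none",
    "target_modules": ["q_proj", "c_attn", "v_proj"],
    "task_type": "CAUSAL_LM",
}

# Effective batch size = per-device batch size x gradient-accumulation steps
effective_batch = 4 * 8
assert effective_batch == 32
```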

Framework versions

  • PEFT: 0.17.1
  • TRL: 0.23.0
  • Transformers: 4.56.2
  • PyTorch: 2.8.0+cu126
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Evaluation

We evaluate with OLMES.

Task suites: core_9mcqa::olmes, mmlu:mc::olmes, olmo_2_generative::olmes, olmo_2_heldout::olmes

| Task | Score |
|---|---|
| AGIEval | 0.34 |
| ARC_C | 0.47 |
| ARC_E | 0.74 |
| BBH | 0.30 |
| BoolQ | 0.69 |
| CSQA | 0.60 |
| CoQA | 0.69 |
| DROP | 0.35 |
| GSM8K | 0.36 |
| HSwag | 0.60 |
| JPRDY | 0.63 |
| MMLU | 0.43 |
| MMLU-Pro | 0.19 |
| NatQs | 0.19 |
| OBQA | 0.51 |
| PIQA | 0.71 |
| SIQA | 0.56 |
| SQuAD | 0.80 |
| TriviaQA | 0.55 |
| WinoG | 0.61 |