Compact Example-Based Explanations for Language Models
This model is an instruction fine-tuned version of allenai/OLMo-2-0425-1B, trained with a LoRA adapter on the Tülu 3 SFT mixture for one epoch via TRL.

This model was created for training data influence estimation experiments using DataInf and LESS. See our paper and repo for details.
```python
import json

from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

repo_id = "loris3/OLMo-2-0425-1B_tulu-3-sft-olmo-2-mixture-0225_lr0.0001_seed42"

# Resolve the base model from the adapter config and fetch the chat template.
adapter_path = hf_hub_download(repo_id=repo_id, filename="adapter_config.json")
with open(adapter_path) as f:
    adapter_config = json.load(f)
base_model_name_or_path = adapter_config["base_model_name_or_path"]

with open(hf_hub_download(repo_id=repo_id, filename="chat_template.jinja")) as f:
    chat_template = f.read()

tokenizer = AutoTokenizer.from_pretrained(base_model_name_or_path)
tokenizer.chat_template = chat_template
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_model_name_or_path)
model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(model, repo_id, is_trainable=False)

question = "Could you give us some of your political beliefs?"
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    do_sample=False,  # greedy decoding; sampling parameters are unused
    return_full_text=False,
)[0]
print(output["generated_text"])
```
| Parameter | Value |
|---|---|
| Precision | bfloat16 |
| Optimizer | AdamW (torch fused) |
| Learning rate | 1×10⁻⁴ |
| LR scheduler | Linear |
| Weight decay | 0.0 |
| Max grad norm | 1.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.1 |
| LoRA bias | none |
| Target modules | q_proj, c_attn, v_proj |
| Trainable params | LoRA only |
| Train batch size / device | 4 |
| Gradient accumulation | 8 |
| Effective batch size | 32 |
| Training epochs | 1 |
| Max sequence length | 1024 |
| Gradient checkpointing | False |
| Seed | 42 |
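The table above maps onto a TRL/PEFT configuration roughly as follows. This is a sketch, not the exact training script; argument names follow recent `trl`/`peft` releases and may differ across versions:

```python
from peft import LoraConfig
from trl import SFTConfig

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.1, bias="none",
    target_modules=["q_proj", "c_attn", "v_proj"],  # as listed in the table
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # 4 x 8 = effective batch size 32
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    weight_decay=0.0,
    max_grad_norm=1.0,
    num_train_epochs=1,
    max_seq_length=1024,             # renamed to max_length in newer trl
    bf16=True,
    gradient_checkpointing=False,
    optim="adamw_torch_fused",
    seed=42,
)
```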
We evaluate with OLMES.

Task suites: `core_9mcqa::olmes`, `mmlu:mc::olmes`, `olmo_2_generative::olmes`, `olmo_2_heldout::olmes`
| Task | Score |
|---|---|
| AGIEval | 0.34 |
| ARC_C | 0.47 |
| ARC_E | 0.74 |
| BBH | 0.30 |
| BoolQ | 0.69 |
| CSQA | 0.60 |
| CoQA | 0.69 |
| DROP | 0.35 |
| GSM8K | 0.36 |
| HSwag | 0.60 |
| JPRDY | 0.63 |
| MMLU | 0.43 |
| MMLU-Pro | 0.19 |
| NatQs | 0.19 |
| OBQA | 0.51 |
| PIQA | 0.71 |
| SIQA | 0.56 |
| SQuAD | 0.80 |
| TriviaQA | 0.55 |
| WinoG | 0.61 |
Base model: allenai/OLMo-2-0425-1B