Argonne 2.5-base is a decoder-only transformer language model trained on a mixture of FineWeb and FineWeb-Edu data.
| Component | Specification |
|---|---|
| Parameters | 1,273,807,360 (~1.27B) |
| Layers | 28 transformer blocks |
| Hidden size | 1,792 |
| Attention heads | 14 query / 7 key-value (GQA) |
| Head dimension | 128 |
| Feed-forward | SwiGLU MLP, 4,864 intermediate dim |
| Context length | 1,024 tokens |
| Vocabulary size | 151,669 |
| Normalization | RMSNorm (ε = 1e-6) |
| Position encoding | RoPE (θ = 10,000) |
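As a sanity check, the parameter total in the table can be reproduced from the other rows, assuming bias-free linear projections, tied input/output embeddings, and weight-only RMSNorm (a sketch; the authoritative module layout is the repository's `model.py`):

```python
vocab, hidden, layers = 151_669, 1_792, 28
q_heads, kv_heads, head_dim = 14, 7, 128
ffn = 4_864

emb = vocab * hidden                        # token embedding (tied with lm_head)
q_proj = hidden * q_heads * head_dim        # query projection
kv_proj = 2 * hidden * kv_heads * head_dim  # key + value projections (GQA)
o_proj = q_heads * head_dim * hidden        # output projection
attn = q_proj + kv_proj + o_proj
mlp = 3 * hidden * ffn                      # SwiGLU: gate, up, down
norms = 2 * hidden                          # two RMSNorms per block

per_layer = attn + mlp + norms
total = layers * per_layer + emb + hidden   # + final RMSNorm
print(total)  # 1273807360
```

This lands exactly on the 1,273,807,360 figure above, which supports the bias-free, tied-embedding assumption.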
Training details:

| Item | Value |
|---|---|
| Total steps | 425,975 |
| Tokens processed | ~76.05B |
| Final train loss | 2.6119 |
| Sequence length | 1,024 |
| Batch size per GPU | 20 |
| Gradient accumulation | 4 |
| Effective batch | 245,760 tokens |
| Learning rate | 3e-4 |
| Min LR ratio | 0.1 |
| Warmup | 1,000 steps |
| Precision | bf16 autocast |
| Checkpoint dtype | bfloat16 |
| Weight format | 5 sharded safetensors |
| torch.compile | Enabled |
| GPUs | 3x H200s (DDP) |
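The effective batch of 245,760 tokens follows directly from the other rows in the table (a quick sketch of the arithmetic):

```python
gpus = 3             # data-parallel ranks
per_gpu_batch = 20   # sequences per GPU per micro-step
grad_accum = 4       # micro-steps per optimizer step
seq_len = 1_024      # tokens per sequence

tokens_per_step = gpus * per_gpu_batch * grad_accum * seq_len
print(tokens_per_step)  # 245760
```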
This model uses the Qwen3 tokenizer family via the Qwen2Tokenizer compatibility class.
This release was built from the `main` branch of the GitHub repository: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main
Key scripts:

- `pretrain.py`
- `continue_pretrain.py`
- `inference.py`
- `push_model_to_hf.py`

Quick start (assumes the repository has been downloaded locally, so that `model.py` and the sharded weights sit under `PursuitOfDataScience/Argonne2.5-base/`):

```python
import json
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

model_id = "PursuitOfDataScience/Argonne2.5-base"
model_dir = Path(model_id)

# Import the custom model classes shipped with the checkpoint.
sys.path.insert(0, model_id)
from model import ArgonneConfig, ArgonneModel

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Build the model from config.json, skipping private "_"-prefixed keys.
with (model_dir / "config.json").open() as f:
    config_dict = json.load(f)
config = ArgonneConfig(**{k: v for k, v in config_dict.items() if not str(k).startswith("_")})
model = ArgonneModel(config)

# Copy the five sharded safetensors files into the state dict in place.
state = model.state_dict()
with (model_dir / "model.safetensors.index.json").open() as f:
    index = json.load(f)
for shard_name in sorted(set(index["weight_map"].values())):
    shard = load_file(str(model_dir / shard_name), device="cpu")
    for key, tensor in shard.items():
        if key in state:
            state[key].copy_(tensor)

model.tie_weights()
model = model.to("cuda", dtype=torch.bfloat16).eval()

prompt = "Write a short paragraph about scientific computing at Argonne National Laboratory."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
input_ids = inputs["input_ids"] if isinstance(inputs, dict) else (inputs.input_ids if hasattr(inputs, "input_ids") else inputs)
input_ids = input_ids.to(model.device)

output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 128,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to take effect
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```
Loading the tokenizer and custom model class requires `trust_remote_code=True`, and the model's `generate` method uses `max_length` rather than `max_new_tokens`.

Citation:

```bibtex
@misc{argonne25,
  author    = {PursuitOfDataScience},
  title     = {Argonne 2.5-base},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/PursuitOfDataScience/Argonne2.5-base}
}
```