Argonne 2.5-base is a decoder-only transformer language model trained on a mixture of FineWeb and FineWeb-Edu data.
| Component | Specification |
|---|---|
| Parameters | 1,273,807,360 (~1.27B) |
| Layers | 28 transformer blocks |
| Hidden size | 1,792 |
| Attention heads | 14 query / 7 key-value (GQA) |
| Head dimension | 128 |
| Feed-forward | SwiGLU MLP, 4,864 intermediate dim |
| Context length | 1,024 tokens |
| Vocabulary size | 151,669 |
| Normalization | RMSNorm (ε = 1e-6) |
| Position encoding | RoPE (θ = 10,000) |
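As a sanity check, the parameter total in the table can be reproduced from the other rows, assuming bias-free linear projections, tied input/output embeddings, and weight-only RMSNorm (a sketch; the authoritative module layout is the repository's `model.py`):

```python
vocab, hidden, layers = 151_669, 1_792, 28
q_heads, kv_heads, head_dim = 14, 7, 128
ffn = 4_864

emb = vocab * hidden                        # token embedding (tied with lm_head)
q_proj = hidden * q_heads * head_dim        # query projection
kv_proj = 2 * hidden * kv_heads * head_dim  # key + value projections (GQA)
o_proj = q_heads * head_dim * hidden        # output projection
attn = q_proj + kv_proj + o_proj
mlp = 3 * hidden * ffn                      # SwiGLU: gate, up, down
norms = 2 * hidden                          # two RMSNorms per block

per_layer = attn + mlp + norms
total = layers * per_layer + emb + hidden   # + final RMSNorm
print(total)  # 1273807360
```

This lands exactly on the 1,273,807,360 figure above, which supports the bias-free, tied-embedding assumption.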
Training details:

| Item | Value |
|---|---|
| Total steps | 425,975 |
| Tokens processed | ~76.05B |
| Final train loss | 2.6119 |
| Sequence length | 1,024 |
| Batch size per GPU | 20 |
| Gradient accumulation | 4 |
| Effective batch | 245,760 tokens |
| Learning rate | 3e-4 |
| Min LR ratio | 0.1 |
| Warmup | 1,000 steps |
| Precision | bf16 autocast |
| Checkpoint dtype | bfloat16 |
| Weight format | 5 sharded safetensors |
| torch.compile | Enabled |
| GPUs | 3x H200s (DDP) |
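The effective batch of 245,760 tokens follows directly from the other rows in the table (a quick sketch of the arithmetic):

```python
gpus = 3             # data-parallel ranks
per_gpu_batch = 20   # sequences per GPU per micro-step
grad_accum = 4       # micro-steps per optimizer step
seq_len = 1_024      # tokens per sequence

tokens_per_step = gpus * per_gpu_batch * grad_accum * seq_len
print(tokens_per_step)  # 245760
```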
This model uses the Qwen3 tokenizer family via the Qwen2Tokenizer compatibility class.
This release was built from the `main` branch of the GitHub repository: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main
Key scripts:

- `pretrain.py`
- `continue_pretrain.py`
- `inference.py`
- `push_model_to_hf.py`

Quick start (assumes the repository has been downloaded locally, so that `model.py` and the sharded weights sit under `PursuitOfDataScience/Argonne2.5-base/`):

```python
import json
import sys
from pathlib import Path

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

model_id = "PursuitOfDataScience/Argonne2.5-base"
model_dir = Path(model_id)

# Import the custom model classes shipped with the checkpoint.
sys.path.insert(0, model_id)
from model import ArgonneConfig, ArgonneModel

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Build the model from config.json, skipping private "_"-prefixed keys.
with (model_dir / "config.json").open() as f:
    config_dict = json.load(f)
config = ArgonneConfig(**{k: v for k, v in config_dict.items() if not str(k).startswith("_")})
model = ArgonneModel(config)

# Copy the five sharded safetensors files into the state dict in place.
state = model.state_dict()
with (model_dir / "model.safetensors.index.json").open() as f:
    index = json.load(f)
for shard_name in sorted(set(index["weight_map"].values())):
    shard = load_file(str(model_dir / shard_name), device="cpu")
    for key, tensor in shard.items():
        if key in state:
            state[key].copy_(tensor)

model.tie_weights()
model = model.to("cuda", dtype=torch.bfloat16).eval()

prompt = "Write a short paragraph about scientific computing at Argonne National Laboratory."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
input_ids = inputs["input_ids"] if isinstance(inputs, dict) else (inputs.input_ids if hasattr(inputs, "input_ids") else inputs)
input_ids = input_ids.to(model.device)

output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 128,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to take effect
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```
Loading the tokenizer and custom model class requires `trust_remote_code=True`, and the model's `generate` method uses `max_length` rather than `max_new_tokens`.

Citation:

```bibtex
@misc{argonne25,
  author    = {PursuitOfDataScience},
  title     = {Argonne 2.5-base},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/PursuitOfDataScience/Argonne2.5-base}
}
```