# Argonne 2.5-base

Argonne 2.5-base is a decoder-only transformer language model trained on a mixture of FineWeb and FineWeb-Edu data.

## Model architecture

| Component | Specification |
|---|---|
| Parameters | 1,273,807,360 (~1.27B) |
| Layers | 28 transformer blocks |
| Hidden size | 1,792 |
| Attention heads | 14 query / 7 key-value (GQA) |
| Head dimension | 128 |
| Feed-forward | SwiGLU MLP, 4,864 intermediate dim |
| Context length | 1,024 tokens |
| Vocabulary size | 151,669 |
| Normalization | RMSNorm (ε = 1e-6) |
| Position encoding | RoPE (θ = 10,000) |
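As a sanity check, the stated parameter count can be reproduced from the dimensions in the table. The sketch below assumes tied input/output embeddings, bias-free projections, and two RMSNorms per block plus a final norm — assumptions not stated in the card, but consistent with the exact total:

```python
# Reconstruct the ~1.27B parameter count from the architecture table.
vocab, d, layers = 151_669, 1_792, 28
n_q, n_kv, head_dim = 14, 7, 128
ffn = 4_864

embed = vocab * d                                   # token embeddings (tied with LM head)
attn = (d * n_q * head_dim            # Q projection
        + 2 * d * n_kv * head_dim     # K and V projections (GQA: 7 KV heads)
        + n_q * head_dim * d)         # output projection
mlp = 3 * d * ffn                     # SwiGLU: gate, up, down matrices
norms = layers * 2 * d + d            # two RMSNorms per block + final norm

total = embed + layers * (attn + mlp) + norms
print(total)  # 1273807360
```

The exact match to 1,273,807,360 suggests the layer accounting above mirrors the actual module layout.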

## Training details

| Item | Value |
|---|---|
| Total steps | 425,975 |
| Tokens processed | ~76.05B |
| Final train loss | 2.6119 |
| Sequence length | 1,024 |
| Batch size per GPU | 20 |
| Gradient accumulation | 4 |
| Effective batch | 245,760 tokens |
| Learning rate | 3e-4 |
| Min LR ratio | 0.1 |
| Warmup | 1,000 steps |
| Precision | bf16 autocast |
| Checkpoint dtype | bfloat16 |
| Weight format | 5 sharded safetensors |
| torch.compile | Enabled |
| GPUs | 3× H200 (DDP) |
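The effective batch row follows directly from the other rows — per-GPU micro-batch × gradient-accumulation steps × number of GPUs × sequence length:

```python
# Tokens consumed per optimizer step, from the training table above.
per_gpu_batch = 20    # sequences per GPU per micro-step
grad_accum = 4        # micro-steps accumulated per optimizer step
gpus = 3              # DDP world size
seq_len = 1024        # tokens per sequence

effective_tokens = per_gpu_batch * grad_accum * gpus * seq_len
print(effective_tokens)  # 245760
```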

## Training data

- FineWeb
- FineWeb-Edu
- Final stage training shard: 55.2B tokens
- Cumulative training across the full run: ~76.05B tokens
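From the two token counts above, roughly 20.85B tokens were processed before the final-stage shard — a derived estimate, not a figure stated in the release:

```python
# Split the cumulative token count into final-stage and earlier-stage portions.
cumulative_b = 76.05   # ~total tokens across the full run, in billions
final_stage_b = 55.2   # tokens in the final training shard, in billions

earlier_b = round(cumulative_b - final_stage_b, 2)
print(earlier_b)  # 20.85
```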

## Tokenizer

This model uses the Qwen3 tokenizer family via the Qwen2Tokenizer compatibility class.

## Source code

This release was built from the main branch of the GitHub repository: https://github.com/PursuitOfDataScience/ArgonneAI/tree/main

Key scripts:

- `pretrain.py`
- `continue_pretrain.py`
- `inference.py`
- `push_model_to_hf.py`

## Loss curve

*(Figure: training loss curve.)*

## Inference

```python
from pathlib import Path
import json
import sys

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

model_id = "PursuitOfDataScience/Argonne2.5-base"
# Assumes the repository has been downloaded so that model_id also resolves as a
# local directory containing model.py, config.json, and the weight shards.
model_dir = Path(model_id)

sys.path.insert(0, model_id)
from model import ArgonneConfig, ArgonneModel

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

with (model_dir / "config.json").open() as f:
    config_dict = json.load(f)
config = ArgonneConfig(**{k: v for k, v in config_dict.items() if not str(k).startswith("_")})

model = ArgonneModel(config)
state = model.state_dict()

# Copy each sharded safetensors file into the freshly initialized state dict.
with (model_dir / "model.safetensors.index.json").open() as f:
    index = json.load(f)

for shard_name in sorted(set(index["weight_map"].values())):
    shard = load_file(str(model_dir / shard_name), device="cpu")
    for key, tensor in shard.items():
        if key in state:
            state[key].copy_(tensor)

model.tie_weights()
model = model.to("cuda", dtype=torch.bfloat16).eval()

prompt = "Write a short paragraph about scientific computing at Argonne National Laboratory."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
input_ids = inputs["input_ids"] if isinstance(inputs, dict) else (inputs.input_ids if hasattr(inputs, "input_ids") else inputs)
input_ids = input_ids.to(model.device)

# The custom generate takes max_length (prompt + new tokens), not max_new_tokens.
output_ids = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 128,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
    do_sample=True,  # set do_sample=False for deterministic greedy decoding
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

## Usage notes

- Load with `trust_remote_code=True`.
- The custom `generate` method takes `max_length` rather than `max_new_tokens`.
- Weights are published as 5 bf16 safetensors shards.
- Switch to greedy decoding (`do_sample=False`) if you want deterministic output.
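As background for the sampling knobs above, here is a minimal pure-Python sketch of what `temperature`, `top_k`, and `top_p` do to a logit vector. This is an illustration of the standard technique only, not the repository's implementation:

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=50, top_p=0.95, rng=random):
    """Illustrative temperature + top-k + top-p (nucleus) sampling over raw logits."""
    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    # Sort (probability, token_id) pairs, most likely first.
    probs = sorted(((e / z, i) for i, e in enumerate(exps)), reverse=True)
    # top-k: keep only the k most likely tokens.
    probs = probs[:top_k]
    # top-p: keep the smallest prefix whose cumulative mass reaches p.
    kept, mass = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        mass += p
        if mass >= top_p:
            break
    # Renormalize implicitly by sampling within the kept mass.
    r = rng.random() * mass
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

With a sharply peaked logit vector, the nucleus collapses to a single token and the function becomes deterministic, which is why low `top_p` values behave almost like greedy decoding.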

## Citation

```bibtex
@misc{argonne25,
  author = {PursuitOfDataScience},
  title = {Argonne 2.5-base},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/PursuitOfDataScience/Argonne2.5-base}
}
```