metadata
license: apache-2.0
library_name: vla-foundry
tags:
  - foundry
  - vla_foundry
  - llm
  - text-generation

Foundry-LLM-1.2B-1T

A 1.2B-parameter language model pretrained on 1T tokens, part of the VLA Foundry collection.

Model Description

  • Architecture: Transformer (24 layers, 2048 hidden dim, 16 heads, SwiGLU FFN, RoPE, QK-norm)
  • Parameters: 1.2B (non-embedding)
  • Tokenizer: SmolVLM2 (vocab size 49,280)
  • Training data: 1T tokens from DCLM-Baseline-1.0
  • LR schedule: Warmup followed by a constant learning rate for the first 800B tokens, then cosine decay over the final 200B tokens
  • Sequence length: 2048
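
As a rough sanity check on the figures above, the non-embedding parameter count can be estimated from the layer count and hidden size. This is a minimal sketch, not the library's own accounting. It assumes standard multi-head attention with four d×d projections, no biases, and a SwiGLU FFN whose inner width is about 8/3 of the hidden dim. The card does not state the FFN width, so `d_ff` here is an assumption, and the small norm and QK-norm weights are ignored.

```python
def non_embedding_params(n_layers=24, d_model=2048, d_ff=None):
    """Estimate transformer parameters, excluding embeddings and norm weights."""
    if d_ff is None:
        # ASSUMPTION: SwiGLU inner width of roughly 8/3 * d_model (not stated in the card).
        d_ff = 8 * d_model // 3
    attn = 4 * d_model * d_model   # Q, K, V and output projections
    ffn = 3 * d_model * d_ff       # gate, up and down projections of SwiGLU
    return n_layers * (attn + ffn)


def embedding_params(vocab_size=49_280, d_model=2048):
    """Size of a single token-embedding matrix."""
    return vocab_size * d_model


if __name__ == "__main__":
    print(f"non-embedding: {non_embedding_params() / 1e9:.2f}B")
    print(f"embedding:     {embedding_params() / 1e6:.1f}M")
```

Under these assumptions the estimate comes to about 1.21B non-embedding parameters, in line with the stated 1.2B, plus roughly 101M for one embedding matrix.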

This model continues training from Foundry-LLM-1.2B-800B for an additional 200B tokens under cosine learning-rate decay.
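
The schedule described above (warmup, constant phase up to 800B tokens, cosine decay over the last 200B) can be sketched as a function of tokens seen. This is only an illustration. The peak learning rate, final learning rate, warmup length, and the linear warmup shape are all hypothetical placeholders, since the card gives none of them.

```python
import math


def lr_at(tokens_seen,
          peak_lr=3e-4,            # ASSUMPTION: placeholder peak learning rate
          min_lr=0.0,              # ASSUMPTION: placeholder final learning rate
          warmup_tokens=2e9,       # ASSUMPTION: placeholder warmup length
          constant_end=800e9,      # from the card: constant phase ends at 800B tokens
          total_tokens=1000e9):    # from the card: 1T tokens in total
    """Learning rate for a warmup + constant + cosine-decay schedule."""
    if tokens_seen < warmup_tokens:
        # ASSUMPTION: linear warmup from zero.
        return peak_lr * tokens_seen / warmup_tokens
    if tokens_seen <= constant_end:
        return peak_lr
    # Cosine decay from peak_lr to min_lr over the remaining tokens.
    progress = min(1.0, (tokens_seen - constant_end) / (total_tokens - constant_end))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for t in (0, 1e9, 400e9, 800e9, 900e9, 1000e9):
        print(f"{t / 1e9:7.0f}B tokens -> lr = {lr_at(t):.2e}")
```

With this shape, the 800B checkpoint sits at the end of the constant phase, and the 1T model corresponds to the end of the decay.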

Evaluation Results

Multiple-choice reasoning benchmarks:

| HellaSwag | MMLU | ARC-e | ARC-c | PIQA | WinoGrande | OpenBookQA | BoolQ |
|---|---|---|---|---|---|---|---|
| 66.7 | 26.6 | 71.7 | 39.3 | 77.5 | 62.6 | 40.8 | 65.4 |

Usage

Install the vla_foundry package from source:

git clone https://github.com/TRI-ML/vla_foundry.git
cd vla_foundry
pip install -e .

Then load the pretrained model in Python:

from vla_foundry.models.base_model import BaseModel
model = BaseModel.from_pretrained("TRI-ML/Foundry-LLM-1.2B-1T")
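
Once the model is loaded, a quick way to confirm that the checkpoint has the expected size is to count its parameters. The helper below is a hypothetical utility, not part of vla_foundry. It assumes the loaded object exposes a PyTorch-style `parameters()` iterator whose items have `numel()` and `requires_grad`, which the card does not confirm.

```python
def count_parameters(model, trainable_only=False):
    """Sum element counts over model.parameters().

    ASSUMPTION: `model` follows the torch.nn.Module interface, i.e.
    parameters() yields objects with numel() and requires_grad.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )
```

If that assumption holds, `count_parameters(model)` returns the total including embedding weights, so it need not match the 1.2B non-embedding figure exactly.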

Links