---
license: apache-2.0
library_name: vla-foundry
tags:
- foundry
- vla_foundry
- llm
- text-generation
---

# Foundry-LLM-1.2B-1T

A 1.2B-parameter language model pretrained on 1T tokens, part of the [VLA Foundry](https://github.com/TRI-ML/vla_foundry) collection.

## Model Description

- **Architecture:** Transformer (24 layers, 2048 hidden dim, 16 heads, SwiGLU FFN, RoPE, QK-norm); see the config sketch below
- **Parameters:** 1.2B (non-embedding)
- **Tokenizer:** SmolVLM2 (vocab size 49,280)
- **Training data:** 1T tokens from DCLM-Baseline-1.0
- **LR schedule:** Warmup + constant for 800B tokens, then 200B tokens of cosine decay
- **Sequence length:** 2048

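For orientation, the architecture bullets above map onto a configuration roughly like the following. This is an illustrative sketch; the field names are assumptions, not vla_foundry's actual config schema:

```python
from dataclasses import dataclass

@dataclass
class FoundryLLMConfig:
    """Illustrative sketch only; field names are not vla_foundry's real schema."""
    n_layers: int = 24          # transformer blocks
    hidden_dim: int = 2048      # model width
    n_heads: int = 16           # attention heads (head_dim = 2048 // 16 = 128)
    vocab_size: int = 49_280    # SmolVLM2 tokenizer
    max_seq_len: int = 2048     # pretraining sequence length
    ffn: str = "swiglu"         # SwiGLU feed-forward blocks
    pos_emb: str = "rope"       # rotary position embeddings
    qk_norm: bool = True        # normalize queries/keys before attention
```
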
Continuation of [Foundry-LLM-1.2B-800B](https://huggingface.co/TRI-ML/Foundry-LLM-1.2B-800B) with an additional 200B tokens of cosine-decayed training.

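Stated precisely, the schedule holds the learning rate constant after warmup for the first 800B tokens, then decays it with a cosine over the final 200B. A minimal sketch; the warmup length and the peak/minimum learning rates are not given on this card, so those arguments are placeholders:

```python
import math

def lr_at(tokens_seen: float, peak_lr: float,
          warmup_tokens: float = 10e9,   # placeholder: warmup length not stated here
          stable_tokens: float = 800e9,  # constant-LR phase
          decay_tokens: float = 200e9,   # cosine-decay phase
          min_lr: float = 0.0) -> float: # placeholder floor
    """Warmup + constant for 800B tokens, then 200B tokens of cosine decay."""
    if tokens_seen < warmup_tokens:
        return peak_lr * tokens_seen / warmup_tokens  # linear warmup
    if tokens_seen < stable_tokens:
        return peak_lr                                # constant phase
    # Cosine decay from peak_lr down to min_lr over the last 200B tokens.
    progress = min((tokens_seen - stable_tokens) / decay_tokens, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```
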
## Evaluation Results

Accuracy on standard multiple-choice reasoning benchmarks:

| HellaSwag | MMLU | ARC-e | ARC-c | PIQA | WinoGrande | OpenBookQA | BoolQ |
|---|---|---|---|---|---|---|---|
| 66.7 | 26.6 | 71.7 | 39.3 | 77.5 | 62.6 | 40.8 | 65.4 |

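Multiple-choice benchmarks like these are typically scored by ranking the model's log-likelihood of each candidate answer rather than by free-form generation. A minimal sketch of that scheme, assuming a Hugging Face-style causal LM and tokenizer; it illustrates the general technique, not the exact harness behind the numbers above:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pick_choice(model, tokenizer, question: str, choices: list[str]) -> int:
    """Return the index of the answer whose continuation the model finds most likely."""
    scores = []
    for choice in choices:
        prompt_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
        full_ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        logits = model(full_ids).logits             # (1, seq_len, vocab_size)
        logprobs = F.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]                   # token t is predicted from position t-1
        idx = torch.arange(prompt_len - 1, targets.shape[0])  # choice tokens only
        scores.append(logprobs[idx, targets[idx]].sum().item())
    return max(range(len(choices)), key=scores.__getitem__)
```
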
## Usage

Install the library:

```bash
git clone https://github.com/TRI-ML/vla_foundry.git
cd vla_foundry
pip install -e .
```

Then load the pretrained checkpoint:

```python
from vla_foundry.models.base_model import BaseModel

model = BaseModel.from_pretrained("TRI-ML/Foundry-LLM-1.2B-1T")
```

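From here, inference follows the usual causal-LM pattern. The snippet below rests on two assumptions this card does not confirm: that the checkpoint ships tokenizer files loadable with transformers' AutoTokenizer, and that BaseModel exposes a Hugging Face-style generate(); check the vla_foundry repo for the actual inference API.

```python
from transformers import AutoTokenizer

# Assumption: the SmolVLM2 tokenizer files are bundled with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("TRI-ML/Foundry-LLM-1.2B-1T")

inputs = tokenizer("The capital of France is", return_tensors="pt")
# Assumption: BaseModel follows the Hugging Face GenerationMixin convention.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
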
## Links

- **Project page:** [tri-ml.github.io/vla_foundry](https://tri-ml.github.io/vla_foundry/)
- **Paper:** [VLA Foundry (arXiv 2604.19728)](https://arxiv.org/abs/2604.19728)
- **Code:** [github.com/TRI-ML/vla_foundry](https://github.com/TRI-ML/vla_foundry)
- **Collection:** [VLA Foundry collection](https://huggingface.co/collections/TRI-ML/vla-foundry)