	---
	license: apache-2.0
	library_name: vla-foundry
	tags:
	- foundry
	- vla_foundry
	- llm
	- text-generation
	---

	# Foundry-LLM-1.2B-800B

	A 1.2B-parameter language model pretrained on 800B tokens and released as part of the [VLA Foundry](https://github.com/TRI-ML/vla_foundry) model collection.

	## Model Description

	- Architecture: Transformer (24 layers, 2048 hidden dim, 16 heads, SwiGLU FFN, RoPE, QK-norm)
	- Parameters: 1.2B (non-embedding)
	- Tokenizer: SmolVLM2 (vocab size 49,280)
	- Training data: 800B tokens from DCLM-Baseline-1.0
	- LR schedule: Warmup + constant (no decay)
	- Sequence length: 2048
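The figures above can be cross-checked with a back-of-the-envelope calculation. The sketch below makes three assumptions that this card does not state. It uses a SwiGLU intermediate size of 5632, which is 8/3 of the hidden dim rounded up to a multiple of 256. It assumes no bias terms. It also ignores the small norm weights.

```python
# Back-of-the-envelope arithmetic for the architecture listed above.
# Assumptions (not stated in the card): SwiGLU intermediate size 5632,
# no bias terms, norm weights ignored as negligible.

D_MODEL = 2048
N_LAYERS = 24
N_HEADS = 16
VOCAB = 49_280
FFN_DIM = 5632          # assumed: 8/3 * D_MODEL rounded up to a multiple of 256
SEQ_LEN = 2048
TRAIN_TOKENS = 800e9


def per_layer_params(d: int, ffn: int) -> int:
    attn = 4 * d * d        # Q, K, V and output projections
    swiglu = 3 * d * ffn    # gate, up and down projections of the SwiGLU FFN
    return attn + swiglu


non_embedding = N_LAYERS * per_layer_params(D_MODEL, FFN_DIM)
embedding = VOCAB * D_MODEL                  # a single input embedding table
head_dim = D_MODEL // N_HEADS
tokens_per_param = TRAIN_TOKENS / non_embedding
sequences = TRAIN_TOKENS / SEQ_LEN           # number of 2048-token training sequences

print(f"head dim:              {head_dim}")
print(f"non-embedding params:  {non_embedding / 1e9:.2f}B")
print(f"embedding params:      {embedding / 1e6:.1f}M")
print(f"tokens per parameter:  {tokens_per_param:.0f}")
print(f"training sequences:    {sequences / 1e6:.1f}M")
```

Under these assumptions each layer holds 4·d² + 3·d·d_ff weights. Across 24 layers that gives about 1.23B non-embedding parameters, which is consistent with the 1.2B listed above. It also works out to roughly 650 training tokens per non-embedding parameter.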

	This is an earlier checkpoint of the Foundry LLM. It serves as the language backbone for the downstream VLM and VLA models.
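The "warmup + constant" schedule listed under Model Description can be sketched as follows. The card does not specify the warmup shape, the peak learning rate, or the warmup length. The linear warmup and the values used here are illustrative assumptions.

```python
def warmup_constant_lr(step: int, peak_lr: float, warmup_steps: int) -> float:
    """Linear warmup to peak_lr, then hold it constant with no decay.

    The linear warmup shape, peak_lr and warmup_steps are illustrative
    assumptions; the card only states "warmup + constant (no decay)".
    """
    if warmup_steps <= 0:
        return peak_lr  # no warmup phase: start directly at the peak rate
    # Ramp linearly from 0 to peak_lr, then clamp at peak_lr forever.
    return peak_lr * min(1.0, step / warmup_steps)


# Hypothetical values, chosen only to show the shape of the schedule.
for step in (0, 500, 1_000, 50_000):
    print(step, warmup_constant_lr(step, peak_lr=3e-3, warmup_steps=1_000))
```

The rate never decays, so the final learning rate equals the peak rate. Cosine and linear-decay schedules behave differently: they anneal toward a small value by the end of training.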

	## Evaluation Results

	Accuracy (%) on multiple-choice reasoning and knowledge benchmarks:

	| HellaSwag | MMLU | ARC-e | ARC-c | PIQA | WinoGrande | OpenBookQA | BoolQ |
	|---|---|---|---|---|---|---|---|
	| 64.3 | 26.0 | 70.3 | 37.0 | 75.8 | 60.9 | 40.0 | 63.2 |
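These benchmarks have different numbers of answer options, so the raw accuracies are easier to compare after subtracting the score that uniform random guessing would get. The sketch below is not an analysis reported by the authors. It assumes four options for HellaSwag, MMLU, ARC, and OpenBookQA (ARC is mostly four-option) and two options for PIQA, WinoGrande, and BoolQ. For BoolQ, the labels are imbalanced, so always answering "yes" already beats 50%, and the 50% line understates the real bar.

```python
# Accuracy (%) copied from the table above.
SCORES = {
    "HellaSwag": 64.3, "MMLU": 26.0, "ARC-e": 70.3, "ARC-c": 37.0,
    "PIQA": 75.8, "WinoGrande": 60.9, "OpenBookQA": 40.0, "BoolQ": 63.2,
}
# Assumed number of answer options per question; uniform random guessing
# then scores 100 / n_options percent.
N_OPTIONS = {
    "HellaSwag": 4, "MMLU": 4, "ARC-e": 4, "ARC-c": 4,
    "PIQA": 2, "WinoGrande": 2, "OpenBookQA": 4, "BoolQ": 2,
}


def margin_over_chance(scores: dict, n_options: dict) -> dict:
    """Accuracy minus the random-guess accuracy, in percentage points."""
    return {name: round(acc - 100.0 / n_options[name], 1) for name, acc in scores.items()}


margins = margin_over_chance(SCORES, N_OPTIONS)
mean_acc = sum(SCORES.values()) / len(SCORES)  # unweighted, not reported in the card

for name, margin in margins.items():
    print(f"{name:<11} {SCORES[name]:5.1f}  (+{margin:.1f} over chance)")
print(f"unweighted mean accuracy: {mean_acc:.1f}")
```

Viewed this way, MMLU sits only about one point above four-way chance, while HellaSwag, ARC-e, and PIQA sit well above it.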

	## Usage

	```bash
	git clone https://github.com/TRI-ML/vla_foundry.git
	cd vla_foundry
	pip install -e .
	```

	```python
	from vla_foundry.models.base_model import BaseModel

	# Load this checkpoint by its Hugging Face Hub repository id.
	model = BaseModel.from_pretrained("TRI-ML/Foundry-LLM-1.2B-800B")
	```
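This card does not document how to run text generation with `BaseModel`. The sketch below shows generic greedy decoding that respects the 2048-token training sequence length. It is written against a hypothetical `next_token_logits` callable, which stands in for the model's forward pass. Adapting it requires wrapping the loaded model in such a callable, and the exact call depends on vla_foundry's interface.

```python
from typing import Callable, List, Optional, Sequence

MAX_CONTEXT = 2048  # sequence length listed in this model card


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy decoding sketch.

    `next_token_logits` is a hypothetical stand-in for the model: it takes
    a list of token ids and returns one logit per vocabulary entry for the
    next position.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Feed only the most recent MAX_CONTEXT tokens so the input never
        # exceeds the 2048-token sequence length used in training.
        window = ids[-MAX_CONTEXT:]
        logits = next_token_logits(window)
        # Greedy choice: index of the largest logit (first one on ties).
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break  # stop once the end-of-sequence token is produced
    return ids
```

Truncating from the left keeps the most recent tokens. This is the simplest policy for prompts longer than the context window, but it discards the start of the prompt.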

	## Links

	- Project page: [tri-ml.github.io/vla_foundry](https://tri-ml.github.io/vla_foundry/)
	- Paper: [VLA Foundry (arXiv 2604.19728)](https://arxiv.org/abs/2604.19728)
	- Code: [github.com/TRI-ML/vla_foundry](https://github.com/TRI-ML/vla_foundry)
	- Collection: [VLA Foundry collection](https://huggingface.co/collections/TRI-ML/vla-foundry)