---
license: mit
pipeline_tag: text-generation
library_name: transformers
track_downloads: true
---

# SuperApriel-15b-Base

<img src="assets/super-apriel.png" width="120" alt="thumbnail"/>      `/ˈɑː.pri.əl/`

A 15B-parameter **token-mixer supernet** derived from [Apriel-1.6](https://huggingface.co/ServiceNow-AI/Apriel-1.6-15b-Thinker) via stochastic distillation. Every decoder layer exposes **four trained mixer options**—Full Attention, Sliding Window Attention, Gated DeltaNet, and Kimi Delta Attention—enabling flexible architecture selection from a single checkpoint.

- **Model Size:** 15B parameters
- **Layers:** 48 decoder layers, each with 4 mixer variants
- **Context Length:** 262K positions (runtime dependent)
- **Languages:** English (strongest performance)

## Highlights

- **Supernet architecture**: Single checkpoint containing 4 mixer types at every layer, yielding 4⁴⁸ ≈ 7.9 × 10²⁸ possible architectures
- **Four mixer types**: Full Attention (FA), Sliding Window Attention (SWA, window=4096), Gated DeltaNet (GDN), Kimi Delta Attention (KDA)
- **Stage 1 distillation checkpoint**: Trained via stochastic distillation from frozen Apriel-1.6 teacher on 266B tokens
- **Foundation for fine-tuning**: Use this checkpoint to fine-tune on your own data with targeted placement strategies

## Model Overview

SuperApriel-15b-Base is the **Stage 1 (distillation) checkpoint** of the Super Apriel supernet. During training, all four mixer types at each layer were trained simultaneously using stochastic local sampling—each layer's mixer was drawn uniformly from the four types at each training step. Only mixer weights were trained; all shared parameters (FFNs, embeddings, layer norms, vision encoder) remain frozen from the Apriel-1.6 teacher.

This checkpoint is intended as a **foundation for downstream fine-tuning**. For a ready-to-use model with optimized deployment presets, see [SuperApriel-15b-Instruct](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct).
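The stochastic local sampling described above can be sketched as follows. This is an illustration of the sampling scheme only, not the actual training code; the mixer labels and the `sample_placement` helper are hypothetical names.

```python
import random

# Illustrative mixer labels (not the exact strings used in the config files).
MIXERS = ["fa", "swa", "gdn", "kda"]
NUM_LAYERS = 48

def sample_placement(rng: random.Random) -> list[str]:
    """Uniform local sampling: at each training step, every layer
    independently draws one of the four mixer types."""
    return [rng.choice(MIXERS) for _ in range(NUM_LAYERS)]

rng = random.Random(0)
placement = sample_placement(rng)
print(len(placement))  # one mixer choice per decoder layer
```

Because each layer samples independently, every step trains a different one of the 4⁴⁸ possible architectures, while the shared (frozen) parameters see all of them.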

### Architecture Details

| Component | Details |
|-----------|---------|
| Parameters | 15B |
| Decoder layers | 48 |
| Query / KV heads | 32 / 8 (grouped-query attention), d_h = 128 |
| Hidden dimension | 5,120 |
| FFN width | 14,336 (SiLU-gated) |
| Vocabulary | 131,072 tokens |
| Vision encoder | Pixtral (16×16 patches) |

### Mixer Types

| Mixer | Time | Memory | Description |
|-------|------|--------|-------------|
| Full Attention (FA) | O(n²) | O(n) KV cache | Standard grouped-query attention |
| Sliding Window (SWA) | O(w·n) | O(w) | Local window of 4,096 tokens |
| Gated DeltaNet (GDN) | O(n) | O(1) fixed state | Matrix-valued recurrent state with delta rule |
| Kimi Delta Attention (KDA) | O(n) | O(1) fixed state | Linear attention with channel-wise gating |
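As a back-of-envelope illustration of the memory column, the per-layer KV-cache footprint of the two attention-type mixers can be estimated from the GQA shape above (8 KV heads, head dim 128, fp16). The helper below is illustrative only; it does not model how any serving stack actually allocates cache memory, and the recurrent mixers are simplified to zero growth since their state is fixed-size.

```python
def kv_cache_bytes(seq_len: int, mixer: str,
                   kv_heads: int = 8, head_dim: int = 128,
                   window: int = 4096, dtype_bytes: int = 2) -> int:
    """Rough per-layer KV-cache size in bytes for one sequence."""
    if mixer == "fa":
        cached = seq_len                  # O(n): cache every position
    elif mixer == "swa":
        cached = min(seq_len, window)     # O(w): capped at the window
    else:                                 # gdn / kda: fixed-size state
        return 0
    return 2 * kv_heads * head_dim * cached * dtype_bytes  # K and V

# At the 262K context limit, a full-attention layer caches 64x more than SWA:
print(kv_cache_bytes(262_144, "fa") // kv_cache_bytes(262_144, "swa"))
```

This is why placement choice matters at long context: swapping FA layers for SWA or recurrent mixers trades quadratic attention cost and growing cache for bounded memory.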

### Training Details

- **Objective**: Stochastic distillation from frozen Apriel-1.6 teacher
- **Losses**: Activation matching (𝓛_act), Forward KL (weight 0.1), Reverse KL (weight 0.9)
- **Data**: 266B tokens, curated mixture focused on reasoning and domain-specific data
- **Sampling**: Uniform local sampling (each layer independently samples a mixer type)
- **Compute**: Up to 192 H100 GPUs
- **Training framework**: [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)
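The loss combination above can be sketched on toy discrete distributions. This is purely illustrative: real training applies these terms to logits over the 131K-token vocabulary, and the exact form of 𝓛_act is not specified here, so `act_loss` is a stand-in scalar.

```python
import math

def kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) for discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(act_loss: float,
                 teacher: list[float], student: list[float],
                 w_fkl: float = 0.1, w_rkl: float = 0.9) -> float:
    """Weighted recipe from the training details: activation matching
    plus forward KL (teacher -> student) and reverse KL (student -> teacher)."""
    fkl = kl(teacher, student)   # forward KL, weight 0.1
    rkl = kl(student, teacher)   # reverse KL, weight 0.9
    return act_loss + w_fkl * fkl + w_rkl * rkl

teacher = [0.7, 0.2, 0.1]
student = [0.6, 0.3, 0.1]
print(distill_loss(0.0, teacher, student))  # small positive KL penalty
```

The heavy weight on reverse KL pushes the student to be mode-seeking, concentrating probability where the teacher does rather than spreading it.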

## How to Use

This checkpoint is intended as a foundation for fine-tuning and research, not for direct inference. For a ready-to-use model with optimized deployment presets and full serving instructions, see [SuperApriel-15b-Instruct](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct).

### Loading for evaluation

If you need to load this checkpoint for evaluation or experimentation, copy a preset config from [SuperApriel-15b-Instruct](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct) to select a specific mixer placement. The Base and Instruct checkpoints share the same architecture and config format — preset configs from Instruct work with this checkpoint.

For example, to load with the all-attention placement:

1. Download a preset `config.json` from `SuperApriel-15b-Instruct/preset_configs/all-attention/`
2. Place it as this model's `config.json`
3. Load with vLLM or Transformers following the [Instruct README instructions](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct#how-to-use)

> **Note:** This model requires `trust_remote_code=True` as it uses custom architecture code for the multi-mixer supernet.

> **Note:** When serving with vLLM, custom placements must include at least one attention-type layer (FA or SWA). Configurations using only recurrent mixers (GDN/KDA) are not currently supported due to a vLLM KV cache coordinator limitation. All shipped Instruct presets satisfy this requirement.
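The vLLM constraint in the note above can be checked with a quick sanity function before serving a custom placement. The mixer labels here are illustrative shorthand, not the exact strings used in the preset `config.json` files.

```python
def vllm_servable(placement: list[str]) -> bool:
    """True if the placement contains at least one attention-type
    layer (FA or SWA), which current vLLM serving requires."""
    return any(mixer in ("fa", "swa") for mixer in placement)

print(vllm_servable(["fa"] * 48))           # all-attention: servable
print(vllm_servable(["gdn", "kda"] * 24))   # recurrent-only: not servable
```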

## Intended Use

SuperApriel-15b-Base is designed as a **foundation checkpoint** for:

- Fine-tuning with custom placement strategies on domain-specific data
- Research on hybrid architectures and mixer placement optimization
- Placement search and Pareto frontier exploration using the optimization toolkit

It is **not intended** for direct deployment without further fine-tuning or for safety-critical applications without human oversight.

## Limitations

- **Factual accuracy:** May produce incorrect, misleading, or outdated content. Outputs should be verified before use in critical contexts.
- **Bias:** May reflect societal, cultural, or systemic biases present in training data.
- **Ethics:** Do not use the model to produce harmful, unlawful, or unethical content.
- **Language:** Strongest performance is in English. Output quality may degrade in underrepresented languages.
- **Critical use:** Not suitable for medical, legal, financial, or other high-risk applications without safeguards.
- **Base model:** This is a distillation checkpoint without instruction tuning. For instruction-following use cases, see [SuperApriel-15b-Instruct](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct).

## Security and Responsible Use

**Security Responsibilities:**
Deployers and users are strongly encouraged to align their security practices with established frameworks and regulatory guidelines such as the EU AI Act and the NIST AI Risk Management Framework (RMF).

**Guidelines for Deployers:**

- Regularly conduct robustness assessments to identify and mitigate adversarial inputs.
- Implement validation and filtering processes to prevent harmful or biased outputs.
- Continuously perform data privacy checks to guard against unintended data leaks.
- Document and communicate the model's limitations, intended usage, and known security risks to all end-users.
- Schedule periodic security reviews and updates to address emerging threats and vulnerabilities.

**Guidelines for Users:**

- Follow established security policies and usage guidelines provided by deployers.
- Protect and manage sensitive information when interacting with the model.
- Report anomalies, suspicious behavior, or unsafe outputs to deployers or developers.
- Maintain human oversight and apply judgment to mitigate potential security or ethical risks during interactions.

**Disclaimer:**
Users accept responsibility for securely deploying, managing, and using this open-source LLM. The model is provided "as-is," without explicit or implied warranty regarding security or fitness for any specific application or environment.

## Software

- **Training stack:** [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)
- **Serving:** [Fast-LLM vLLM plugin](https://github.com/ServiceNow/Fast-LLM/tree/feature/vllm-apriel2-models/apriel2-vllm-plugin)

## License

MIT

## Citation

```bibtex
@misc{super_apriel_2026,
  title        = {Super Apriel: One Checkpoint, Many Speeds},
  author       = {ServiceNow Language Models Lab},
  year         = {2026},
  eprint       = {2604.19877},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL}
}
```