denisko and Claude Opus 4.6 committed
Commit c35ea9c · 1 Parent(s): 5f43fe3

Sync modeling code with Instruct, update README for Base checkpoint


- Sync modeling_apriel2.py from Instruct (adds PatternMixerAdapter for
pattern config weight loading, config_class fix for AutoModelForCausalLM)
- Add AutoModelForCausalLM to config.json auto_map
- Rewrite README: remove TODO placeholders, point to Instruct for
inference/serving, document how to copy preset configs for evaluation
- Fix citation year and serving link

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (3)
  1. README.md +10 -59
  2. config.json +1 -0
  3. modeling_apriel2.py +40 -2
README.md CHANGED
@@ -61,68 +61,19 @@ This checkpoint is intended as a **foundation for downstream fine-tuning**. For

 ## How to Use

-Install dependencies:
-
-```bash
-pip install transformers
-```
-
-> **🔴 TODO: There is currently no mechanism to select a placement when using Transformers directly. The model defaults to the all-attention preset during inference. Placement selection requires vLLM with the Fast-LLM plugin (see below). We need to add a Transformers API for placement switching.**
-
-Basic usage with Transformers (all-attention preset):
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-model_name = "ServiceNow-AI/SuperApriel-15b-Base"
-
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    torch_dtype="auto",
-    device_map="auto",
-    trust_remote_code=True,
-)
-
-prompt = "The capital of France is"
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-
-generated_ids = model.generate(**inputs, max_new_tokens=64)
-output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
-print(output)
-```
-
-> **Note:** This base model requires `trust_remote_code=True` as it uses custom architecture code for the multi-mixer supernet.
-
-## Use with vLLM
-
-> **🔴 TODO: Confirm the exact vLLM plugin source (Fast-LLM branch/tag?), installation steps, and the CLI/API for selecting a placement. The instructions below are placeholders.**
-
-The supernet is served via a vLLM plugin implemented in [Fast-LLM](https://github.com/ServiceNow/Fast-LLM). Two serving modes are available:
-
-- **`single-preset` mode**: Only the weights of a single selected mixer placement are loaded. Inactive mixer weights are offloaded to CPU, so GPU memory footprint and throughput match a dedicated single-placement model.
-- **`supernet` mode**: The full supernet is loaded into memory, enabling placement switching at a per-request level (5–15s switch time depending on how many layers change mixer type).
-
-### Installation
-
-```bash
-uv venv --python 3.12 --seed
-source .venv/bin/activate
-
-git clone git@github.com:ServiceNow/Fast-LLM.git
-cd Fast-LLM
-uv pip install vllm==0.10.2 --torch-backend=auto
-pip install .
-```
-
-### Running a vLLM Server
-
-```bash
-vllm serve \
-    --model ServiceNow-AI/SuperApriel-15b-Base \
-    --port 8000 \
-    --trust-remote-code
-```
+This checkpoint is intended as a foundation for fine-tuning and research, not for direct inference. For a ready-to-use model with optimized deployment presets and full serving instructions, see [SuperApriel-15b-Instruct](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct).
+
+### Loading for evaluation
+
+If you need to load this checkpoint for evaluation or experimentation, copy a preset config from [SuperApriel-15b-Instruct](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct) to select a specific mixer placement. The Base and Instruct checkpoints share the same architecture and config format — preset configs from Instruct work with this checkpoint.
+
+For example, to load with the all-attention placement:
+
+1. Download a preset `config.json` from `SuperApriel-15b-Instruct/preset_configs/all-attention/`
+2. Place it as this model's `config.json`
+3. Load with vLLM or Transformers following the [Instruct README instructions](https://huggingface.co/ServiceNow-AI/SuperApriel-15b-Instruct#how-to-use)
+
+> **Note:** This model requires `trust_remote_code=True` as it uses custom architecture code for the multi-mixer supernet.

 ## Intended Use

@@ -169,7 +120,7 @@ Users accept responsibility for securely deploying, managing, and using this ope
 ## Software

 - **Training stack:** [Fast-LLM](https://github.com/ServiceNow/Fast-LLM)
-- **Serving:** [Fast-LLM vLLM plugin](https://github.com/ServiceNow/Fast-LLM)
+- **Serving:** [Fast-LLM vLLM plugin](https://github.com/ServiceNow/Fast-LLM/tree/oo/feature/vllm-apriel2-model-modeling/apriel2-vllm-plugin)

 ## License

@@ -178,7 +129,7 @@ MIT
 ## Citation

 ```bibtex
-@misc{super_apriel_2025,
+@misc{super_apriel_2026,
   title = {Super Apriel: One Checkpoint, Many Speeds},
   author = {ServiceNow Language Models Lab},
   year = {2026},
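
Step 3 of the new "Loading for evaluation" section defers to the Instruct README for the actual load commands. For reference, the three steps amount to roughly the following Python. This is a sketch only: the preset filename `preset_configs/all-attention/config.json` and the download-then-copy flow are assumptions based on the steps listed above, not commands taken from either README.

```python
# Hypothetical sketch of the evaluation workflow described in the README diff above.
import shutil
from pathlib import Path

from huggingface_hub import hf_hub_download, snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

base_repo = "ServiceNow-AI/SuperApriel-15b-Base"
instruct_repo = "ServiceNow-AI/SuperApriel-15b-Instruct"

# Step 1: fetch the Base checkpoint into a local, writable directory.
local_dir = Path(snapshot_download(base_repo, local_dir="SuperApriel-15b-Base"))

# Step 2: overwrite its config.json with the all-attention preset from Instruct.
# (Assumed path; check the Instruct repo for the actual preset layout.)
preset = hf_hub_download(instruct_repo, "preset_configs/all-attention/config.json")
shutil.copyfile(preset, local_dir / "config.json")

# Step 3: load from the patched local copy. trust_remote_code is required for
# the custom multi-mixer supernet architecture code shipped with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```
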
config.json CHANGED
@@ -5,6 +5,7 @@
   "auto_map": {
     "AutoConfig": "configuration_apriel2.Apriel2Config",
     "AutoModel": "modeling_apriel2.Apriel2Model",
+    "AutoModelForCausalLM": "modeling_apriel2.Apriel2ForCausalLM",
     "AutoModelForImageTextToText": "modeling_apriel2.Apriel2ForConditionalGeneration"
   },
   "bos_token_id": 1,
modeling_apriel2.py CHANGED
@@ -972,6 +972,32 @@ def create_mixer(mixer_config: dict, hidden_size: int, layer_idx: int, config, a
     return mixer_class(hidden_size, mixer_config, layer_idx=layer_idx)


+class Apriel2PatternMixerAdapter(nn.Module):
+    """Adapter that wraps a single mixer under mixers.{name} to match supernet weight paths.
+
+    The supernet checkpoint stores weights as blocks.{i}.mixer.mixers.{type}.{param},
+    but a bare mixer creates blocks.{i}.mixer.{param}. This adapter adds the intermediate
+    mixers.{name} level so pattern configs can load from supernet checkpoints.
+    """
+
+    def __init__(self, mixer_name: str, mixer: nn.Module):
+        super().__init__()
+        self.mixers = nn.ModuleDict({mixer_name: mixer})
+        self._mixer_name = mixer_name
+
+    def forward(self, *args, **kwargs):
+        return self.mixers[self._mixer_name](*args, **kwargs)
+
+    def preprocess(self, *args, **kwargs):
+        return self.mixers[self._mixer_name].preprocess(*args, **kwargs)
+
+    @classmethod
+    def setup(cls, mixer_name: str, mixer_config: dict, hidden_size: int, max_position_embeddings: int) -> nn.ModuleDict:
+        mixer_type = mixer_config.get("type", "attention")
+        mixer_class = get_mixer_class(mixer_type)
+        return mixer_class.setup(mixer_config, hidden_size, max_position_embeddings)
+
+
 class Apriel2Mamba(nn.Module):
     """Mamba mixer."""

@@ -1906,14 +1932,16 @@ class Apriel2BlockSequence(nn.Module):

         blocks = []
         for layer_idx in range(num_blocks):
-            # Get block_config for this layer
+            # Get block_config and block_name for this layer
             if seq_type == "fixed":
                 block_config = self.sequence_config.get("block", {})
+                block_name_for_layer = None  # No adapter needed for fixed type
             elif seq_type == "pattern":
                 pattern = self.sequence_config.get("pattern", [])
                 blocks_config = self.sequence_config.get("blocks", {})
                 block_name = pattern[layer_idx % len(pattern)]
                 block_config = blocks_config[block_name]
+                block_name_for_layer = block_name  # Pass to Apriel2Block for weight path matching
             else:
                 raise ValueError(f"Unknown sequence type: {seq_type}")

@@ -1925,6 +1953,7 @@ class Apriel2BlockSequence(nn.Module):
                     layer_idx=layer_idx,
                     rms_norm_eps=rms_norm_eps,
                     config=self.config,
+                    block_name=block_name_for_layer,
                 )
             )

@@ -2031,6 +2060,7 @@ class Apriel2Block(nn.Module):
         layer_idx: int,
         rms_norm_eps: float,
         config: Apriel2TextConfig,
+        block_name: Optional[str] = None,
     ):
         """
         Args:
@@ -2039,6 +2069,7 @@
             layer_idx: Layer index in the sequence
             rms_norm_eps: Epsilon for RMS normalization
             config: Model config (passed to mixers that need it)
+            block_name: For pattern configs, the mixer name (e.g. "attention") to match supernet weight paths
         """
         super().__init__()
         self.hidden_size = hidden_size
@@ -2046,7 +2077,13 @@

         # Create mixer based on type
         mixer_config = block_config.get("mixer", {"type": "attention"})
-        self.mixer = create_mixer(mixer_config, hidden_size, layer_idx, config, allow_stochastic=True)
+        raw_mixer = create_mixer(mixer_config, hidden_size, layer_idx, config, allow_stochastic=True)
+
+        # For pattern configs, wrap in adapter to match supernet checkpoint weight paths
+        if block_name is not None:
+            self.mixer = Apriel2PatternMixerAdapter(block_name, raw_mixer)
+        else:
+            self.mixer = raw_mixer

         # Create MLP
         mlp_config = block_config.get("mlp", {"type": "mlp"})
@@ -2435,6 +2472,7 @@ class Apriel2TextModel(Apriel2PreTrainedModel):
 class Apriel2ForCausalLM(Apriel2PreTrainedModel, GenerationMixin):
     """Apriel2 model with a language modeling head (text-only)."""

+    config_class = Apriel2Config
     _tied_weights_keys = ["lm_head.weight"]

     def __init__(self, config: Apriel2TextConfig):
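
The `Apriel2PatternMixerAdapter` added above exists purely to make parameter names line up: a pattern-config block would otherwise register its mixer weights one level too shallow to match the supernet checkpoint. A toy, self-contained sketch of that key difference (all names here are invented for illustration, not taken from the repo):

```python
import torch.nn as nn

class ToyMixer(nn.Module):
    """Hypothetical stand-in for an attention or mamba mixer."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

class ToyAdapter(nn.Module):
    """Same trick as Apriel2PatternMixerAdapter: adds a `mixers.{name}` level."""
    def __init__(self, name: str, mixer: nn.Module):
        super().__init__()
        self.mixers = nn.ModuleDict({name: mixer})

class ToyBlock(nn.Module):
    """Mirrors the `block_name is not None` branch added to Apriel2Block."""
    def __init__(self, wrap: bool):
        super().__init__()
        mixer = ToyMixer()
        self.mixer = ToyAdapter("attention", mixer) if wrap else mixer

print(list(ToyBlock(wrap=False).state_dict()))
# ['mixer.proj.weight', 'mixer.proj.bias']  <- bare mixer paths
print(list(ToyBlock(wrap=True).state_dict()))
# ['mixer.mixers.attention.proj.weight', 'mixer.mixers.attention.proj.bias']
# ^ matches the supernet's blocks.{i}.mixer.mixers.{type}.{param} layout
```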