Add note about vLLM pure-recurrent placement limitation
Custom placements using only GDN/KDA mixers (no FA or SWA) are not supported due to a vLLM KV cache coordinator constraint.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md
CHANGED
@@ -75,6 +75,8 @@ For example, to load with the all-attention placement:
 
 > **Note:** This model requires `trust_remote_code=True` as it uses custom architecture code for the multi-mixer supernet.
 
+> **Note:** When serving with vLLM, custom placements must include at least one attention-type layer (FA or SWA). Configurations using only recurrent mixers (GDN/KDA) are not currently supported due to a vLLM KV cache coordinator limitation. All shipped Instruct presets satisfy this requirement.
+
 ## Intended Use
 
 SuperApriel-15b-Base is designed as a **foundation checkpoint** for:
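
A minimal sketch of what the constraint in the new note means when picking a custom placement, assuming a Hugging Face-style loading call; the `mixer_placement` keyword, the mixer labels, and the repo id below are illustrative placeholders, not the model's documented interface:

```python
# Sketch only: the `mixer_placement` kwarg, the mixer labels ("fa", "swa",
# "gdn", "kda"), and the repo id are assumptions for illustration.
from transformers import AutoModelForCausalLM

# Servable with vLLM: includes at least one attention-type layer (FA or SWA).
valid_placement = ["swa", "gdn", "kda", "fa", "gdn", "kda", "swa", "fa"]

# Not servable with vLLM today: recurrent mixers (GDN/KDA) only, no FA/SWA.
recurrent_only_placement = ["gdn", "kda", "gdn", "kda"]

model = AutoModelForCausalLM.from_pretrained(
    "SuperApriel-15b-Base",           # placeholder repo id
    trust_remote_code=True,           # required by the custom multi-mixer code
    mixer_placement=valid_placement,  # hypothetical argument name
)
```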