denisko and Claude Opus 4.6 committed on
Commit 7011c3d · 1 Parent(s): 9ba9a9e

Add note about vLLM pure-recurrent placement limitation


Custom placements using only GDN/KDA mixers (no FA or SWA) are not
supported due to a vLLM KV cache coordinator constraint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -75,6 +75,8 @@ For example, to load with the all-attention placement:
 
 > **Note:** This model requires `trust_remote_code=True` as it uses custom architecture code for the multi-mixer supernet.
 
+> **Note:** When serving with vLLM, custom placements must include at least one attention-type layer (FA or SWA). Configurations using only recurrent mixers (GDN/KDA) are not currently supported due to a vLLM KV cache coordinator limitation. All shipped Instruct presets satisfy this requirement.
+
 ## Intended Use
 
 SuperApriel-15b-Base is designed as a **foundation checkpoint** for:
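For clarity, here is a minimal sketch of the constraint the added note describes, assuming a placement is expressed as a per-layer list of mixer type names. The list format and the `is_vllm_servable` helper are illustrative assumptions, not part of the model's or vLLM's actual API; the mixer names (FA, SWA, GDN, KDA) come from the note itself.

```python
# Illustrative sketch only: the placement-as-list format and this helper
# are assumptions for the example, not a real vLLM or model API.
ATTENTION_TYPES = {"FA", "SWA"}  # full attention, sliding-window attention

def is_vllm_servable(placement: list[str]) -> bool:
    """vLLM's KV cache coordinator needs at least one attention-type layer."""
    return any(mixer in ATTENTION_TYPES for mixer in placement)

# A mixed placement containing one attention layer is servable...
assert is_vllm_servable(["GDN", "SWA", "GDN", "KDA"])
# ...but a pure-recurrent placement (GDN/KDA only) is not.
assert not is_vllm_servable(["GDN", "KDA", "GDN", "KDA"])
```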