zancato committed · Commit 3c549fe · verified · Parent(s): c807d1e

Update README.md

Files changed (1): README.md (+3 −3)

README.md CHANGED
@@ -202,8 +202,8 @@ curl http://localhost:8000/v1/chat/completions \
 > [!TIP]
 > The `--mamba-cache-dtype float32` and `--mamba-ssm-cache-dtype float32` flags are important for accurate long-context generation. See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#recommended-flags-for-hybrid-models) for details on all recommended flags.
 
-### With HuggingFace Transformers
-See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#huggingface-transformers-inference) for details on when we recommend the HuggingFace Transformers implementation as opposed to the highly optimized vLLM one.
+### With Hugging Face Transformers
+See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#huggingface-transformers-inference) for details on when we recommend the Hugging Face Transformers implementation as opposed to the highly optimized vLLM one.
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -225,7 +225,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ### Training-Free Context Extension
 
-This model supports training-free context extension 2-4× its native context via an extension to Hybrid models of [PICASO cache composition](https://arxiv.org/abs/2502.17605). See the [State Composition guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/StateComposition.md) for usage. Note, this is currently supported in HuggingFace Transformers only.
+This model supports training-free context extension 2-4× its native context via an extension to Hybrid models of [PICASO cache composition](https://arxiv.org/abs/2502.17605). See the [State Composition guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/StateComposition.md) for usage. Note, this is currently supported in Hugging Face Transformers only.
 
 
 ## Training data
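For context on the flags mentioned in the `[!TIP]` block: they are passed when serving the model with vLLM's OpenAI-compatible server, which the `curl` line in the hunk header queries. A minimal sketch, where `<model-id>` is a placeholder (the actual model id lives in the README, not in this diff):

```shell
# Serve with the recommended state-cache dtype flags from the TIP above.
# <model-id> is a placeholder; substitute the actual Hugging Face model id.
vllm serve <model-id> \
  --mamba-cache-dtype float32 \
  --mamba-ssm-cache-dtype float32

# Query the OpenAI-compatible endpoint referenced in the hunk header.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "Hello"}]}'
```

This is a serving-config fragment, not a runnable script as-is; it assumes vLLM is installed and the placeholder is replaced.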