zancato committed
Commit c523573 (verified) · Parent: 35d2faa

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -196,8 +196,8 @@ curl http://localhost:8000/v1/chat/completions \
 > [!TIP]
 > The `--mamba-cache-dtype float32` and `--mamba-ssm-cache-dtype float32` flags are important for accurate long-context generation. See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#recommended-flags-for-hybrid-models) for details on all recommended flags.
 
-### With HuggingFace Transformers
-See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#huggingface-transformers-inference) for details on when we recommend the HuggingFace Transformers implementation as opposed to the highly optimized vLLM one.
+### With Hugging Face Transformers
+See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#huggingface-transformers-inference) for details on when we recommend the Hugging Face Transformers implementation as opposed to the highly optimized vLLM one.
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -219,7 +219,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ### Training-Free Context Extension
 
-This model supports training-free context extension 2-4× its native context via an extension to Hybrid models of [PICASO cache composition](https://arxiv.org/abs/2502.17605). See the [State Composition guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/StateComposition.md) for usage. Note, this is currently supported in HuggingFace Transformers only.
+This model supports training-free context extension 2-4× its native context via an extension to Hybrid models of [PICASO cache composition](https://arxiv.org/abs/2502.17605). See the [State Composition guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/StateComposition.md) for usage. Note, this is currently supported in Hugging Face Transformers only.
 
 
 ## Training data