Update README.md

README.md CHANGED

@@ -202,8 +202,8 @@ curl http://localhost:8000/v1/chat/completions \
 > [!TIP]
 > The `--mamba-cache-dtype float32` and `--mamba-ssm-cache-dtype float32` flags are important for accurate long-context generation. See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#recommended-flags-for-hybrid-models) for details on all recommended flags.
 
-### With
-See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#huggingface-transformers-inference) for details on when we recommend the
+### With Hugging Face Transformers
+See the [Inference guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/Inference.md#huggingface-transformers-inference) for details on when we recommend the Hugging Face Transformers implementation as opposed to the highly optimized vLLM one.
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -225,7 +225,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ### Training-Free Context Extension
 
-This model supports training-free context extension 2-4× its native context via an extension to Hybrid models of [PICASO cache composition](https://arxiv.org/abs/2502.17605). See the [State Composition guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/StateComposition.md) for usage. Note, this is currently supported in
+This model supports training-free context extension to 2-4× its native context via an extension of [PICASO cache composition](https://arxiv.org/abs/2502.17605) to hybrid models. See the [State Composition guide](https://github.com/awslabs/hybrid-model-factory/blob/main/docs/StateComposition.md) for usage. Note that this is currently supported in Hugging Face Transformers only.
 
 ## Training data
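The TIP in the diff above names two vLLM flags. A minimal sketch of how they would be passed when launching vLLM's OpenAI-compatible server (the same server queried by the `curl http://localhost:8000/v1/chat/completions` call in the hunk header); the model id here is a placeholder, not a real model name:

```shell
# Serve the model with float32 Mamba caches, as recommended in the TIP
# for accurate long-context generation. "your-org/your-hybrid-model" is
# a hypothetical placeholder id.
vllm serve your-org/your-hybrid-model \
  --mamba-cache-dtype float32 \
  --mamba-ssm-cache-dtype float32
```

Once the server is up, requests go to the `/v1/chat/completions` endpoint shown in the first hunk header.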