joerowell committed · verified
Commit 268ec27 · Parent(s): 26d8c7a

Add vLLM and Transformers usage snippets

Files changed (1): README.md (+47 −2)
README.md CHANGED
@@ -132,11 +132,56 @@ Thanks to support from Ollama and the mlx-lm team...

#### vLLM

- [...]
+ Serve Laguna XS.2 locally with vLLM and query it from any OpenAI-compatible client (see [Controlling reasoning](#controlling-reasoning) for tool calls, streaming, and reasoning extraction):
+
+ ```shell
+ pip install "vllm>=<PENDING_VERSION>"
+
+ vllm serve poolside/Laguna-XS.2 \
+   --max-model-len 131072 \
+   --default-chat-template-kwargs '{"enable_thinking": true}'
+ ```

#### Transformers

- [...]
+ > Requires `transformers >= <PENDING_VERSION>` (Laguna support lands in the upcoming release; until then, install from source).
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "poolside/Laguna-XS.2"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "user", "content": "Write a Python retry wrapper with exponential backoff."},
+ ]
+
+ # Reasoning is on by default; pass enable_thinking=False to skip the <think> block.
+ inputs = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt",
+     enable_thinking=True,
+ ).to(model.device)
+
+ outputs = model.generate(
+     inputs,
+     max_new_tokens=1024,
+     do_sample=True,
+     temperature=0.7,
+     top_k=20,
+ )
+
+ response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```

#### [Other frameworks]
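
---

The vLLM snippet in this commit serves an OpenAI-compatible API (on port 8000 by default), so any OpenAI client can query it. Below is a minimal sketch, not part of the commit, assuming vLLM's default port, a dummy API key, and the `openai` Python package; the per-request `chat_template_kwargs` field is an assumption about how vLLM overrides the server-side `--default-chat-template-kwargs` value:

```python
# Hedged sketch (not part of this commit): query the vLLM server started above
# via its OpenAI-compatible API. Assumes vLLM's default port (8000) and a dummy
# API key; install the client with `pip install openai`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="poolside/Laguna-XS.2",
    messages=[
        {"role": "user", "content": "Write a Python retry wrapper with exponential backoff."}
    ],
    temperature=0.7,
    # Assumption: vLLM forwards chat_template_kwargs to the chat template,
    # overriding --default-chat-template-kwargs for this request only.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(response.choices[0].message.content)
```

The Transformers snippet notes that reasoning is on by default and emitted as a `<think>` block. A small sketch of separating that block from the final answer, assuming the decoded `response` string from the snippet above and `<think>...</think>` delimiters as its comment suggests:

```python
# Hedged sketch: split the decoded response into reasoning and answer,
# assuming the chat template wraps reasoning in <think>...</think> tags.
THINK_CLOSE = "</think>"

if THINK_CLOSE in response:
    reasoning, answer = response.split(THINK_CLOSE, 1)
    reasoning = reasoning.removeprefix("<think>").strip()
else:
    reasoning, answer = "", response  # no reasoning block emitted

print(answer.strip())
```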