varunrandery committed on
Commit a52c866 · verified · 1 Parent(s): e2f4e4e

Update README.md

Files changed (1): README.md (+212 −23)
4
  base_model:
5
  - poolside/Laguna-XS.2-base
6
  extra_gated_description: >-
7
+ To learn more about how we process your personal data, please read our <a
8
+ href="https://poolside.ai/privacy">Privacy Policy</a>.
9
  tags:
10
  - laguna-xs.2
11
  license: apache-2.0
12
+ pipeline_tag: text-generation
13
  ---
14
 
15
+ <p align="center">
16
+ <img alt="poolside-banner" src="">
17
+ </p>
18
+
19
+ <p align="center">
20
+ <a href="https://shimmer.poolside.ai"><strong>Try Laguna XS.2 in Shimmer</strong></a> ·
21
+ <a href="https://platform.poolside.ai"><strong>Get an API key</strong></a> ·
22
+ <a href=""><strong>Release blog post</strong></a>
23
+ </p>
24
+
25
+ <br>
26
+
27
# Laguna XS.2

Laguna XS.2 is a 33B-total-parameter Mixture-of-Experts model with 3B parameters activated per token, designed for agentic coding and long-horizon work on a local machine. It uses Sliding Window Attention with per-head gating in 30 of its 40 layers for fast inference and a small KV cache footprint.

> [!NOTE]
> This is the instruct model, with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).

For more details on how we trained this model, including data automixing and async off-policy agent RL, check out our [release blog post]().
 
## Highlights

- **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio across 40 total layers
- **KV cache in FP8**: All quantization formats use a KV cache quantized to FP8, reducing memory per token
- **Native reasoning support**: Interleaved thinking enabled by default
- **Local-ready**: At 33B total parameters and 3B activated, Laguna XS.2 is compact enough to run on a Mac with 36 GB of RAM. [Available on Ollama](https://ollama.com/library/laguna-xs.2)
- **Apache 2.0 license**: Use and modify freely for commercial and non-commercial purposes

---
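A back-of-the-envelope sketch of why the mixed layout plus FP8 keeps the KV cache small. Only the layer split (10 global / 30 SWA), the 512-token window, and the 1-byte FP8 element size come from this card; the KV head count and head dimension below are illustrative assumptions, not published Laguna XS.2 values:

```python
# Rough KV cache sizing for a mixed SWA/global layout with an FP8 cache.
# n_kv_heads and head_dim are ASSUMED for illustration.

def kv_cache_bytes(seq_len: int,
                   global_layers: int = 10,
                   swa_layers: int = 30,
                   window: int = 512,
                   n_kv_heads: int = 8,       # assumed
                   head_dim: int = 128,       # assumed
                   bytes_per_elem: int = 1):  # FP8
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem  # K and V
    global_part = global_layers * seq_len * per_token
    # SWA layers only ever cache the last `window` tokens.
    swa_part = swa_layers * min(seq_len, window) * per_token
    return global_part + swa_part

# At the full 131,072-token context, the 30 SWA layers stay capped at 512
# cached tokens each, so nearly all KV memory comes from the 10 global layers.
print(kv_cache_bytes(131_072) / 2**30)  # ≈ 2.5 GiB under these assumptions
```

Without the window cap (40 global layers), the same assumptions would need roughly 4x the KV memory at full context.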
 
## Model overview

- Training: pre-training, post-training, and reinforcement learning stages (instruct)
- Number of parameters: 33B total, 3B activated per token
- Optimizer: Muon
- Layers: 40 (10 with global attention, 30 with sliding window attention)
- Experts: 256, plus 1 shared expert
- Sliding window: 512 tokens
- Modality: text-to-text
- Context window: 131,072 tokens
- Reasoning support: thinking enabled by default; interleaved thinking with preserved thinking supported
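The overview lists 256 routed experts plus one shared expert, and the highlights mention sigmoid gating. As a toy sketch of what sigmoid-gated top-k routing with a shared expert looks like in general (the top-k value, scores, and expert functions here are illustrative assumptions, not Laguna XS.2's actual configuration):

```python
import math

# Toy sigmoid-gated MoE routing sketch. With sigmoid gating, each selected
# expert's weight is its own sigmoid score, rather than a softmax over experts.

def route(scores: list[float], top_k: int = 8) -> list[tuple[int, float]]:
    """Return (expert_index, gate_weight) for the top_k experts by sigmoid score."""
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    ranked = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
    return [(i, gates[i]) for i in ranked[:top_k]]

def moe_output(token, scores, experts, shared_expert, top_k: int = 8):
    # Routed experts contribute weighted by their gates; the shared expert
    # processes every token unconditionally.
    routed = sum(w * experts[i](token) for i, w in route(scores, top_k))
    return routed + shared_expert(token)
```

In a real model the scores come from a learned router over the token's hidden state, and the experts are feed-forward blocks; the sketch only shows the selection and mixing arithmetic.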
## Benchmark results

[Placeholder for chart SVG]

We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see the Usage section below to download and run it locally), across all benchmarks. For other models, we use the best publicly reported score where available; otherwise, we compute baselines using OpenHands (SWE-bench family) or Terminus 2 (Terminal-Bench 2.0) with the settings below.

| Model              | Size (total params.) | SWE-bench Pro | SWE-bench Verified | SWE-bench Multilingual | Terminal-Bench 2.0 |
|--------------------|----------------------|---------------|--------------------|------------------------|--------------------|
| **Laguna XS.2**    | 33B                  | xx.x%         | xx.x%              | xx.x%                  | xx.x%              |
| Nemotron 3 Nano    | 30B                  | xx.x%         | xx.x%              | xx.x%                  | xx.x%              |
| Devstral Small 2   | 24B dense            | -             | 68.0%              | 55.7%                  | 22.5%              |
| Gemma 4 26B A4B IT | 26B                  | xx.x%         | xx.x%              | xx.x%                  | xx.x%              |
| Gemma 4 31B IT     | 31B dense            | xx.x%         | xx.x%              | xx.x%                  | xx.x%              |
| Qwen3.6-35B-A3B    | 35B                  | 49.5%         | 73.4%              | 67.2%                  | 51.5%              |
| Qwen3.6-27B        | 27B dense            | 53.2%         | 77.2%              | 71.3%                  | 59.3%              |
| GPT-5.4 Nano       | -                    | 52.4%         | -                  | -                      | 46.3%              |

\* SWE-bench series: [our configuration; any fixes applied, etc., avg. of k] Nemotron 3 Nano and Gemma 4 models evaluated in OpenHands with [configuration]. Terminal-Bench 2.0: [our configuration; any fixes applied, etc.] Nemotron 3 Nano and Gemma 4 models evaluated in Terminus 2 with [configuration].
 
## Usage

Laguna XS.2 has launch-day support in vLLM and Transformers, as well as in TRT-LLM and SGLang thanks to the team at NVIDIA.

The fastest way to get started is with our API, either directly or through OpenRouter.

> [!NOTE]
> For a limited time, we are providing free API access to Laguna XS.2 and to our larger 225B model, Laguna M.1. You can create an API key on our [Platform](https://platform.poolside.ai).
### pool

**pool** is a lightweight terminal-based coding agent and a dual [Agent Client Protocol](https://agentclientprotocol.com/get-started) client and server.

Download and install it for macOS and Linux:

```shell
curl -fsSL https://downloads.poolside.ai/pool/install.sh | sh
```

Launch it and *Log in with Poolside* to get a free API key:

```shell
pool
```

[Placeholder for screenshot]

Use pool in any [ACP client](https://agentclientprotocol.com/get-started/clients). Configure Zed and JetBrains automatically:

```shell
pool acp setup --editor zed|jetbrains
```

Use pool with Ollama (0.20.8 or later) with one-command setup:

```shell
ollama pull laguna-xs.2
ollama launch pool --model laguna-xs.2
```

### Feedback and issues

Submit feedback with `/feedback` and read the [full documentation on GitHub](https://github.com/poolsideai/pool).

*By downloading and using pool, you agree to the Poolside [End User License Agreement (EULA)](https://poolside.ai/legal/eula).*
 
### Local deployment

Thanks to support from Ollama and the mlx-lm team...

[...]

#### Transformers

[...]

#### [Other frameworks]
141
## Controlling reasoning

Laguna XS.2 has native reasoning support and is designed to work best with *preserved thinking*, where `reasoning` content from prior assistant messages is kept in the message history. The model will generally reason before calling tools and between tool calls.

<details>
<summary>Expand for example</summary>

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.poolside.ai/v1",
    api_key="...",
)

model = "poolside/laguna-xs.2"

tools = [{"type": "function", "function": {
    "name": "shell",
    "description": "Execute a bash command and return the output.",
    "parameters": {"type": "object", "properties": {"cmd": {"type": "string"}}, "required": ["cmd"]},
}}]

messages = [
    {"role": "system", "content": "You are a coding agent with access to a shell tool."},
    {"role": "user", "content": "Run uname -a"},
]

# Thinking is enabled by default when the server sets
# --default-chat-template-kwargs {"enable_thinking": True}.
# When using the Poolside API (https://inference.poolside.ai/v1), this flag is set by default.
response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    stream=True,
)

# Accumulate reasoning, content, and tool-call fragments across streamed chunks.
reasoning, content, tool_calls = "", "", []
for chunk in response:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning") and delta.reasoning:
        reasoning += delta.reasoning
    if hasattr(delta, "content") and delta.content:
        content += delta.content
    if hasattr(delta, "tool_calls") and delta.tool_calls:
        for tc in delta.tool_calls:
            if tc.index >= len(tool_calls):
                tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}})
            if tc.function.name:
                tool_calls[tc.index]["function"]["name"] = tc.function.name
            if tc.function.arguments:
                tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments

print(f"Reasoning: {reasoning}\nContent: {content}\nTool calls: {tool_calls}\n")

# Return reasoning in the next request for best performance
messages.append({
    "role": "assistant",
    "content": content,
    "reasoning": reasoning,
    "tool_calls": [{"id": tc["id"], "type": "function", "function": tc["function"]} for tc in tool_calls]
})

messages.append({
    "role": "tool",
    "tool_call_id": tool_calls[0]["id"],
    "content": json.dumps({"stdout": "Darwin arm64", "exit_code": "0"})
})

response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    stream=True,
)

reasoning, content = "", ""
for chunk in response:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning") and delta.reasoning:
        reasoning += delta.reasoning
    if hasattr(delta, "content") and delta.content:
        content += delta.content

print(f"Reasoning: {reasoning}\nContent: {content}")
```

</details>
### Disabling reasoning

You can disable thinking by setting `enable_thinking` to `False` in a request, or by not passing `--default-chat-template-kwargs {"enable_thinking": True}` (or the equivalent) when starting the server.

<details>
<summary>Expand for example</summary>

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.poolside.ai/v1",
    api_key="...",
)

completion = client.chat.completions.create(
    model="poolside/laguna-xs.2",
    messages=[
        {"role": "user", "content": "Write a retry wrapper with exponential backoff."}
    ],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
    },
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta)
```

</details>

For agentic coding use cases, we recommend enabling thinking and preserving reasoning in the message history, as outlined in the Controlling reasoning section above.
## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.