Update README.md

README.md (CHANGED)

@@ -1,230 +1,76 @@

---
license: apache-2.0
library_name: vllm
inference: false
extra_gated_description: >-
  To learn more about how we process your personal data, please read
  href="https://poolside.ai/privacy">Privacy Policy</a>.
tags:
- laguna-xs.2
---

# Laguna XS.2

Laguna XS.2 is

This is the

For more details

## Key features

- **
- **
- **
- **Local-ready**: At 33B total parameters and 3B activated, Laguna XS.2 is compact enough to run on a Mac with 36 GB of RAM. [Available on Ollama](https://ollama.com/library/laguna-xs.2)
- **Apache 2.0 license**: Use and modify freely for commercial and non-commercial purposes
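As a rough sanity check on the 36 GB figure, here is back-of-envelope arithmetic. The 4-bit quantization assumed below is illustrative; this card does not state which quantization Ollama ships.

```python
# Back-of-envelope memory estimate for the weights alone.
# Assumption (not from this card): a 4-bit quantized checkpoint, i.e. 0.5 bytes/param.
total_params = 33e9
bytes_per_param = 0.5

weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:.1f} GB for weights")  # ~16.5 GB, leaving headroom for KV cache and the OS
```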

## Model overview

- Training: pre-training, post-training and reinforcement learning stages (instruct)
- Number of parameters: 33B total with 3B activated
-
-
- Experts: 256 experts with 1 shared expert
- Sliding window: 512 tokens
- Modality: text-to-text
- Context window: 131,072 tokens
- Reasoning support: thinking default enabled; interleaved thinking with preserved thinking supported
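The expert layout above (many routed experts plus one shared expert, with only a fraction of the parameters active per token) can be sketched as a toy routing function. This is illustrative only, not the model's actual gating code: the sigmoid scoring, the top-k choice, and the shared-expert indexing are assumptions for exposition.

```python
import math

def route(scores, k):
    """Toy MoE routing: pick the top-k routed experts by sigmoid gate,
    with expert 0 acting as the always-active shared expert."""
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    routed = sorted(range(1, len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    chosen = [0] + routed  # shared expert always participates
    total = sum(gates[i] for i in chosen)
    return {i: gates[i] / total for i in chosen}  # normalized mixture weights

weights = route([0.4, 2.0, -1.5, 0.1, 1.2, -0.3], k=2)
print(weights)  # shared expert 0 plus the two highest-scoring routed experts
```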

## Benchmark results

We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see the Usage section below to download and run locally), across all benchmarks. For other models, we use the best publicly reported score where available; otherwise, we compute baselines using OpenHands (SWE-bench family) or Terminus 2 (Terminal-Bench 2.0) with the settings below.

| Model | Size (total params.) | SWE-bench Pro | SWE-bench Verified | SWE-bench Multilingual | Terminal-Bench 2.0 |
|---------------------------|----------------------|---------------|--------------------|------------------------|--------------------|
| **Laguna XS.2** | 33B | xx.x% | xx.x% | xx.x% | xx.x% |
| Nemotron 3 Nano | 30B | xx.x% | xx.x% | xx.x% | xx.x% |
| Devstral Small 2 | 24B dense | - | 68.0% | 55.7% | 22.5% |
| Gemma 4 26B A4B IT | 26B | xx.x% | xx.x% | xx.x% | xx.x% |
| Gemma 4 31B IT | 31B dense | xx.x% | xx.x% | xx.x% | xx.x% |
| Qwen3.6-35B-A3B | 35B | 49.5% | 73.4% | 67.2% | 51.5% |
| Qwen3.6-27B | 27B dense | 53.2% | 77.2% | 71.3% | 59.3% |
| GPT-5.4 Nano | - | 52.4% | - | - | 46.3% |

## Usage

The fastest way to get started is with our API, directly or via OpenRouter, free for a limited time.

## pool

**pool** is a lightweight terminal-based coding agent and a dual [Agent Client Protocol](https://agentclientprotocol.com/get-started) client-server.

Download and install for macOS and Linux:

```shell
curl -fsSL https://downloads.poolside.ai/pool/install.sh | sh
```

Launch and *Log in with Poolside* to get a free API key.

```shell
pool
```

[Placeholder for screenshot]

Use in any [ACP client](https://agentclientprotocol.com/get-started/clients). Configure Zed and JetBrains automatically:

```shell
pool acp setup --editor zed|jetbrains
```

Use pool with Ollama:

```shell
ollama pull laguna-xs.2
ollama launch pool --model laguna-xs.2
```

Laguna XS.2 is supported on vLLM, Transformers v5, TRT-LLM, Ollama, and mlx-lm. We would like to thank the teams at NVIDIA, Ollama, and Newya Labs.

[Device frameworks: Ollama, mlx-lm, ...]

#### vLLM
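The vLLM instructions were not captured here. As a placeholder sketch only, a server launch might look like the following, reusing the `--default-chat-template-kwargs` flag referenced elsewhere on this card; the exact flags for this model are assumptions.

```shell
vllm serve poolside/laguna-xs.2 \
  --default-chat-template-kwargs '{"enable_thinking": true}'
```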

##

[...]

##

```python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.poolside.ai/v1",
    api_key="...",
)

model = "poolside/laguna-xs.2"

tools = [{"type": "function", "function": {
    "name": "shell",
    "description": "Execute a bash command and return the output.",
    "parameters": {"type": "object", "properties": {"cmd": {"type": "string"}}, "required": ["cmd"]},
}}]

messages = [
    {"role": "system", "content": "You are a coding agent with access to a shell tool."},
    {"role": "user", "content": "Run uname -a"},
]

# Thinking is enabled by default when the server sets --default-chat-template-kwargs {"enable_thinking": True}
# When using the Poolside API (https://inference.poolside.ai/v1), this flag is set by default
response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    stream=True,
)

reasoning, content, tool_calls = "", "", []
for chunk in response:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning") and delta.reasoning:
        reasoning += delta.reasoning
    if hasattr(delta, "content") and delta.content:
        content += delta.content
    if hasattr(delta, "tool_calls") and delta.tool_calls:
        for tc in delta.tool_calls:
            if tc.index >= len(tool_calls):
                tool_calls.append({"id": tc.id, "function": {"name": "", "arguments": ""}})
            if tc.function.name:
                tool_calls[tc.index]["function"]["name"] = tc.function.name
            if tc.function.arguments:
                tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments

print(f"Reasoning: {reasoning}\nContent: {content}\nTool calls: {tool_calls}\n")

# Return reasoning in the next request for best performance
messages.append({
    "role": "assistant",
    "content": content,
    "reasoning": reasoning,
    "tool_calls": [{"id": tc["id"], "type": "function", "function": tc["function"]} for tc in tool_calls]
})

messages.append({
    "role": "tool",
    "tool_call_id": tool_calls[0]["id"],
    "content": json.dumps({"stdout": "Darwin arm64", "exit_code": "0"})
})

response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    stream=True,
)

reasoning, content = "", ""
for chunk in response:
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning") and delta.reasoning:
        reasoning += delta.reasoning
    if hasattr(delta, "content") and delta.content:
        content += delta.content

print(f"Reasoning: {reasoning}\nContent: {content}")
```

### Disabling reasoning

You can disable thinking by setting `enable_thinking` to `False` in a request, or by not providing `--default-chat-template-kwargs {"enable_thinking": true}` (or equivalent) when starting the server.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.poolside.ai/v1",
    api_key="...",
)

completion = client.chat.completions.create(
    model="poolside/laguna-xs.2",
    messages=[
        {"role": "user", "content": "Write a retry wrapper with exponential backoff."}
    ],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False},
    },
    stream=True,
)

for chunk in completion:
    print(chunk.choices[0].delta)
```

For agentic coding use cases, we recommend enabling thinking and preserving reasoning in message history, as outlined in the [Controlling reasoning] section.

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

[

Updated version:

---
library_name: vllm
inference: false
base_model:
- poolside/Laguna-XS.2-base
extra_gated_description: >-
  To learn more about how we process your personal data, please read
  our <a href="https://poolside.ai/privacy">Privacy Policy</a>.
tags:
- laguna-xs.2
license: apache-2.0
---

# Laguna XS.2

Laguna XS.2 is an agentic coding model built for software engineering and tool-calling use cases.

This is the base model. For post-trained variants, please see the other models in the collection.

For more details, check out our [release blog post]().

## Key features

- **Mixture of Experts architecture with sigmoid gating**: Laguna XS.2 uses a sigmoid scoring function for expert routing, with per-layer rotary scales enabling mixed SWA (Sliding Window Attention) and global attention layers.
- **Reasoning control**: [Enable or disable thinking per-request].
- **Apache-2.0 license**: Use and modify freely for commercial and non-commercial purposes.
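The mixed attention pattern above can be illustrated with a small sketch of which key positions a query token may attend to. This is not the model's actual masking code; the 512-token window is taken from the overview on this card, and everything else is an assumption for exposition.

```python
def visible_positions(q, window=512, is_global=False):
    """Key positions a query at index q may attend to under causal attention:
    global layers see the full prefix, SWA layers only the last `window` tokens."""
    if is_global:
        return range(0, q + 1)
    return range(max(0, q - window + 1), q + 1)

print(len(visible_positions(10_000, is_global=True)))   # global layer: full 10001-token prefix
print(len(visible_positions(10_000, is_global=False)))  # SWA layer: last 512 tokens only
```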

## Model overview

- Training: pre-training, post-training and reinforcement learning stages (instruct)
- Number of parameters: 33B total with 3B activated
- Layers:
- Experts:
- Context window: 131,072 tokens
- Reasoning support: thinking default enabled; interleaved thinking with preserved thinking supported

## Benchmark results

[...]

## Usage

[...]

### pool

[Install instructions...]

### Local deployment

[vLLM, Transformers v5, TRT-LLM, SGLang, ...]

Thanks to support from Ollama and the mlx-lm team...

[Device frameworks: Ollama, mlx-lm, ...]

#### vLLM

[...]

## Controlling reasoning

[...]

## Tool calling

[...]

## Sampling parameters

[...]
## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

[Some wording on acceptable use guidance; Mistral uses "You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights."]