Update README.md
<p align="center">
<a href="https://shimmer.poolside.ai"><strong>Try Laguna XS.2 in Shimmer</strong></a> ·
<a href="https://platform.poolside.ai"><strong>Get an API key</strong></a> ·
<a href="https://poolside.ai/blog/laguna-a-deeper-dive"><strong>Release blog post</strong></a>
</p>

<br>

> [!NOTE]
> This is the instruct model with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).

For more details on how we trained this model, including data automixing and async off-policy agent RL, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive).

## Highlights

- **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
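To illustrate the attention layout, here is a small sketch. Only the 3:1 SWA-to-global ratio over 40 layers comes from the highlight above; the placement of the global layers (every fourth layer below) is an assumption for illustration.

```python
# Hypothetical sketch of a 3:1 SWA-to-global attention layout over 40 layers.
# The every-4th-layer placement of global attention is an assumption; the
# model card only states the 3:1 ratio across 40 total layers.
def attention_layout(num_layers: int = 40, global_every: int = 4) -> list[str]:
    return [
        "global" if (i + 1) % global_every == 0 else "swa"
        for i in range(num_layers)
    ]

layout = attention_layout()
print(layout.count("swa"), layout.count("global"))  # 30 SWA layers, 10 global layers
```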

We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see the Usage section below to download and run locally), across all benchmarks. For other models, we use the best available publicly-reported score; if one is not available, we calculate baselines using OpenHands (SWE-bench family) or Terminus 2 (Terminal-Bench 2.0) with the settings below.

| Model            | Size (total params.) | SWE-bench Pro (Public Dataset) | SWE-bench Verified | SWE-bench Multilingual | Terminal-Bench 2.0 |
|------------------|----------------------|--------------------------------|--------------------|------------------------|--------------------|
| **Laguna XS.2**  | 33B                  | 44.5%                          | 68.2%              | 62.4%                  | 30.1%              |
| Devstral Small 2 | 24B dense            | -                              | 68.0%              | 55.7%                  | 22.5%              |
| Gemma 4 31B IT   | 31B dense            | 35.7%                          | 52.0%              | 51.7%                  | 42.9%              |
| Qwen3.5-35B-A3B  | 35B                  | 44.6%                          | 69.2%              | 60.3%                  | 40.5%              |
| GPT-5.4 Nano     | -                    | 52.4%                          | -                  | -                      | 46.3%              |
| Qwen3.6-27B      | 27B dense            | 53.2%                          | 77.2%              | 71.3%                  | 59.3%              |

*We used the highest publicly-reported scores for all comparison models on each benchmark. In all cases these were official scores published in release blog posts or equivalent, with the exception of Gemma 4 31B IT, whose highest published scores were [reported by the Qwen team](https://qwen.ai/blog?id=qwen3.6-35b-a3b).*

<details>
<summary>Expand for benchmarking methodology</summary>

All benchmarking for Laguna XS.2 was run with the Laude Institute's Harbor Framework and our [agent harness](https://github.com/poolsideai/pool), with a maximum of 500 steps and sandboxed execution on 8 GB RAM/2 CPUs (with the exception of Terminal-Bench 2.0; see below). The same sampling parameters were used for all benchmarks: temperature=0.7 and top_k=20. Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details on these updates and other findings will follow in a future technical blog post.

- SWE-bench Pro: mean pass@1 averaged over 3 runs.
- SWE-bench Verified: mean pass@1 averaged over 4 runs.
- SWE-bench Multilingual: mean pass@1 averaged over 7 runs.
- Terminal-Bench 2.0: mean pass@1 averaged over 5 runs. 48 GB RAM/32 CPUs.

</details>

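The scores above are mean pass@1: the fraction of tasks solved in a run, averaged over the independent runs listed in the methodology. A small sketch of that aggregation (the run outcomes below are made-up numbers, not benchmark data):

```python
# Mean pass@1: per-run solve rate, averaged over N independent runs.
# The outcomes below are illustrative only.
def pass_at_1(run: list[bool]) -> float:
    return sum(run) / len(run)

def mean_pass_at_1(runs: list[list[bool]]) -> float:
    return sum(pass_at_1(r) for r in runs) / len(runs)

runs = [
    [True, True, False, True],   # run 1: 3/4 tasks solved
    [True, False, False, True],  # run 2: 2/4 tasks solved
    [True, True, True, True],    # run 3: 4/4 tasks solved
]
print(f"{mean_pass_at_1(runs):.1%}")  # 75.0%
```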
## Usage

The fastest way to get started is with our API, directly or using OpenRouter.

> [!NOTE]
> We are providing free access for a limited time to Laguna XS.2, and our larger 225B model, Laguna M.1, on our API. You can create an API key on our [Platform](https://platform.poolside.ai).
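As a minimal sketch of an OpenAI-style chat request against the API: the model id below is a placeholder assumption, not something stated in this card, so check the Platform for the real values.

```python
# Build a minimal OpenAI-style chat payload. The "laguna-xs-2" model id is a
# placeholder assumption; see platform.poolside.ai for the actual id.
def build_chat_request(prompt: str, enable_thinking: bool = True) -> dict:
    return {
        "model": "laguna-xs-2",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        # Toggles reasoning per request (see "Controlling reasoning"):
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

payload = build_chat_request("Write a hello world in Go.")

# To send it with the official openai client against an OpenAI-compatible
# endpoint (base URL and key come from the Platform):
#
#   from openai import OpenAI
#   client = OpenAI(base_url="...", api_key="...")
#   completion = client.chat.completions.create(
#       model=payload["model"],
#       messages=payload["messages"],
#       extra_body={"chat_template_kwargs": payload["chat_template_kwargs"]},
#   )
```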
### pool

**pool** is a lightweight terminal-based coding agent and a dual [Agent Client Protocol](https://agentclientprotocol.com/get-started) client-server.

Download and install for macOS and Linux:

```shell
curl -fsSL https://downloads.poolside.ai/pool/install.sh | bash
```

Launch and *Log in with Poolside* to get a free API key.

(requires Ollama 0.20.8 or later)

#### Feedback and issues

Submit feedback with `/feedback` and read the [full documentation on GitHub](https://github.com/poolsideai/pool).

*By downloading and using pool, you agree to the Poolside [End User License Agreement (EULA)](https://poolside.ai/legal/eula).*
### Local deployment

[vLLM, Transformers v5, TRT-LLM, SGLang, ...]

Thanks to support from Ollama and the mlx-lm team...

[Device frameworks: Ollama, mlx-lm, ...]

#### vLLM

[...]

#### Transformers

[...]

#### [Other frameworks]

## Controlling reasoning

```python
# (tail of the client.chat.completions.create(...) call)
    ],
    extra_body={
        "chat_template_kwargs": { "enable_thinking": False },
    },
    stream=True
)
```