varunrandery committed
Commit aca5518 · verified · 1 parent: 2747fba

Update README.md

Files changed (1)
  1. README.md +31 -21
README.md CHANGED
@@ -19,7 +19,7 @@ pipeline_tag: text-generation
19
  <p align="center">
20
  <a href="https://shimmer.poolside.ai"><strong>Try Laguna XS.2 in Shimmer</strong></a> ·
21
  <a href="https://platform.poolside.ai"><strong>Get an API key</strong></a> ·
22
- <a href=""><strong>Release blog post</strong></a>
23
  </p>
24
 
25
  <br>
@@ -30,7 +30,7 @@ Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated
30
  > [!NOTE]
31
  > This is the instruct model with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).
32
 
33
- For more details on how we trained this model, including on data automixing and async off-policy agent RL, check out our [release blog post]().
34
 
35
  ## Highlights
36
  - **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
@@ -59,18 +59,28 @@ For more details on how we trained this model, including on data automixing and
59
 
60
  We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see the Usage section below to download and run it locally), across all benchmarks. For other models, we use the best available publicly reported score; where none is available, we compute baselines with OpenHands (SWE-bench family) or Terminus 2 (Terminal-Bench 2.0) under the settings below.
61
 
62
- | Model | Size (total params.) | SWE-bench Pro | SWE-bench Verified | SWE-bench Multilingual | Terminal-Bench 2.0 |
63
- |---------------------------|----------------------|---------------|--------------------|------------------------|--------------------|
64
- | **Laguna XS.2** | 33B | xx.x% | xx.x% | xx.x% | xx.x% |
65
- | Nemotron 3 Nano | 30B | xx.x% | xx.x% | xx.x% | xx.x% |
66
- | Devstral Small 2 | 24B dense | - | 68.0% | 55.7% | 22.5% |
67
- | Gemma 4 26B A4B IT | 26B | xx.x% | xx.x% | xx.x% | xx.x% |
68
- | Gemma 4 31B IT | 31B dense | xx.x% | xx.x% | xx.x% | xx.x% |
69
- | Qwen3.6-35B-A3B | 35B | 49.5% | 73.4% | 67.2% | 51.5% |
70
- | Qwen3.6-27B | 27B dense | 53.2% | 77.2% | 71.3% | 59.3% |
71
- | GPT-5.4 Nano | - | 52.4% | - | - | 46.3% |
72
 
73
- \* SWE-bench series: [our configuration; any fixes applied, etc., avg. of k] Nemotron 3 Nano and Gemma 4 models evaluated in OpenHands with [configuration]. Terminal-Bench 2.0: [our configuration; any fixes applied, etc.] Nemotron 3 Nano and Gemma 4 models evaluated in Terminus 2 with [configuration].
74
 
75
  ## Usage
76
 
@@ -81,14 +91,14 @@ The fastest way to get started is with our API, directly or using OpenRouter.
81
  > [!NOTE]
82
  > For a limited time, we are providing free API access to Laguna XS.2 and to our larger 225B model, Laguna M.1. You can create an API key on our [Platform](https://platform.poolside.ai).
83
 
84
- ## pool
85
 
86
  **pool** is a lightweight terminal-based coding agent that also acts as both an [Agent Client Protocol](https://agentclientprotocol.com/get-started) client and server.
87
 
88
  Download and install for macOS and Linux:
89
 
90
  ```shell
91
- curl -fsSL https://downloads.poolside.ai/pool/install.sh | sh
92
  ```
93
 
94
  Launch and *Log in with Poolside* to get a free API key.
@@ -114,13 +124,13 @@ ollama launch pool --model laguna.xs-2
114
 
115
  (requires Ollama 0.20.8 or later)
116
 
117
- ### Feedback and issues
118
 
119
  Submit feedback with `/feedback` and read the [full documentation on GitHub](https://github.com/poolsideai/pool).
120
 
121
  *By downloading and using pool, you agree to the Poolside [End User License Agreement (EULA)](https://poolside.ai/legal/eula).*
122
 
123
- ## Local deployment
124
 
125
  [vLLM, Transformers v5, TRT-LLM, SGLang, ...]
126
 
@@ -128,15 +138,15 @@ Thanks to support from Ollama and the mlx-lm team...
128
 
129
  [Device frameworks: Ollama, mlx-lm, ...]
130
 
131
- ### vLLM
132
 
133
  [...]
134
 
135
- ### Transformers
136
 
137
  [...]
138
 
139
- ### [Other frameworks]
140
 
141
  ## Controlling reasoning
142
 
@@ -246,7 +256,7 @@ completion = client.chat.completions.create(
246
  ],
247
  extra_body={
248
  "chat_template_kwargs": { "enable_thinking": False },
249
- }
250
  stream=True
251
  )
252
 
 
19
  <p align="center">
20
  <a href="https://shimmer.poolside.ai"><strong>Try Laguna XS.2 in Shimmer</strong></a> ·
21
  <a href="https://platform.poolside.ai"><strong>Get an API key</strong></a> ·
22
+ <a href="https://poolside.ai/blog/laguna-a-deeper-dive"><strong>Release blog post</strong></a>
23
  </p>
24
 
25
  <br>
 
30
  > [!NOTE]
31
  > This is the instruct model with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).
32
 
33
+ For more details on how we trained this model, including on data automixing and async off-policy agent RL, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive).
34
 
35
  ## Highlights
36
  - **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
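The 3:1 mix can be pictured with a short sketch. Note this is an illustration under an assumption, not the model's actual configuration: we assume global layers are evenly spaced (every fourth layer); only the released config files are authoritative.

```python
# Illustrative sketch of a 3:1 SWA-to-global attention layout over 40
# layers, ASSUMING every fourth layer is global (placement not confirmed).
NUM_LAYERS = 40

def layer_kinds(num_layers: int = NUM_LAYERS) -> list[str]:
    # Three sliding-window-attention layers, then one global layer, repeating.
    return ["global" if (i + 1) % 4 == 0 else "swa" for i in range(num_layers)]

kinds = layer_kinds()
assert kinds.count("swa") == 30 and kinds.count("global") == 10  # 3:1 ratio
```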
 
59
 
60
  We evaluate Laguna XS.2 with thinking enabled in our agent harness, pool (see the Usage section below to download and run it locally), across all benchmarks. For other models, we use the best available publicly reported score; where none is available, we compute baselines with OpenHands (SWE-bench family) or Terminus 2 (Terminal-Bench 2.0) under the settings below.
61
 
62
+ | Model | Size (total params.) | SWE-bench Pro (Public Dataset) | SWE-bench Verified | SWE-bench Multilingual | Terminal-Bench 2.0 |
63
+ |---------------------------|----------------------|--------------------------------|--------------------|------------------------|--------------------|
64
+ | **Laguna XS.2** | 33B | 44.5% | 68.2% | 62.4% | 30.1% |
65
+ | Devstral Small 2 | 24B dense | - | 68.0% | 55.7% | 22.5% |
66
+ | Gemma 4 31B IT | 31B dense | 35.7% | 52.0% | 51.7% | 42.9% |
67
+ | Qwen3.5-35B-A3B | 35B | 44.6% | 69.2% | 60.3% | 40.5% |
68
+ | GPT-5.4 Nano | - | 52.4% | - | - | 46.3% |
69
+ | Qwen3.6-27B | 27B dense | 53.2% | 77.2% | 71.3% | 59.3% |
 
 
70
 
71
+ *We used the highest publicly referenced scores for all comparison models on each benchmark. In all cases these were official scores published in release blog posts or equivalent, with the exception of Gemma 4 31B IT, for which the highest published scores were [reported by the Qwen team](https://qwen.ai/blog?id=qwen3.6-35b-a3b).*
72
+
73
+ <details>
74
+ <summary>Expand for benchmarking methodology</summary>
75
+
76
+ All benchmarking for Laguna XS.2 was completed using the Laude Institute’s Harbor Framework with our [agent harness](https://github.com/poolsideai/pool), with a maximum of 500 steps and sandboxed execution on 8 GB RAM / 2 CPUs (Terminal-Bench 2.0 excepted; see below). The same sampling parameters were used for all benchmarking: temperature=0.7 and top_k=20. Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details on these updates and other findings will follow in a future technical blog post.
77
+
78
+ - SWE-bench Pro: mean pass@1 averaged over 3 runs.
79
+ - SWE-bench Verified: mean pass@1 averaged over 4 runs.
80
+ - SWE-bench Multilingual: mean pass@1 averaged over 7 runs.
81
+ - Terminal-Bench 2.0: mean pass@1 averaged over 5 runs (48 GB RAM / 32 CPUs).
82
+
83
+ </details>
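The per-benchmark aggregation described above (mean pass@1 over k runs) can be sketched as follows; the helper name is ours for illustration and is not part of the Harbor Framework or pool.

```python
# Hypothetical helper: mean pass@1 across runs, where each run is a list
# of per-task booleans (True = task resolved).
def mean_pass_at_1(runs: list[list[bool]]) -> float:
    per_run = [sum(r) / len(r) for r in runs]  # pass@1 per run
    return sum(per_run) / len(per_run)         # simple mean across runs

# Three runs over four tasks: per-run pass@1 of 0.75, 0.5, 1.0 -> mean 0.75
score = mean_pass_at_1([[True, True, False, True],
                        [True, False, False, True],
                        [True, True, True, True]])
```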
84
 
85
  ## Usage
86
 
 
91
  > [!NOTE]
92
  > For a limited time, we are providing free API access to Laguna XS.2 and to our larger 225B model, Laguna M.1. You can create an API key on our [Platform](https://platform.poolside.ai).
93
 
94
+ ### pool
95
 
96
  **pool** is a lightweight terminal-based coding agent that also acts as both an [Agent Client Protocol](https://agentclientprotocol.com/get-started) client and server.
97
 
98
  Download and install for macOS and Linux:
99
 
100
  ```shell
101
+ curl -fsSL https://downloads.poolside.ai/pool/install.sh | bash
102
  ```
103
 
104
  Launch and *Log in with Poolside* to get a free API key.
 
124
 
125
  (requires Ollama 0.20.8 or later)
126
 
127
+ #### Feedback and issues
128
 
129
  Submit feedback with `/feedback` and read the [full documentation on GitHub](https://github.com/poolsideai/pool).
130
 
131
  *By downloading and using pool, you agree to the Poolside [End User License Agreement (EULA)](https://poolside.ai/legal/eula).*
132
 
133
+ ### Local deployment
134
 
135
  [vLLM, Transformers v5, TRT-LLM, SGLang, ...]
136
 
 
138
 
139
  [Device frameworks: Ollama, mlx-lm, ...]
140
 
141
+ #### vLLM
142
 
143
  [...]
144
 
145
+ #### Transformers
146
 
147
  [...]
148
 
149
+ #### [Other frameworks]
150
 
151
  ## Controlling reasoning
152
 
 
256
  ],
257
  extra_body={
258
  "chat_template_kwargs": { "enable_thinking": False },
259
+ },
260
  stream=True
261
  )
262
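For completeness, a minimal self-contained version of the call this diff patches might look like the sketch below. The base URL and model identifier are assumptions for illustration, not documented values; `extra_body` is the OpenAI Python client's pass-through for server-specific fields such as `chat_template_kwargs`.

```python
def build_request(enable_thinking: bool) -> dict:
    # chat_template_kwargs is forwarded to the server-side chat template;
    # enable_thinking=False switches interleaved reasoning off.
    # The model name here is an assumption for illustration.
    return {
        "model": "laguna-xs-2",
        "messages": [{"role": "user", "content": "Write hello world in C."}],
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
        "stream": True,
    }

def stream_completion() -> None:
    # Requires the `openai` package and an API key in the environment;
    # the base URL is an assumption, not a documented endpoint.
    from openai import OpenAI
    client = OpenAI(base_url="https://api.poolside.ai/v1")
    for chunk in client.chat.completions.create(**build_request(False)):
        print(chunk.choices[0].delta.content or "", end="")
```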