Update README.md
README.md
CHANGED
@@ -1,11 +1,9 @@
 ---
 library_name: vllm
 inference: false
-base_model:
-- poolside/Laguna-XS.2-base
 extra_gated_description: >-
   To learn more about how we process your personal data, please read our <a
-  href="https://poolside.ai/privacy">Privacy Policy</a>.
+  href="https://poolside.ai/legal/privacy">Privacy Policy</a>.
 tags:
 - laguna-xs.2
 license: apache-2.0
@@ -28,9 +26,7 @@ pipeline_tag: text-generation
 Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token designed for agentic coding and long-horizon work on a local machine. It uses Sliding Window Attention with per-head gating in 30 out of 40 layers for fast inference and low KV cache requirements.
 
 > [!NOTE]
->
-
-For more details on how we trained this model, including on data automixing and async off-policy agent RL, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive).
+> For more details on how we trained this model, including on data automixing and async off-policy agent RL, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive).
 
 ## Highlights
 - **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
@@ -51,7 +47,7 @@ For more details on how we trained this model, including on data automixing and
 - Sliding Window: 512 tokens
 - Modality: text-to-text
 - Context window: 131,072 tokens
-- Reasoning support:
+- Reasoning support: interleaved thinking with preserved thinking
 
 ## Benchmark results
 
@@ -107,8 +103,6 @@ Launch and *Log in with Poolside* to get a free API key.
 pool
 ```
 
-[Placeholder for screenshot]
-
 Use in any [ACP client](https://agentclientprotocol.com/get-started/clients). Configure Zed and JetBrains automatically:
 
 ```shell
@@ -187,8 +181,8 @@ response = client.chat.completions.create(
 reasoning, content, tool_calls = "", "", []
 for chunk in response:
     delta = chunk.choices[0].delta
-    if hasattr(delta, "
-        reasoning += delta.
+    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
+        reasoning += delta.reasoning_content
     if hasattr(delta, "content") and delta.content:
         content += delta.content
     if hasattr(delta, "tool_calls") and delta.tool_calls:
@@ -206,7 +200,7 @@ print(f"Reasoning: {reasoning}\nContent: {content}\nTool calls: {tool_calls}\n")
 messages.append({
     "role": "assistant",
     "content": content,
-    "
+    "reasoning_content": reasoning,
     "tool_calls": [{"id": tc["id"], "type": "function", "function": tc["function"]} for tc in tool_calls]
 })
 
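The last two hunks read `delta.reasoning_content` while streaming and then send the collected reasoning back in the assistant message, which appears to be what the new "preserved thinking" line refers to. The following is a minimal offline sketch of that flow, not the README's actual code. The helpers `accumulate_stream`, `assistant_message`, and `fake_chunk` are hypothetical names, the chunks are hand-built stand-ins for a real streamed response, and the tool-call merging by `index` is an assumption based on the OpenAI-compatible streaming format, since the diff does not show that part of the loop.

```python
from types import SimpleNamespace


def accumulate_stream(chunks):
    """Collect reasoning text, answer text, and tool calls from streamed deltas."""
    reasoning, content, tool_calls = "", "", []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # getattr with a default also covers deltas that omit the field entirely.
        if getattr(delta, "reasoning_content", None):
            reasoning += delta.reasoning_content
        if getattr(delta, "content", None):
            content += delta.content
        for tc in getattr(delta, "tool_calls", None) or []:
            # Assumption: fragments of one tool call share an `index`; the id and
            # name arrive once, and the JSON arguments arrive in pieces.
            while len(tool_calls) <= tc.index:
                tool_calls.append({"id": None, "function": {"name": "", "arguments": ""}})
            slot = tool_calls[tc.index]
            if tc.id:
                slot["id"] = tc.id
            if tc.function.name:
                slot["function"]["name"] = tc.function.name
            if tc.function.arguments:
                slot["function"]["arguments"] += tc.function.arguments
    return reasoning, content, tool_calls


def assistant_message(reasoning, content, tool_calls):
    """Build the assistant turn, keeping the reasoning alongside the content."""
    return {
        "role": "assistant",
        "content": content,
        "reasoning_content": reasoning,
        "tool_calls": [
            {"id": tc["id"], "type": "function", "function": tc["function"]}
            for tc in tool_calls
        ],
    }


def fake_chunk(**fields):
    """Hypothetical stand-in for one streamed chunk with a single choice."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**fields))])


def fake_tool_call(index, call_id, name, arguments):
    """Hypothetical stand-in for one streamed tool-call fragment."""
    return SimpleNamespace(
        index=index,
        id=call_id,
        function=SimpleNamespace(name=name, arguments=arguments),
    )


chunks = [
    fake_chunk(reasoning_content="Check the "),
    fake_chunk(reasoning_content="file first."),
    fake_chunk(content="Reading it now."),
    fake_chunk(tool_calls=[fake_tool_call(0, "call_0", "read_file", '{"path": ')]),
    fake_chunk(tool_calls=[fake_tool_call(0, None, None, '"README.md"}')]),
]

reasoning, content, tool_calls = accumulate_stream(chunks)
message = assistant_message(reasoning, content, tool_calls)
print(message)
```

With a real client, `chunks` would be the `response` iterator from the README's `client.chat.completions.create(...)` call, and `message` would be appended to `messages` before the next request.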