Update README.md
README.md CHANGED
@@ -28,7 +28,7 @@ pipeline_tag: text-generation
 Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token designed for agentic coding and long-horizon work on a local machine. It uses Sliding Window Attention with per-head gating in 30 out of 40 layers for fast inference and low KV cache requirements.
 
 > [!NOTE]
-> This is the
+> This is the final model with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).
 
 For more details on how we trained this model, including on data automixing and async off-policy agent RL, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive).
 
@@ -43,7 +43,7 @@ For more details on how we trained this model, including on data automixing and
 
 ## Model overview
 
-- Training: pre-training, post-training and reinforcement learning stages
+- Training: pre-training, post-training and reinforcement learning stages
 - Number of parameters: 33B total with 3B activated per token
 - Optimizer: Muon
 - Layers: 40 layers (10 layers with global attention, 30 layers with sliding window attention)
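The description above pairs Sliding Window Attention with per-head gating in 30 of the 40 layers. As a rough sketch of what that combination can look like (the window size, the gate placement, and the exact formulation below are assumptions for illustration, not details taken from Laguna XS.2), each head attends only inside a fixed causal window and a learned sigmoid gate scales that head's output:

```python
# Illustrative sketch only: window size, gating placement, and shapes are
# assumptions, not Laguna XS.2 specifics.
import torch
import torch.nn.functional as F


def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal mask where position i may attend to positions [i - window + 1, i]."""
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]   # rel[i, j] = i - j
    return (rel >= 0) & (rel < window)  # True = attention allowed


def gated_sliding_window_attention(q, k, v, gate_logits, window: int):
    """
    q, k, v:     (batch, heads, seq, head_dim)
    gate_logits: (batch, heads, seq, 1) -- one gate per head and position
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    mask = sliding_window_mask(q.shape[-2], window).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum("bhqk,bhkd->bhqd", attn, v)
    # Per-head gating: each head decides how much of its output passes through.
    return out * torch.sigmoid(gate_logits)


if __name__ == "__main__":
    b, h, s, d = 1, 4, 16, 8
    q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
    gates = torch.randn(b, h, s, 1)
    print(gated_sliding_window_attention(q, k, v, gates, window=4).shape)
```

During decoding, a layer like this only ever needs the last `window` keys and values per head, which is where the cache savings quantified in the next sketch come from.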
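The "low KV cache requirements" claim follows from the layer split in the overview: only the 10 global-attention layers have to cache keys and values for the full context, while the 30 sliding-window layers cache at most one window of tokens. A back-of-the-envelope comparison, with the window size, context length, and head geometry filled in as placeholder values (none of them are stated in this card):

```python
# Rough KV-cache comparison for a 10-global / 30-sliding-window stack versus a
# hypothetical all-global 40-layer stack. Only the layer counts come from the
# model card; every other number below is a placeholder assumption.
GLOBAL_LAYERS = 10
SWA_LAYERS = 30

CONTEXT_LEN = 131_072   # assumed context length (tokens)
WINDOW = 4_096          # assumed sliding-window size (tokens)
KV_HEADS = 8            # assumed number of key/value heads
HEAD_DIM = 128          # assumed head dimension
BYTES_PER_VALUE = 2     # bf16


def kv_bytes(layers: int, cached_tokens: int) -> int:
    # Factor of 2 for keys plus values.
    return layers * cached_tokens * KV_HEADS * HEAD_DIM * 2 * BYTES_PER_VALUE


mixed = kv_bytes(GLOBAL_LAYERS, CONTEXT_LEN) + kv_bytes(SWA_LAYERS, min(WINDOW, CONTEXT_LEN))
all_global = kv_bytes(GLOBAL_LAYERS + SWA_LAYERS, CONTEXT_LEN)

print(f"mixed stack:      {mixed / 2**30:.1f} GiB")
print(f"all-global stack: {all_global / 2**30:.1f} GiB")
print(f"reduction:        {1 - mixed / all_global:.0%}")
```

With these placeholder numbers the windowed layers contribute only a small, fixed-size slice of the cache, so total KV memory is dominated by the 10 global layers rather than scaling with context length across all 40 layers.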