varunrandery committed
Commit cf73f63 · verified · 1 Parent(s): b056a21

Update README.md

Files changed (1): README.md +2 -2
README.md CHANGED
@@ -28,7 +28,7 @@ pipeline_tag: text-generation
 Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token designed for agentic coding and long-horizon work on a local machine. It uses Sliding Window Attention with per-head gating in 30 out of 40 layers for fast inference and low KV cache requirements.
 
 > [!NOTE]
-> This is the instruct model with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).
+> This is the final model with native reasoning support and interleaved thinking. For the base model, see [Laguna XS.2-base](https://huggingface.co/poolside/Laguna-XS.2-base).
 
 For more details on how we trained this model, including on data automixing and async off-policy agent RL, check out our [release blog post](https://poolside.ai/blog/laguna-a-deeper-dive).
 
@@ -43,7 +43,7 @@ For more details on how we trained this model, including on data automixing and
 
 ## Model overview
 
-- Training: pre-training, post-training and reinforcement learning stages (instruct)
+- Training: pre-training, post-training and reinforcement learning stages
 - Number of parameters: 33B total with 3B activated per token
 - Optimizer: Muon
 - Layers: 40 layers (10 layers with global attention, 30 layers with sliding window attention)
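
The README's claim that sliding window attention yields "low KV cache requirements" follows from the 10-global / 30-SWA layer split it describes. A back-of-envelope sketch of that saving is below; the layer counts come from the README, while the context length, window size, and the per-entry accounting are illustrative assumptions, not published Laguna XS.2 values.

```python
# Back-of-envelope KV-cache comparison for a hybrid global/SWA stack.
# Layer split (10 global + 30 sliding-window) is from the README;
# ctx and window below are illustrative placeholders.
def kv_cache_entries(context_len: int, n_global: int, n_swa: int, window: int) -> int:
    # Each global-attention layer caches keys/values for the full context;
    # each sliding-window layer only for the last `window` tokens.
    return n_global * context_len + n_swa * min(window, context_len)

ctx, window = 128_000, 4_096  # assumed context and window sizes

full = kv_cache_entries(ctx, n_global=40, n_swa=0, window=window)    # all layers global
hybrid = kv_cache_entries(ctx, n_global=10, n_swa=30, window=window) # split per the README

print(f"(layer, token) cache entries: full={full:,} hybrid={hybrid:,} "
      f"({hybrid / full:.1%} of full)")
```

With these placeholder numbers the hybrid stack caches roughly a quarter of the entries a fully global 40-layer model would, which is the effect the README's "low KV cache requirements" wording points at.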
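
For context on the model card being edited here, a minimal loading sketch with Hugging Face transformers follows. The repo id `poolside/Laguna-XS.2` is an assumption inferred from the linked base model (`poolside/Laguna-XS.2-base`); check the actual model card for the exact id, dtype, and any `trust_remote_code` requirement.

```python
# Minimal sketch, assuming the instruct model lives at "poolside/Laguna-XS.2"
# and exposes a standard chat template; verify against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "poolside/Laguna-XS.2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the checkpoint's native dtype
    device_map="auto",   # place the 33B-total / 3B-active MoE across devices
)

messages = [{"role": "user", "content": "Write a function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```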