Update README.md
README.md CHANGED

@@ -30,8 +30,8 @@ Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated
 
 ## Highlights
 - **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
-- **KV cache in FP8**:
-- **Native reasoning support**: Interleaved thinking
+- **KV cache in FP8**: KV cache quantized to FP8, reducing memory per token
+- **Native reasoning support**: Interleaved thinking between tool calls with support for enabling and disabling thinking per-request
 - **Local-ready**: At 33B total parameters and 3B activated, Laguna XS.2 is compact enough to run on a Mac with 36 GB of RAM. [Available on Ollama](https://ollama.com/library/laguna-xs.2)
 - **Apache 2.0 license**: Use and modify freely for commercial and non-commercial purposes
 
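
For context on the attention-layout bullet above, here is a minimal sketch of how a 3:1 SWA-to-global pattern could be laid out across 40 layers. The repeating pattern (three sliding-window layers followed by one global layer) is an assumption for illustration; the actual per-layer assignment is defined by the released model config.

```python
# Sketch of one possible 3:1 SWA-to-global layout over 40 layers.
# Assumption: the pattern repeats as [SWA, SWA, SWA, global]; the real
# per-layer assignment comes from the released model config, not this script.
NUM_LAYERS = 40        # stated in the README
SWA_PER_GLOBAL = 3     # 3:1 ratio, stated in the README

layout = [
    "global" if (i + 1) % (SWA_PER_GLOBAL + 1) == 0 else "swa"
    for i in range(NUM_LAYERS)
]

assert layout.count("swa") == 30 and layout.count("global") == 10
print(layout[:8])  # ['swa', 'swa', 'swa', 'global', 'swa', 'swa', 'swa', 'global']
```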
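A back-of-the-envelope estimate for the FP8 KV cache bullet: per-token KV memory scales with layers × KV heads × head dimension × bytes per element, so dropping from 16-bit to 8-bit storage roughly halves it. The KV head count and head dimension below are placeholders, not published figures; only the 40-layer count comes from the README.

```python
# Rough per-token KV-cache footprint: 2 tensors (K and V) per layer.
# num_kv_heads and head_dim are illustrative placeholders, NOT published
# figures for this model; only num_layers = 40 comes from the README.
def kv_bytes_per_token(num_layers=40, num_kv_heads=8, head_dim=128, bytes_per_elem=1):
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

fp8 = kv_bytes_per_token(bytes_per_elem=1)    # FP8: 1 byte per element
fp16 = kv_bytes_per_token(bytes_per_elem=2)   # FP16: 2 bytes per element
print(f"FP8:  {fp8 / 1024:.0f} KiB/token")    # ~80 KiB with these placeholder dims
print(f"FP16: {fp16 / 1024:.0f} KiB/token")   # ~160 KiB, i.e. 2x the FP8 figure
```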
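On the per-request thinking toggle: a hedged sketch of how this could be exercised through the Ollama endpoint linked above. The `think` request field and the model tag are assumptions here, to be verified against the model card and the Ollama documentation.

```python
import requests

# Sketch: toggling reasoning per request via a local Ollama server.
# Assumption: the model honors Ollama's "think" request field; verify the
# exact field name and supported values against the model card.
def chat(prompt: str, thinking: bool) -> dict:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "laguna-xs.2",  # tag inferred from the Ollama link above
            "messages": [{"role": "user", "content": prompt}],
            "think": thinking,       # enable/disable interleaved thinking
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()

# Reasoning on for a harder task, off for a quick lookup.
with_thinking = chat("Plan a 3-step migration from REST to gRPC.", thinking=True)
without_thinking = chat("What is the capital of France?", thinking=False)
```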
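Why 36 GB of RAM is plausible for the local-ready bullet: a quick estimate, assuming the local build ships roughly 4-bit weights. That quantization level is an assumption, not a published figure; only the 33B parameter count comes from the README.

```python
# Rough fit check for a 36 GB Mac. The 4.5 bits/weight figure (4-bit
# quantization plus overhead) is an assumption, not a published spec.
total_params = 33e9          # stated in the README
bits_per_weight = 4.5        # assumed quantization level
weights_gb = total_params * bits_per_weight / 8 / 1e9   # ~18.6 GB
print(f"~{weights_gb:.1f} GB of weights, leaving headroom for the KV cache "
      f"and the OS within 36 GB of RAM")
```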