varunrandery committed on
Commit
9f8fd38
·
verified ·
1 Parent(s): afed5b9

Update README.md

README.md CHANGED
@@ -30,8 +30,8 @@ Laguna XS.2 is a 33B total parameter Mixture-of-Experts model with 3B activated
 
 ## Highlights
 - **Mixed SWA and global attention layout**: Laguna XS.2 uses sigmoid gating with per-layer rotary scales, enabling mixed SWA (Sliding Window Attention) and global attention layers in a 3:1 ratio (across 40 total layers)
-- **KV cache in FP8**: All quantization formats use a KV cache quantized to FP8, reducing memory per token
-- **Native reasoning support**: Interleaved thinking enabled by default
+- **KV cache in FP8**: KV cache quantized to FP8, reducing memory per token
+- **Native reasoning support**: Interleaved thinking between tool calls with support for enabling and disabling thinking per-request
 - **Local-ready**: At 33B total parameters and 3B activated, Laguna XS.2 is compact enough to run on a Mac with 36 GB of RAM. [Available on Ollama](https://ollama.com/library/laguna-xs.2)
 - **Apache 2.0 license**: Use and modify freely for commercial and non-commercial purposes
 
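The 3:1 SWA-to-global ratio across 40 layers mentioned in the first highlight can be sketched in Python. The README states only the ratio and layer count; the specific interleaving below (every fourth layer global) is an assumption for illustration, not the model's published layout.

```python
# Sketch of a 3:1 SWA-to-global attention layout over 40 layers.
# Assumption: every 4th layer is global; the README gives only the ratio.
NUM_LAYERS = 40

def layer_attention_types(num_layers: int = NUM_LAYERS) -> list:
    """Return 'swa' or 'global' per layer, three SWA layers per global layer."""
    return ["global" if (i + 1) % 4 == 0 else "swa" for i in range(num_layers)]

layout = layer_attention_types()
print(layout.count("swa"), layout.count("global"))  # 30 10
```

With this pattern, a 40-layer stack yields 30 sliding-window layers and 10 global layers, matching the stated 3:1 ratio.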
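The FP8 KV-cache highlight can be illustrated with back-of-envelope arithmetic: moving the cache from 2-byte (FP16/BF16) to 1-byte (FP8) elements halves KV memory per token. The layer, head, and head-dimension figures below are hypothetical placeholders, not published Laguna XS.2 numbers.

```python
# Rough KV-cache sizing sketch; the 40/4/128 figures are hypothetical,
# not Laguna XS.2's actual KV-head or head-dim configuration.
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    # Each layer caches both K and V: num_kv_heads * head_dim values apiece.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

fp16_cache = kv_bytes_per_token(40, 4, 128, 2)  # 2 bytes/element
fp8_cache = kv_bytes_per_token(40, 4, 128, 1)   # 1 byte/element
print(fp16_cache, fp8_cache)  # 81920 40960
```

Whatever the real attention configuration, the per-token saving is the same factor of two, since only `bytes_per_elem` changes.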