Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 base_model: poolside/Laguna-XS.2
 tags:
 - gguf
--
+- lucebox
 - moe
 - code
 - quantized
@@ -11,7 +11,7 @@ tags:
 
 # Laguna-XS.2 GGUF (BF16 + Q4_K_M)
 
-GGUF conversions of [poolside/Laguna-XS.2](https://huggingface.co/poolside/Laguna-XS.2) for use with [
+GGUF conversions of [poolside/Laguna-XS.2](https://huggingface.co/poolside/Laguna-XS.2) for use with [lucebox-hub](https://github.com/Luce-Org/lucebox-hub).
 
 ## Files
 
@@ -29,7 +29,7 @@ GGUF conversions of [poolside/Laguna-XS.2](https://huggingface.co/poolside/Lagun
 
 Imatrix calibration uses Bartowski `calibration_datav3.txt` (multilingual + code mix), the same corpus Unsloth-distributed quants use.
 
-## Required llama.cpp patch
+## Required llama.cpp/ggml patch inside lucebox-hub
 
 Laguna-XS.2 is a NEW architecture (`LLM_ARCH_LAGUNA`) not present in upstream llama.cpp. Loading these GGUFs requires a llama.cpp build with the LAGUNA arch added. Patches available at: https://github.com/your-org/lucebox-hub (see `dflash/deps/llama.cpp/src/models/laguna.cpp` and related changes).
 
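Conceptually, the imatrix calibration mentioned above records how strongly each channel is exercised by the calibration corpus, and the quantizer uses those statistics to spend precision where it matters. A toy sketch of that accumulation, roughly a per-channel mean of squared activations; the real computation lives in llama.cpp's `llama-imatrix` tool and this function name is illustrative:

```python
# Toy importance-matrix accumulation: per-channel mean of squared
# activations across calibration batches (illustrative only).

def accumulate_imatrix(batches, n_channels):
    """batches: iterable of activation matrices, one row per token."""
    sums = [0.0] * n_channels
    count = 0
    for activations in batches:
        for row in activations:
            for c in range(n_channels):
                sums[c] += row[c] * row[c]   # accumulate squared activation
            count += 1
    return [s / count for s in sums]         # mean over all tokens seen
```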
@@ -43,10 +43,5 @@ Architecture summary:
 - Always-on shared expert (intermediate=512)
 - Dense layer 0, sparse MoE layers 1–39
 
-## Quick test
-
-```bash
-./llama-simple -m laguna-xs2-Q4_K_M.gguf -ngl 99 -n 128 "def fibonacci(n):"
-```
 
 Tested on RTX 3090 24GB and A100 80GB. Inference @ 154 tok/s on A100 SXM Q4_K_M.
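The architecture summary above describes a sparse MoE with an always-on shared expert: per token, a router scores the experts, only the top-k are evaluated and mixed by softmax weight, and the shared expert's output is added unconditionally. A minimal sketch of that routing step; the expert count, k, and callable shapes here are assumptions for illustration, not Laguna-XS.2's actual configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    t = sum(es)
    return [e / t for e in es]

def moe_forward(router_logits, experts, shared_expert, x, k=2):
    """One token through a sparse MoE layer with an always-on shared expert.

    experts: list of callables; shared_expert: callable. Only the top-k
    routed experts are evaluated (the sparsity that makes MoE cheap);
    the shared expert always runs.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])   # renormalize over top-k
    routed = sum(g * experts[i](x) for g, i in zip(gates, top))
    return routed + shared_expert(x)                   # shared path always on
```

A dense layer (like layer 0 here) is the degenerate case where every expert runs; the sparse layers skip all but the top-k, so compute per token stays far below total parameter count.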