---
license: apache-2.0
library_name: mlx
base_model: poolside/Laguna-XS.2
pipeline_tag: text-generation
tags:
- mxfp4
- mlx
- apple-silicon
- laguna
- moe
- agentic-coding
- quantized
---
# OsaurusAI/Laguna-XS.2-mxfp4

Quantized **poolside Laguna-XS.2** for Apple Silicon (MLX): an agentic-coding Mixture-of-Experts model (33B total parameters, 3B active per token).

| | |
|--|--|
| **Source** | [poolside/Laguna-XS.2](https://huggingface.co/poolside/Laguna-XS.2) |
| **Architecture** | `laguna` (40 layers, 256 routed experts top-8 + 1 shared, hybrid SWA + full attention) |
| **Quant format** | MXFP4 (MLX 4-bit affine, `group_size=32`) |
| **Bundle size on disk** | **20.93 GB** (21 safetensors shards) |
| **License** | Apache-2.0 (inherited from upstream) |
| **Modalities** | Text in / text out (no vision, audio, or video) |
## What's quantized

- All routed-expert linears (3D-stacked), attention projections, the dense layer-0 MLP, the shared expert, and embed_tokens → **MLX 4-bit affine** (`bits=4`, `group_size=32`); see the sketch below
- lm_head, all RMSNorms, the router gate, and `e_score_correction_bias` → fp16 passthrough
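
A minimal sketch of this split using MLX's affine quantizer (`mx.quantize`). The key-name filters and the example tensor name are illustrative assumptions, not the actual checkpoint keys:

```python
# Sketch of the selective quantization policy above; names are assumptions.
import mlx.core as mx

GROUP_SIZE, BITS = 32, 4

def keep_fp16(name: str) -> bool:
    # Router gate stays fp16; expert gate_proj is still quantized.
    return (
        "lm_head" in name
        or "norm" in name
        or name.endswith(".gate")          # router gate, not gate_proj
        or "e_score_correction_bias" in name
    )

def quantize_tensor(name: str, w: mx.array):
    """fp16 passthrough for excluded tensors, 4-bit affine for the rest."""
    if keep_fp16(name):
        return w.astype(mx.float16)
    # mx.quantize returns packed weights plus per-group scales and biases.
    return mx.quantize(w, group_size=GROUP_SIZE, bits=BITS)

w = mx.random.normal((1024, 2048))
wq, scales, biases = quantize_tensor("layers.3.mlp.experts.up_proj", w)
w_hat = mx.dequantize(wq, scales, biases, group_size=GROUP_SIZE, bits=BITS)
```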
## Architecture notes (preserved verbatim from upstream)

- 40 layers; per-layer attention head count alternates **48 (full attention) / 64 (SWA)**, with 8 shared KV heads (GQA)
- 1:3 ratio of full attention to sliding-window attention (window = 512), set by an explicit `layer_types` list
- Dual RoPE: full-attention layers use YaRN (base 500K, factor 32, original context 4096, β_fast 64, β_slow 1, partial_rotary 0.5); SWA layers use default RoPE (base 10K, full rotary)
- 256 routed experts (top-8) + 1 shared expert; sigmoid routing with per-head gating (`g_proj`); `q_norm`/`k_norm` in attention (see the sketch below)
- 131k context window
- Layer 0 is a dense MLP; layers 1-39 are sparse MoE
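
For intuition, the gating described above (sigmoid scores over 256 experts, top-8 selection, plus an always-on shared expert) can be sketched as follows. This is a schematic reconstruction, not upstream code: it omits the per-head `g_proj` gating and `e_score_correction_bias`, and whether the selected weights are renormalized is an assumption:

```python
# Schematic top-8 sigmoid routing over 256 experts; illustrative only.
import numpy as np

NUM_EXPERTS, TOP_K = 256, 8

def route(hidden: np.ndarray, gate_w: np.ndarray):
    """hidden: (tokens, dim); gate_w: (NUM_EXPERTS, dim) router weight."""
    scores = 1.0 / (1.0 + np.exp(-(hidden @ gate_w.T)))   # sigmoid, not softmax
    top = np.argpartition(-scores, TOP_K - 1, axis=-1)[:, :TOP_K]
    w = np.take_along_axis(scores, top, axis=-1)
    w = w / w.sum(axis=-1, keepdims=True)                 # renormalize the 8 picks
    return top, w   # the shared expert runs unconditionally, outside routing

hidden = np.random.randn(4, 512).astype(np.float32)
gate_w = np.random.randn(NUM_EXPERTS, 512).astype(np.float32)
experts, weights = route(hidden, gate_w)
print(experts.shape, weights.shape)  # (4, 8) (4, 8)
```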
## Run on Apple Silicon

```bash
pip install mlx safetensors transformers
python -m jang_tools.laguna.runtime \
  --src ~/.mlxstudio/models/OsaurusAI/Laguna-XS.2-mxfp4 \
  --prompt "def fibonacci(n):" --max-new 64
```

The runtime auto-detects `weight_format` (mxtq / mxfp4 / bf16) and loads the matching weight loader (e.g. `jang_tools/laguna/weight_loader_bf16.py` for bf16).
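
For illustration, such detection might read a format tag from the bundle's `config.json` before dispatching. The key name, default, and layout below are assumptions, not the actual `jang_tools` API:

```python
# Hypothetical weight-format dispatch; not the actual jang_tools implementation.
import json
from pathlib import Path

def detect_weight_format(model_dir: str) -> str:
    """Read an assumed `weight_format` key from the bundle's config.json."""
    config = json.loads((Path(model_dir).expanduser() / "config.json").read_text())
    fmt = config.get("weight_format", "bf16")        # assumed default
    if fmt not in ("mxtq", "mxfp4", "bf16"):
        raise ValueError(f"unknown weight_format: {fmt}")
    return fmt

print(detect_weight_format("~/.mlxstudio/models/OsaurusAI/Laguna-XS.2-mxfp4"))
```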

## Build

Reproduce locally from the bf16 source:

```bash
python -m jang_tools.convert_laguna_mxfp4 \
  ~/.mlxstudio/models/_sources/Laguna-XS.2 \
  ~/.mlxstudio/models/OsaurusAI/Laguna-XS.2-mxfp4
```
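
To sanity-check a converted bundle, the shards can be inspected directly with `safetensors`. A quick sketch; the printed key names and dtypes depend on the converter:

```python
# Quick shard inspection; output key names depend on the converter.
from pathlib import Path
from safetensors import safe_open

bundle = Path("~/.mlxstudio/models/OsaurusAI/Laguna-XS.2-mxfp4").expanduser()
for shard in sorted(bundle.glob("*.safetensors")):
    with safe_open(str(shard), framework="np") as f:
        for key in f.keys():
            t = f.get_tensor(key)
            print(f"{shard.name}: {key} {t.dtype} {t.shape}")
```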

## Credits

Quantized by **Jinho Jang** ([eric@osaurus.ai](mailto:eric@osaurus.ai)). MLX-native pipeline; runs on M-series Macs.