# Loggenix MoE 0.4B – Cactus Format
Pre-converted weights for Loggenix/loggenix-moe-0.4b-0.2a-sft-s3.1 in Cactus format for on-device inference.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3MoeForCausalLM |
| Parameters | ~0.4B (active ~0.2B per token) |
| Experts | 16 experts, top-2 routing |
| Hidden Dim | 512 |
| Layers | 12 |
| Attention Heads | 8 (2 KV heads) |
| Expert FFN Dim | 768 |
| Context Length | 262,144 tokens |
| Precision | FP16 |
| Vocab Size | 151,936 |
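With 16 experts and top-2 routing, each token is processed by only 2 expert FFNs, which is why roughly 0.2B of the ~0.4B parameters are active per token. A minimal NumPy sketch of the routing step (illustrative only, not the Cactus implementation; renormalizing a softmax over just the selected experts is an assumption based on common Qwen-MoE-style routers):

```python
import numpy as np

def top2_route(router_logits: np.ndarray, top_k: int = 2):
    """Pick the top-k experts per token and normalize their weights.

    router_logits: (num_tokens, num_experts) scores from the router.
    Returns (indices, weights), each shaped (num_tokens, top_k).
    """
    # Indices of the top-k logits per token, highest first
    idx = np.argsort(router_logits, axis=-1)[:, -top_k:][:, ::-1]
    top = np.take_along_axis(router_logits, idx, axis=-1)
    # Softmax over only the selected experts, so weights sum to 1
    e = np.exp(top - top.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return idx, weights

logits = np.random.randn(4, 16)  # 4 tokens, 16 experts
idx, w = top2_route(logits)      # each token gets 2 experts + mixing weights
```

Each token's output is then the weighted sum of its two selected experts' FFN outputs.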
## Format
These weights are in the Cactus binary format (`.weights` files plus a `config.txt`), converted from the original HuggingFace safetensors using the Cactus Python conversion pipeline.
This is not a standard HuggingFace Transformers model. It is designed to be loaded directly by the Cactus C++ inference engine for mobile and edge deployment (iOS, Android, macOS, Linux ARM64).
## Usage
Download the entire repo and point Cactus to the directory:
```bash
# Clone weights
git clone https://huggingface.co/kshitijthakkar/loggenix-moe-0.4b-cactus

# Use with Cactus engine
cactus run ./loggenix-moe-0.4b-cactus
```
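Because the engine expects the layout described above (a `config.txt` plus one or more `.weights` files), a quick sanity check of the downloaded directory can catch a partial or LFS-less clone. This helper is a hypothetical convenience, not part of Cactus:

```python
from pathlib import Path

def check_cactus_dir(model_dir: str) -> bool:
    """Return True if the directory looks like a complete Cactus model:
    a config.txt and at least one .weights shard."""
    d = Path(model_dir)
    has_config = (d / "config.txt").is_file()
    has_weights = any(d.glob("*.weights"))
    return has_config and has_weights
```

Run it on the cloned directory before `cactus run`; a `False` result usually means `git lfs pull` is still needed.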
## Conversion
Converted using the Cactus Python converter from the original safetensors:
```bash
cactus convert Loggenix/loggenix-moe-0.4b-0.2a-sft-s3.1 ./output --precision FP16
```
## Performance (Pixel 7a, Tensor G2)
- Decode: 36-42 tokens/sec
- TTFT: 77-248ms
- Model init: ~352ms
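These numbers translate directly into end-to-end latency: total time ≈ TTFT + generated tokens ÷ decode rate. A back-of-the-envelope helper, with defaults taken as midpoints of the measured ranges above (an assumption for illustration):

```python
def estimate_latency_ms(num_tokens: int,
                        ttft_ms: float = 160.0,
                        tok_per_s: float = 39.0) -> float:
    """Rough generation latency: time-to-first-token plus decode time.

    Defaults are midpoints of the measured ranges on a Pixel 7a
    (TTFT 77-248 ms, decode 36-42 tok/s).
    """
    return ttft_ms + (num_tokens / tok_per_s) * 1000.0

# A 100-token reply at ~39 tok/s lands around 2.7 seconds end to end.
print(round(estimate_latency_ms(100)))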
## License
See the original model card for license details.