# Loggenix MoE 0.4B – Cactus Format
Pre-converted weights for Loggenix/loggenix-moe-0.4b-0.2a-sft-s3.1 in Cactus format for on-device inference.
## Model Details
| Property | Value |
|---|---|
| Architecture | Qwen3MoeForCausalLM |
| Parameters | ~0.4B (active ~0.2B per token) |
| Experts | 16 experts, top-2 routing |
| Hidden Dim | 512 |
| Layers | 12 |
| Attention Heads | 8 (2 KV heads) |
| Expert FFN Dim | 768 |
| Context Length | 262,144 tokens |
| Precision | FP16 |
| Vocab Size | 151,936 |
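With 16 experts and top-2 routing, each token is processed by only 2 expert FFNs, which is why roughly 0.2B of the ~0.4B parameters are active per token. A minimal NumPy sketch of the routing step (illustrative only, not the Cactus implementation; renormalizing a softmax over just the selected experts is an assumption based on common Qwen-MoE-style routers):

```python
import numpy as np

def top2_route(router_logits: np.ndarray, top_k: int = 2):
    """Pick the top-k experts per token and normalize their weights.

    router_logits: (num_tokens, num_experts) scores from the router.
    Returns (indices, weights), each shaped (num_tokens, top_k).
    """
    # Indices of the top-k logits per token, highest first
    idx = np.argsort(router_logits, axis=-1)[:, -top_k:][:, ::-1]
    top = np.take_along_axis(router_logits, idx, axis=-1)
    # Softmax over only the selected experts, so weights sum to 1
    e = np.exp(top - top.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return idx, weights

logits = np.random.randn(4, 16)  # 4 tokens, 16 experts
idx, w = top2_route(logits)      # each token gets 2 experts + mixing weights
```

Each token's output is then the weighted sum of its two selected experts' FFN outputs.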
## Format
These weights are in the Cactus binary format (`.weights` files plus a `config.txt`), converted from the original HuggingFace safetensors using the Cactus Python conversion pipeline.
This is not a standard HuggingFace Transformers model. It is designed to be loaded directly by the Cactus C++ inference engine for mobile and edge deployment (iOS, Android, macOS, Linux ARM64).
## Usage
Download the entire repo and point Cactus to the directory:
```bash
# Clone weights
git clone https://huggingface.co/kshitijthakkar/loggenix-moe-0.4b-cactus

# Use with Cactus engine
cactus run ./loggenix-moe-0.4b-cactus
```
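Because the engine expects the layout described above (a `config.txt` plus one or more `.weights` files), a quick sanity check of the downloaded directory can catch a partial or LFS-less clone. This helper is a hypothetical convenience, not part of Cactus:

```python
from pathlib import Path

def check_cactus_dir(model_dir: str) -> bool:
    """Return True if the directory looks like a complete Cactus model:
    a config.txt and at least one .weights shard."""
    d = Path(model_dir)
    has_config = (d / "config.txt").is_file()
    has_weights = any(d.glob("*.weights"))
    return has_config and has_weights
```

Run it on the cloned directory before `cactus run`; a `False` result usually means `git lfs pull` is still needed.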
## Conversion
Converted using the Cactus Python converter from the original safetensors:
```bash
cactus convert Loggenix/loggenix-moe-0.4b-0.2a-sft-s3.1 ./output --precision FP16
```
## Performance (Pixel 7a, Tensor G2)
- Decode: 36-42 tokens/sec
- TTFT: 77-248ms
- Model init: ~352ms
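These numbers translate directly into end-to-end latency: total time ≈ TTFT + generated tokens ÷ decode rate. A back-of-the-envelope helper, with defaults taken as midpoints of the measured ranges above (an assumption for illustration):

```python
def estimate_latency_ms(num_tokens: int,
                        ttft_ms: float = 160.0,
                        tok_per_s: float = 39.0) -> float:
    """Rough generation latency: time-to-first-token plus decode time.

    Defaults are midpoints of the measured ranges on a Pixel 7a
    (TTFT 77-248 ms, decode 36-42 tok/s).
    """
    return ttft_ms + (num_tokens / tok_per_s) * 1000.0

# A 100-token reply at ~39 tok/s lands around 2.7 seconds end to end.
print(round(estimate_latency_ms(100)))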
## License
See the original model card for license details.