Loggenix MoE 0.4B (Cactus Format)

Pre-converted weights for Loggenix/loggenix-moe-0.4b-0.2a-sft-s3.1 in Cactus format for on-device inference.

Model Details

  • Architecture: Qwen3MoeForCausalLM
  • Parameters: ~0.4B total (~0.2B active per token)
  • Experts: 16 experts, top-2 routing
  • Hidden dim: 512
  • Layers: 12
  • Attention heads: 8 (2 KV heads)
  • Expert FFN dim: 768
  • Context length: 262,144 tokens
  • Precision: FP16
  • Vocab size: 151,936
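The headline parameter counts can be sanity-checked from the dimensions above. A rough sketch, assuming a standard Qwen3-MoE layout (gate/up/down expert projections, an untied LM head, head_dim = hidden / heads); the exact architecture details are assumptions, so treat the result as an estimate:

```python
# Rough parameter-count estimate from the table above.
# Assumed layout: GQA attention, 3 projections per expert, untied LM head.
vocab, hidden, layers = 151_936, 512, 12
heads, kv_heads, head_dim = 8, 2, 512 // 8          # head_dim = 64
experts, top_k, expert_ffn = 16, 2, 768

embed = vocab * hidden                               # token embeddings
attn = hidden * hidden * 2 + hidden * kv_heads * head_dim * 2  # q,o + k,v
expert = 3 * hidden * expert_ffn                     # gate, up, down
router = hidden * experts

total = embed * 2 + layers * (attn + experts * expert + router)
active = embed * 2 + layers * (attn + top_k * expert + router)

print(f"total  ~ {total / 1e9:.2f}B")   # ~0.39B
print(f"active ~ {active / 1e9:.2f}B")  # ~0.19B
```

Both figures land close to the advertised ~0.4B total / ~0.2B active, which is consistent with top-2 routing over 16 experts.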

Format

These weights are in Cactus binary format (.weights files + config.txt), converted from HuggingFace safetensors using the Cactus Python conversion pipeline.

This is not a standard HuggingFace Transformers model. It is designed to be loaded directly by the Cactus C++ inference engine for mobile and edge deployment (iOS, Android, macOS, Linux ARM64).
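Before handing a downloaded directory to the engine, it can help to verify it matches the layout described above (.weights files plus a config.txt). A minimal sketch; the exact file names inside a Cactus export are an assumption here:

```python
import pathlib

def looks_like_cactus_dir(path: str) -> bool:
    """Loose check for the layout described above: at least one
    .weights file plus a config.txt in the directory. File naming
    beyond that is an assumption, not part of the documented format."""
    p = pathlib.Path(path)
    has_weights = any(p.glob("*.weights"))
    has_config = (p / "config.txt").is_file()
    return has_weights and has_config
```

A check like this catches the common failure mode of a partial clone (e.g. Git LFS pointers not fetched) before the engine reports a cryptic load error.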

Usage

Download the entire repo and point Cactus to the directory:

# Clone weights
git clone https://huggingface.co/kshitijthakkar/loggenix-moe-0.4b-cactus

# Use with Cactus engine
cactus run ./loggenix-moe-0.4b-cactus

Conversion

Converted using the Cactus Python converter from the original safetensors:

cactus convert Loggenix/loggenix-moe-0.4b-0.2a-sft-s3.1 ./output --precision FP16

Performance (Pixel 7a, Tensor G2)

  • Decode: 36-42 tokens/sec
  • TTFT (time to first token): 77-248 ms
  • Model init: ~352ms
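The figures above combine into a rough end-to-end latency estimate: latency ≈ TTFT + tokens / decode rate. A small sketch using the measured ranges (the 100-token reply length is an illustrative choice, not a benchmark from this card):

```python
# End-to-end latency estimate from the Pixel 7a numbers above:
# latency = time to first token + decode time for the remaining tokens.
def latency_s(n_tokens: int, ttft_ms: float, tok_per_s: float) -> float:
    return ttft_ms / 1000 + n_tokens / tok_per_s

best = latency_s(100, ttft_ms=77, tok_per_s=42)    # fastest observed range
worst = latency_s(100, ttft_ms=248, tok_per_s=36)  # slowest observed range
print(f"100 tokens: {best:.1f}-{worst:.1f} s")     # 100 tokens: 2.5-3.0 s
```

So a 100-token reply should take roughly 2.5-3 seconds on this hardware, with decode speed dominating over TTFT at that length.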

License

See the original model card for license details.
