Nekochan-Molo32-Qwen3-1.7B
A MoLo-enhanced variant of Qwen3-1.7B.
Overview
Nekochan-Molo32-Qwen3-1.7B is a custom model built on top of Qwen3-1.7B-Base, augmented with a MoLo (Mixture-of-LoRA-Experts) architecture of my own design. This model blends Qwen's strong base capabilities with a lightweight expert-routing system, allowing it to adopt different stylistic "modes" depending on the input.
This release focuses on:
- Lightweight MoE-style behavior using LoRA-based experts
- Fast inference
- Smooth stylistic adaptation driven by a learned router
- Small footprint while still offering multi-expert flavor
⚠️ Note:
The full architecture implementation (MoLoModel.py and the MoLo layers) has not been published yet. It is planned for release soon and will later be packaged into a dedicated library for easy import.
Model Features
- Architecture: Qwen3-1.7B with MoLo extension
- Experts: 32 LoRA-based experts (router-controlled)
- Type: Causal Language Model
- Context Length: 32,768
- Training: Custom data with style-based expert specialization
This model is primarily designed for text generation with unique stylistic blending enabled through the MoLo expert routing system.
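To make the MoLo idea concrete, here is a minimal NumPy sketch of a mixture-of-LoRA linear layer: a frozen base weight plus several low-rank expert deltas, blended by a learned router. All names, shapes, and the routing formula are illustrative assumptions, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank, n_experts = 8, 8, 2, 4   # toy sizes, not the real config

W_base = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(n_experts, rank, d_in))   # LoRA down-projections
B = rng.normal(size=(n_experts, d_out, rank))  # LoRA up-projections
W_router = rng.normal(size=(n_experts, d_in))  # router weights (assumed linear)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def molo_forward(x):
    """y = W_base @ x + sum_i gate_i * (B_i @ A_i @ x)."""
    gates = softmax(W_router @ x)  # per-expert mixing weights, sums to 1
    delta = sum(g * (B[i] @ (A[i] @ x)) for i, g in enumerate(gates))
    return W_base @ x + delta, gates

x = rng.normal(size=d_in)
y, gates = molo_forward(x)   # y: adapted output, gates: expert mixture
```

Because each expert is a rank-`r` LoRA delta rather than a full weight matrix, the per-expert parameter cost stays small even with 32 experts, which is what gives the "MoE flavor at small footprint" described above.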
Quickstart
Not available yet: the model uses a custom architecture, so it cannot be loaded with stock `transformers` until the MoLo library is released.
Deployment
Nekochan-Molo32-Qwen3-1.7B can be served through any framework that supports Hugging Face-style causal language models, including:
- SGLang
- vLLM
- Ollama / LM Studio / llama.cpp
- KTransformers
Once the MoLo architecture library is released, these platforms are expected to support full MoLo expert-routing inference without additional configuration.
Best Practices
To get the best results from Nekochan-Molo32-Qwen3-1.7B:
Sampling Recommendations
- Temperature: 0.6–0.9
- Top-p: 0.85–0.95
- Top-k: 20–40
Long-Context Usage
- Use up to 32k tokens for extended reasoning or long-form generation.
Style Control via Experts
- Different prompts may trigger different MoLo experts, leading to varied stylistic outputs.
Router Stability
- If results seem overly uniform, reduce temperature slightly to encourage more controlled expert mixture.
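The intuition behind the temperature tip can be shown on a softmax: dividing the same logits by a lower temperature sharpens the resulting distribution (lower entropy), concentrating mass on fewer experts. Whether generation temperature reaches the MoLo router exactly like this is an assumption; the numbers below are purely illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

router_logits = np.array([1.2, 0.8, 0.5, 0.1])  # hypothetical expert scores

hot = softmax(router_logits / 0.9)   # higher temperature: flatter mixture
cool = softmax(router_logits / 0.6)  # lower temperature: sharper mixture

# Lower temperature concentrates the mixture on fewer experts.
sharper = entropy(cool) < entropy(hot)
```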
Architecture Availability
The MoLo architecture source code:
- MoLoModel.py
- MoLoLinear
- Router
- Expert manager
- Configs
will be uploaded soon and later bundled into a dedicated pip-installable library for simple usage:

```shell
pip install molo-neko
```
Once published, users will be able to load the model like:
```python
from molo_neko import MoLoModel

model = MoLoModel.from_pretrained("leeminwaan/Nekochan-Molo32-Qwen3-1.7B")
```
Stay tuned!
PS: I'll train the model first, then release the library later, just to test the efficiency.
PS-1.1: Only the adapter will be pushed.
Model tree for leeminwaan/nekochan-molo32-qwen3-1.7B
- Base model: Qwen/Qwen3-1.7B-Base