Nekochan-Molo32-Qwen3-1.7B

A MoLo-enhanced variant of Qwen3-1.7B.

Overview

Nekochan-Molo32-Qwen3-1.7B is a custom model built on top of Qwen3-1.7B-Base, augmented with a MoLo (Mixture-of-LoRA-Experts) architecture of my own design. This model blends Qwen’s strong base capabilities with a lightweight expert-routing system, allowing it to adopt different style “modes” depending on the input.

This release focuses on:

  • Lightweight MoE-style behavior using LoRA-based experts
  • Fast inference
  • Smooth stylistic adaptation driven by a learned router
  • Small footprint while still offering multi-expert flavor

⚠️ Note:
The full architecture implementation (MoLoModel.py and MoLo layers) will be published soon (maybe) and later packaged into a dedicated library for easy import.

Model Features

  • Architecture: Qwen3-1.7B with MoLo extension
  • Experts: 32 LoRA-based experts (router-controlled)
  • Type: Causal Language Model
  • Context Length: 32,768
  • Training: Custom data with style-based expert specialization

This model is primarily designed for text generation with unique stylistic blending enabled through the MoLo expert routing system.
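The MoLo source code is not public yet, so the exact implementation is unknown. As a rough illustration only, a router-controlled mixture of LoRA experts on top of a frozen linear layer might look like the sketch below; every name, dimension, and the top-k routing choice here is an assumption, not the released design:

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_OUT = 64, 64               # layer dimensions (illustrative only)
N_EXPERTS, RANK, TOP_K = 32, 8, 2  # 32 LoRA experts; rank and top-k are assumptions

# Frozen base weight, standing in for one of Qwen3's original linear layers.
W = rng.standard_normal((D_OUT, D_IN)) * 0.02

# Each expert is a low-rank LoRA pair: A projects down, B projects back up.
A = rng.standard_normal((N_EXPERTS, RANK, D_IN)) * 0.02
B = rng.standard_normal((N_EXPERTS, D_OUT, RANK)) * 0.01

# Router: a learned linear map from the input to one logit per expert.
W_router = rng.standard_normal((N_EXPERTS, D_IN)) * 0.02

def molo_linear(x):
    """y = W x + sum over the top-k experts of gate_e * (B_e @ A_e @ x)."""
    logits = W_router @ x
    top = np.argsort(logits)[-TOP_K:]             # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over the selected experts
    y = W @ x
    for g, e in zip(gates, top):
        y += g * (B[e] @ (A[e] @ x))              # add the gated LoRA delta
    return y

x = rng.standard_normal(D_IN)
y = molo_linear(x)
print(y.shape)  # (64,)
```

Because only low-rank A/B pairs and a small router are added per layer, the parameter overhead stays far below a conventional 32-expert MoE.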

Quickstart

Not available yet, as loading requires the custom MoLo model structure, which hasn't been released.

Deployment

Nekochan-Molo32-Qwen3-1.7B can be served through any framework that supports Hugging Face-style causal language models, including:

  • SGLang
  • vLLM
  • Ollama / LM Studio / llama.cpp
  • KTransformers

Once the MoLo architecture library is released, these platforms will also support full MoLo expert routing inference without additional configuration.

Best Practices

To get the best results from Nekochan-Molo32-Qwen3-1.7B:

  1. Sampling Recommendations

    • Temperature: 0.6–0.9
    • Top-p: 0.85–0.95
    • Top-k: 20–40
  2. Long-Context Usage

    • Use up to 32k tokens for extended reasoning or long-form generation.
  3. Style Control via Experts

    • Different prompts may trigger different MoLo experts, leading to varied stylistic outputs.
  4. Router Stability

    • If results seem overly uniform, reduce temperature slightly to encourage more controlled expert mixture.
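The sampling knobs recommended above (temperature, top-p, top-k) can be sketched as plain logit filtering. This is a generic illustration of what those settings do during decoding, not code from this repository:

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_p=0.9, top_k=20, rng=None):
    """Apply temperature, then top-k, then top-p (nucleus) filtering; sample one token id."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # top-k: keep only the k highest logits
    kth = np.sort(logits)[-top_k] if top_k < logits.size else -np.inf
    logits = np.where(logits >= kth, logits, -np.inf)

    # softmax over the survivors
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top-p: keep the smallest set of tokens whose cumulative probability >= top_p
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1
    mask = np.zeros_like(probs)
    mask[order[:cutoff]] = probs[order[:cutoff]]
    mask /= mask.sum()

    return int(rng.choice(probs.size, p=mask))

logits = np.random.default_rng(0).standard_normal(100)
token = sample_token(logits, temperature=0.7, top_p=0.9, top_k=20)
print(token)
```

Lower temperature sharpens the distribution (fewer surviving tokens after top-p), which is why tightening it can steady the expert mixture.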

Architecture Availability

The MoLo architecture source code:

  • MoLoModel.py
  • MoLoLinear
  • Router
  • Expert manager
  • Configs

will be uploaded soon and later bundled into a dedicated pip-installable library for simple usage:

pip install molo-neko

Once published, users will be able to load the model like this:

from molo_neko import MoLoModel
model = MoLoModel.from_pretrained("leeminwaan/Nekochan-Molo32-Qwen3-1.7B")

Stay tuned!

PS: I'll train the model first, then release the library later, just to test the efficiency.
PS-1.1: Only the adapter will be pushed.
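Since only the adapter will be pushed, loading ultimately comes down to the standard LoRA merge: folding the low-rank update back into the base weights. A generic sketch of that operation (not this repo's code; the scaling factor alpha is an assumed default):

```python
import numpy as np

def merge_lora(W_base, A, B, alpha=16.0):
    """Fold a LoRA update into the base weight: W' = W + (alpha / r) * B @ A."""
    r = A.shape[0]                      # LoRA rank = rows of the down-projection A
    return W_base + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 32))       # base weight (d_out x d_in)
A = rng.standard_normal((4, 32))        # down-projection (r x d_in)
B = rng.standard_normal((32, 4))        # up-projection (d_out x r)

W_merged = merge_lora(W, A, B)
x = rng.standard_normal(32)

# The merged weight gives the same output as base + scaled adapter path.
y_merged = W_merged @ x
y_split = W @ x + (16.0 / 4) * (B @ (A @ x))
print(np.allclose(y_merged, y_split))  # True
```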
