Merged Model: toksuite-qwen-onto-llama

This model is the result of parameter averaging (a "model soup") across two models.

Merged Models

The following models were included in the merge:

toksuite/meta-llama-Llama-3.2-1B
toksuite/Qwen-Qwen3-8B

Merging Configuration

Method: Weighted Parameter Averaging
Weights: Simple average (merging lambda = 0.5, so both models contribute equally).
Excluded Layers: Embeddings and LM Head were kept from the base model (toksuite/meta-llama-Llama-3.2-1B).
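
The configuration above can be illustrated with a minimal sketch. It is not the script used to produce this model, which the card does not include. It assumes the usual interpolation form merged = (1 - lambda) * base + lambda * other, represents weights as flat lists of floats rather than tensors, and assumes the excluded layers can be identified by the substrings "embed_tokens" and "lm_head" in parameter names. The function name merge_state_dicts and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_state_dicts(
    base: StateDict,
    other: StateDict,
    lam: float = 0.5,
    excluded: Sequence[str] = ("embed_tokens", "lm_head"),  # assumed name tags
) -> StateDict:
    """Average two state dicts, keeping excluded parameters from the base."""
    merged: StateDict = {}
    for name, base_w in base.items():
        # Excluded parameters (embeddings, LM head) are copied from the base.
        if any(tag in name for tag in excluded):
            merged[name] = list(base_w)
            continue
        # A parameter missing from `other` raises KeyError here.
        other_w = other[name]
        if len(other_w) != len(base_w):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Assumed interpolation: (1 - lam) * base + lam * other.
        merged[name] = [(1.0 - lam) * b + lam * o for b, o in zip(base_w, other_w)]
    return merged
```

With lam = 0.5 the interpolation is symmetric, so the result for the averaged parameters does not depend on which model is treated as the base; only the excluded layers do.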

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("moe-dtoks/toksuite-qwen-onto-llama")
tokenizer = AutoTokenizer.from_pretrained("moe-dtoks/toksuite-qwen-onto-llama")
