Merged Model: toksuite-all-merged

This model is the result of parameter averaging (a "model soup") across 14 models.

Merged Models

The following models were included in the merge:

  • toksuite/supertoken_models-llama_google-byt5-small
  • toksuite/supertoken_models-llama_facebook-xglm-564M
  • toksuite/supertoken_models-llama_bigscience-bloom
  • toksuite/supertoken_models-llama_gpt2
  • toksuite/supertoken_models-llama_common-pile-comma-v0.1
  • toksuite/supertoken_models-llama_google-gemma-2-2b
  • toksuite/supertoken_models-llama_microsoft-Phi-3-mini-4k-instruct
  • toksuite/supertoken_models-llama_meta-llama-Llama-3.2-1B
  • toksuite/supertoken_models-llama_CohereLabs-aya-expanse-8b
  • toksuite/supertoken_models-llama_tiktoken-gpt-4o
  • toksuite/supertoken_models-llama_google-bert-bert-base-multilingual-cased
  • toksuite/supertoken_models-llama_Qwen-Qwen3-8B
  • toksuite/supertoken_models-llama_tokenmonster-englishcode-32000-consistent-v1
  • toksuite/supertoken_models-llama_mistralai-tekken

Merging Configuration

  • Method: Weighted parameter averaging (see the sketch below).
  • Weights: Simple (uniform) average; merging lambda = 1.0.
  • Excluded layers: The embedding and LM head weights were not averaged; they were kept from the base model (toksuite/supertoken_models-llama_meta-llama-Llama-3.2-1B).
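
The snippet below is a minimal sketch of this kind of merge, not the exact script used to produce the checkpoint: model identifiers, the exclusion prefixes, and the output path are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM

BASE = "toksuite/supertoken_models-llama_meta-llama-Llama-3.2-1B"
OTHERS = [
    "toksuite/supertoken_models-llama_gpt2",
    # ... the remaining checkpoints listed above
]
# Parameters kept from the base model rather than averaged (assumed prefixes).
EXCLUDE_PREFIXES = ("model.embed_tokens", "lm_head")

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float32)
merged = {k: v.clone() for k, v in base.state_dict().items()}
count = 1  # the base model is the first member of the soup

for name in OTHERS:
    other = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
    for key, value in other.state_dict().items():
        if key.startswith(EXCLUDE_PREFIXES):
            continue  # embeddings / LM head stay as in the base checkpoint
        merged[key] += value
    count += 1
    del other

for key in merged:
    if not key.startswith(EXCLUDE_PREFIXES):
        merged[key] /= count  # simple (uniform) average over all members

base.load_state_dict(merged)
base.save_pretrained("toksuite-all-merged")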

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("moe-dtoks/toksuite-all-merged")
tokenizer = AutoTokenizer.from_pretrained("moe-dtoks/toksuite-all-merged")
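
A quick generation check, continuing the snippet above (the prompt and decoding settings are arbitrary examples, not part of the original card):

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))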