# loggenix-moe-0.4B-0.2A-sft-s3.1

## Model Description

This is a Mixture of Experts (MoE) language model (0.4B total parameters, ~0.2B active per token, as indicated by the model name) fine-tuned for tasks including:

- Tool/function calling
- Code generation
- Reasoning and math
- Safety and content moderation

## Evaluation Results

**Evaluation Date:** 2026-01-28 10:55:22

### Standard Benchmarks (lm-evaluation-harness)

| Benchmark | Score |
|---|---|
| arc_challenge | 10.00% |
| arc_easy | 10.00% |
| boolq | 40.00% |
| gsm8k | 0.00% |
| hellaswag | 40.00% |
| mmlu | 25.26% |
| mmlu_humanities | 26.15% |
| mmlu_formal_logic | 20.00% |
| mmlu_high_school_european_history | 20.00% |
| mmlu_high_school_us_history | 30.00% |
| mmlu_high_school_world_history | 30.00% |
| mmlu_international_law | 30.00% |
| mmlu_jurisprudence | 40.00% |
| mmlu_logical_fallacies | 40.00% |
| mmlu_moral_disputes | 10.00% |
| mmlu_moral_scenarios | 20.00% |
| mmlu_philosophy | 10.00% |
| mmlu_prehistory | 30.00% |
| mmlu_professional_law | 20.00% |
| mmlu_world_religions | 40.00% |
| mmlu_other | 26.15% |
| mmlu_business_ethics | 10.00% |
| mmlu_clinical_knowledge | 40.00% |
| mmlu_college_medicine | 40.00% |
| mmlu_global_facts | 20.00% |
| mmlu_human_aging | 40.00% |
| mmlu_management | 0.00% |
| mmlu_marketing | 50.00% |
| mmlu_medical_genetics | 30.00% |
| mmlu_miscellaneous | 20.00% |
| mmlu_nutrition | 20.00% |
| mmlu_professional_accounting | 30.00% |
| mmlu_professional_medicine | 20.00% |
| mmlu_virology | 20.00% |
| mmlu_social_sciences | 27.50% |
| mmlu_econometrics | 40.00% |
| mmlu_high_school_geography | 20.00% |
| mmlu_high_school_government_and_politics | 30.00% |
| mmlu_high_school_macroeconomics | 0.00% |
| mmlu_high_school_microeconomics | 10.00% |
| mmlu_high_school_psychology | 50.00% |
| mmlu_human_sexuality | 20.00% |
| mmlu_professional_psychology | 40.00% |
| mmlu_public_relations | 20.00% |
| mmlu_security_studies | 30.00% |
| mmlu_sociology | 20.00% |
| mmlu_us_foreign_policy | 50.00% |
| mmlu_stem | 22.63% |
| mmlu_abstract_algebra | 20.00% |
| mmlu_anatomy | 20.00% |
| mmlu_astronomy | 20.00% |
| mmlu_college_biology | 30.00% |
| mmlu_college_chemistry | 10.00% |
| mmlu_college_computer_science | 20.00% |
| mmlu_college_mathematics | 20.00% |
| mmlu_college_physics | 30.00% |
| mmlu_computer_security | 50.00% |
| mmlu_conceptual_physics | 30.00% |
| mmlu_electrical_engineering | 40.00% |
| mmlu_elementary_mathematics | 0.00% |
| mmlu_high_school_biology | 30.00% |
| mmlu_high_school_chemistry | 30.00% |
| mmlu_high_school_computer_science | 20.00% |
| mmlu_high_school_mathematics | 20.00% |
| mmlu_high_school_physics | 10.00% |
| mmlu_high_school_statistics | 0.00% |
| mmlu_machine_learning | 30.00% |
| openbookqa | 40.00% |
| piqa | 70.00% |
| winogrande | 60.00% |
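These benchmarks can be reproduced with the lm-evaluation-harness CLI. A typical invocation is sketched below; the exact flags, few-shot settings, and sample limits used to produce the numbers above are not specified in this card, so treat this as a starting point rather than the exact evaluation command:

```shell
pip install lm-eval

lm_eval --model hf \
  --model_args pretrained=kshitijthakkar/loggenix-moe-0.4B-0.2A-sft-s3.1,trust_remote_code=True \
  --tasks arc_challenge,arc_easy,boolq,gsm8k,hellaswag,mmlu,openbookqa,piqa,winogrande \
  --batch_size 8
```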

### Synthetic Task Categories

| Category | Score | Tasks Evaluated |
|---|---|---|
| Tool Calling | 50.00% | 1 |
| SRE DevOps | 16.67% | 12 |
| Programming | 25.88% | 17 |
| Reasoning | 11.25% | 4 |
| LLM Evaluation | 22.00% | 5 |
| Safety Ethics | 20.00% | 8 |
| Financial | 23.75% | 8 |
| Customer Support | 16.88% | 8 |
| Observability | 26.25% | 8 |
| Content Generation | 13.33% | 3 |
| Core AI | 35.00% | 3 |

### Tool-Calling Performance

| Metric | Score |
|---|---|
| Format Accuracy | 20.00% |
| Function Name Accuracy | 0.00% |
| Parameter Accuracy | 0.00% |
| Overall Accuracy | 6.67% |
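To make these metrics concrete, a scorer along these lines can parse the model's output as a JSON tool call and check format, function name, and parameters separately. This is a sketch under an assumed JSON-call convention, not the actual evaluation code used for the table above:

```python
import json

def score_tool_call(prediction: str, reference: dict) -> dict:
    """Compare a predicted JSON tool call against a reference call.

    Returns 0/1 flags per metric: format (valid JSON), function name
    match, and exact parameter match. Hypothetical scoring convention.
    """
    scores = {"format": 0.0, "function_name": 0.0, "parameters": 0.0}
    try:
        call = json.loads(prediction)
        scores["format"] = 1.0
    except json.JSONDecodeError:
        # Malformed output fails every downstream check as well.
        return scores
    if call.get("name") == reference.get("name"):
        scores["function_name"] = 1.0
    if call.get("arguments") == reference.get("arguments"):
        scores["parameters"] = 1.0
    return scores

reference = {"name": "get_weather", "arguments": {"city": "San Francisco"}}
good = '{"name": "get_weather", "arguments": {"city": "San Francisco"}}'
bad = 'get_weather(city="San Francisco")'  # not JSON, so format fails

print(score_tool_call(good, reference))
print(score_tool_call(bad, reference))
```

Averaging each flag over the evaluation set would yield per-metric accuracies like those reported above.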

### Code Generation Performance

| Metric | Score |
|---|---|
| Syntax Accuracy | 8.33% |
| Keyword Coverage | 29.38% |
| Completion Rate | 75.00% |
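"Syntax Accuracy" here is the kind of metric typically computed by compile-checking each generated snippet. The sketch below assumes Python-only generations and may differ from the harness actually used:

```python
import ast

def syntax_accuracy(snippets: list[str]) -> float:
    """Fraction of generated snippets that parse as valid Python."""
    ok = 0
    for code in snippets:
        try:
            ast.parse(code)
            ok += 1
        except SyntaxError:
            pass
    return ok / len(snippets) if snippets else 0.0

samples = [
    "def add(a, b):\n    return a + b",  # valid
    "def add(a, b) return a + b",        # missing colon: SyntaxError
]
print(syntax_accuracy(samples))  # 0.5
```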

### Overall Summary

- Synthetic Tasks Mean Score: 21.68%
- Total Tasks Evaluated: 170
- Task Coverage: 180.9%

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-moe-0.4B-0.2A-sft-s3.1")
model = AutoModelForCausalLM.from_pretrained(
    "kshitijthakkar/loggenix-moe-0.4B-0.2A-sft-s3.1",
    trust_remote_code=True,
)

# Chat-style generation (the model also supports tool calling)
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather in San Francisco?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
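For actual tool calling, recent transformers versions accept an OpenAI-style tool schema through the `tools=` parameter of `apply_chat_template`. The snippet below only builds the schema and message list; `get_weather` and its fields are illustrative placeholders, not tools shipped with this model:

```python
# OpenAI-style tool schema. The tool name and parameters are
# hypothetical examples for demonstration only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather in San Francisco?"},
]

# With the tokenizer loaded as in the Usage example above:
# inputs = tokenizer.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
# )
```

Whether the schema is injected into the prompt depends on the model's chat template; check the tokenizer configuration if tool definitions appear to be ignored.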

## Training Data

The model was fine-tuned on a diverse dataset including:

- Tool-calling datasets (Toucan, ToolACE, SmoLAgents)
- Safety datasets (HelpSteer3, Safety-Guard, Content-Safety-Reasoning)
- Math datasets (GSM8K, MetaMath, Big-Math-RL)
- Code datasets (Magicoder)
- Reasoning datasets (Reasoning-Gemini, Textbook-Reasoning)

## License

Apache 2.0

**Model size:** 0.4B parameters · **Tensor type:** BF16