Model Card for shisa-ai/shisa-v2.1c-lfm2-350m

SOTA Japanese Shaberi Benchmarks @ <0.5B and <1B!

This model was made just for fun on a Saturday for the Liquid AI Hackathon, but it also serves as an early preview of our upcoming V2.1 models...

For full code and related evals see:

Presentation here:

| Model | Average | ELYZA 100 | JA-MT | Rakuda | Tengu |
|-------|---------|-----------|-------|--------|-------|
| google/gemma-3-4b-it | 6.44 | 7.34 | 6.78 | 5.68 | 5.97 |
| 045-llama3.2-1b-v2new-dpo405b | 5.40 | 5.44 | 5.22 | 6.35 | 4.61 |
| 037-rakuten-2.0-mini-instruct-1.5b-v2new-dpo405b | 5.10 | 5.42 | 4.60 | 5.68 | 4.70 |
| augmxnt/shisa-gamma-7b-v1 | 4.80 | 5.86 | 4.07 | 4.55 | 4.72 |
| shisa-ai/shisa-v2.1c-lfm2-350m | 4.51 | 4.30 | 4.75 | 5.03 | 3.95 |
| meta-llama/Llama-3.2-3B-Instruct | 4.49 | 5.62 | 4.50 | 3.43 | 4.43 |
| Qwen/Qwen3-0.6B | 4.14 | 5.16 | 4.00 | 3.18 | 4.23 |
| augmxnt/shisa-7b-v1 | 3.95 | 4.36 | 3.75 | 3.88 | 3.83 |
| shisa-ai/shisa-v2.1c-lfm2-350m-sft3-tlonly | 3.87 | 3.78 | 3.70 | 4.50 | 3.51 |
| LiquidAI/LFM2-350M | 3.76 | 3.92 | 4.07 | 3.55 | 3.51 |
| meta-llama/Llama-3.2-1B-Instruct | 2.97 | 3.82 | 2.82 | 2.45 | 2.79 |
| google/gemma3-270m-it | 2.53 | 3.42 | 2.33 | 2.10 | 2.28 |
| LiquidAI/LFM2-350M-ENJP-MT | 1.69 | 2.98 | 1.37 | 1.00 | 1.42 |
| tiiuae/Falcon-H1-0.5B-Instruct | 1.30 | 2.32 | 1.47 | 1.00 | 0.41 |
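
The Average column appears to be the unweighted mean of the four benchmark scores; this is inferred from the numbers rather than stated in the card. A quick sanity check in Python against two rows of the table:

```python
from statistics import mean

# Values copied from the table above: (reported average, [ELYZA 100, JA-MT, Rakuda, Tengu]).
table = {
    "shisa-ai/shisa-v2.1c-lfm2-350m": (4.51, [4.30, 4.75, 5.03, 3.95]),
    "google/gemma-3-4b-it": (6.44, [7.34, 6.78, 5.68, 5.97]),
}
for model, (reported_avg, scores) in table.items():
    # The unweighted mean matches the reported average to two decimal places.
    assert abs(mean(scores) - reported_avg) < 0.005, model
```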


Framework versions

  • TRL: 0.23.0
  • Transformers: 4.56.1
  • PyTorch: 2.10.0.dev20251008+cu130
  • Datasets: 4.2.0
  • Tokenizers: 0.22.1
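
For reference, here is a minimal inference sketch, assuming the standard transformers chat-template flow with the versions listed above; the prompt and generation settings are illustrative, not from the original card:

```python
# Minimal inference sketch (illustrative; assumes the standard
# transformers chat-template flow with the versions listed above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shisa-ai/shisa-v2.1c-lfm2-350m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example Japanese prompt (hypothetical): "What is the capital of Japan?"
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```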

Compute

This model was trained on an 8xMI300X node on the AMD Developer Cloud with compute generously sponsored by AMD.
