
Iceblink

Version 3 · GLM-4.5 Air
โŠ
Overview

I decided to try tuning Air again after I saw Axolotl make some improvements to their training implementation, and now that I know a lot more about what I'm doing. And wow, I think this came out pretty good.

This is a creative writing and RP model. It supports both reasoning and non-reasoning modes with the usual GLM Air templates, although reasoning off is generally recommended.

โŠ
SillyTavern Settings

Recommended Roleplay Format

Actions: In plaintext
Dialogue: "In quotes"
Thoughts: *In asterisks*
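To make the format concrete, here is a minimal sketch (a hypothetical helper, not part of the model or SillyTavern) that splits a reply into actions, dialogue, and thoughts according to the convention above:

```python
import re

# Hypothetical parser for the recommended RP format:
# "quoted" text = dialogue, *asterisked* text = thoughts,
# everything else = plaintext actions.
def parse_rp(text: str):
    segments = []
    pattern = re.compile(r'"([^"]*)"|\*([^*]*)\*')
    pos = 0
    for m in pattern.finditer(text):
        action = text[pos:m.start()].strip()
        if action:
            segments.append(("action", action))
        if m.group(1) is not None:
            segments.append(("dialogue", m.group(1)))
        else:
            segments.append(("thought", m.group(2)))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("action", tail))
    return segments

print(parse_rp('She waves. "Hello there." *I hope this works.*'))
# [('action', 'She waves.'), ('dialogue', 'Hello there.'), ('thought', 'I hope this works.')]
```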

Recommended Samplers

Temp: 0.8 - 0.9
MinP: 0.05
TopP: 0.95 - 1.00
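If you run the model outside SillyTavern, these samplers map directly onto the request parameters of an OpenAI-compatible backend. A sketch of the payload, assuming a local server (such as a llama.cpp server, which accepts `min_p` as an extension); the model name is a placeholder:

```python
# Hypothetical request body for an OpenAI-compatible backend.
# Values are taken from the recommended sampler ranges above.
payload = {
    "model": "GLM-4.5-Iceblink-v3",  # placeholder name
    "messages": [{"role": "user", "content": "Write an opening scene."}],
    "temperature": 0.85,  # recommended 0.8 - 0.9
    "min_p": 0.05,        # MinP
    "top_p": 0.95,        # recommended 0.95 - 1.00
}
print(payload["temperature"], payload["min_p"], payload["top_p"])
```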

Instruct

GLM4.5 (no thinking): SillyTavern Preset

โŠ
Quantizations

GGUF

โŠ
Creation Process

Creation Process: SFT > SFT

SFT on approximately 15.3 million tokens (11.7 million trainable) of SFW/NSFW RP, instruct, and chat data.

Then I tried out an idea I saw from ConicCat and trained the model for 8 epochs on 96 short stories (150k tokens) from light novels and human authors that the internet said were good. This seems to have had a surprisingly positive effect on the prose without hurting the intelligence too much.

I went back to my usual higher LRs for this model. It turns out the GLM chat template was more cursed during training than I originally gave it credit for. It was a skill issue all along, go figure.

Axolotl Config
SFT (4×H200)
base_model: zai-org/GLM-4.5-Air
eot_tokens:
  - "<|user|>"
  - "<|endoftext|>"
chat_template_jinja: ./glm_air.jinja
 
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
 
load_in_8bit: false
load_in_4bit: true
 
quantize_moe_experts: true  # important
 
datasets:
  - path: ./data/nothink_dataset.jsonl
    type: chat_template
  - path: ./data/think_dataset.jsonl
    type: chat_template
 
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./GLM-Air-v4-SFT-1
 
adapter: qlora
lora_model_dir:
 
sequence_len: 10756
sample_packing: true
 
lora_r: 128
lora_alpha: 16
peft_use_rslora: true
lora_dropout: 0
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
 
lora_target_parameters:
  - mlp.experts.gate_up_proj
  - mlp.experts.down_proj
 
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
 
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 1e-5
 
bf16: auto
tf32: false
 
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
 
warmup_ratio: 0.1
evals_per_epoch: 3
saves_per_epoch: 3
 
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Glm4MoeDecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true
 
# save_first_step: true  # uncomment this to validate checkpoint saving works with your config
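One detail worth noting in the config above: `peft_use_rslora: true` with `lora_r: 128` and `lora_alpha: 16`. Standard LoRA scales the adapter update by `alpha / r`, while rank-stabilized LoRA (rsLoRA) scales by `alpha / sqrt(r)`, so the same alpha behaves very differently at high rank. A quick sanity check on the values used here:

```python
import math

def lora_scale(alpha: float, r: int, use_rslora: bool) -> float:
    # Standard LoRA scales the low-rank update by alpha / r;
    # rsLoRA (rank-stabilized LoRA) scales by alpha / sqrt(r).
    return alpha / math.sqrt(r) if use_rslora else alpha / r

# Values from the SFT config above: lora_r=128, lora_alpha=16
print(lora_scale(16, 128, use_rslora=False))  # 0.125
print(lora_scale(16, 128, use_rslora=True))   # ~1.414
```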

Writing SFT (2×H200)
base_model: ApocalypseParty/GLM-Air-v4-SFT-1-merged
eot_tokens:
  - "<|user|>"
  - "<|endoftext|>"
chat_template_jinja: ./glm_air.jinja
 
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
 
load_in_8bit: false
load_in_4bit: true
 
quantize_moe_experts: true  # important
 
datasets:
  - path: ./data/dataset_writing.jsonl
    type: chat_template
 
dataset_prepared_path: last_run_prepared
output_dir: ./GLM-Air-v4-SFT-1-writing
 
wandb_project: GLM-Air-v4-SFT
wandb_name: GLM-Air-v4-SFT-1-writing
 
adapter: qlora
lora_model_dir:
 
sequence_len: 4096
sample_packing: true
 
lora_r: 16
lora_alpha: 32
lora_dropout: 0
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
 
lora_target_parameters:
  - mlp.experts.gate_up_proj
  - mlp.experts.down_proj
 
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
 
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 9e-6
 
bf16: auto
tf32: false
 
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
 
warmup_ratio: 0.1
saves_per_epoch: 1
 
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Glm4MoeDecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true
 
# save_first_step: true  # uncomment this to validate checkpoint saving works with your config
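For reference, the effective global batch size of each run works out as micro_batch_size × gradient_accumulation_steps × number of GPUs (assuming data parallelism across all GPUs, as in the FSDP setups above):

```python
def effective_batch(micro_batch: int, grad_accum: int, num_gpus: int) -> int:
    # Global batch size under data parallelism.
    return micro_batch * grad_accum * num_gpus

# First SFT run: micro_batch_size=1, gradient_accumulation_steps=8, 4x H200
print(effective_batch(1, 8, 4))  # 32
# Writing run: micro_batch_size=2, gradient_accumulation_steps=4, 2x H200
print(effective_batch(2, 4, 2))  # 16
```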