Iceblink
Version 3 · GLM-4.5 Air

Decided to try tuning Air again after I saw Axolotl make some improvements to their training implementation, and now that I know a lot more about what I'm doing. And wow, I think this came out pretty well.
This is a creative writing and RP model. It supports both reasoning and non-reasoning modes with the usual GLM Air templates, although reasoning off is generally recommended.
- Recommended Roleplay Format
- Recommended Samplers
- Instruct: GLM4.5 (no thinking), SillyTavern Preset
- GGUF
Creation Process: SFT > SFT
SFT on approx. 15.3 million tokens (11.7 million trainable) of SFW / NSFW RP, instruct, and chat data.
Then I tried out an idea I saw from ConicCat and trained the model for 8 epochs on 96 short stories (~150k tokens) from light novels and from human authors the internet said were good. This seems to have had a surprisingly positive effect on the prose without hurting the intelligence too much.
I went back to my usual higher LRs for this model. It turns out the GLM chat template was more cursed while training than I originally gave it credit for. It was a skill issue all along, go figure.
Axolotl Config
base_model: zai-org/GLM-4.5-Air
eot_tokens:
- "<|user|>"
- "<|endoftext|>"
chat_template_jinja: ./glm_air.jinja
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
load_in_8bit: false
load_in_4bit: true
quantize_moe_experts: true # important
datasets:
  - path: ./data/nothink_dataset.jsonl
    type: chat_template
  - path: ./data/think_dataset.jsonl
    type: chat_template
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./GLM-Air-v4-SFT-1
adapter: qlora
lora_model_dir:
sequence_len: 10756
sample_packing: true
lora_r: 128
lora_alpha: 16
peft_use_rslora: true
lora_dropout: 0
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
lora_target_parameters:
- mlp.experts.gate_up_proj
- mlp.experts.down_proj
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 2
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 1e-5
bf16: auto
tf32: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 3
saves_per_epoch: 3
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Glm4MoeDecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true
# save_first_step: true # uncomment this to validate checkpoint saving works with your config
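From the numbers in the card, the first stage's optimizer step count can be roughly estimated. This is only a sketch: the GPU count is an assumption (it isn't stated above), and sample packing is approximated as perfectly filling every sequence.

```python
# Rough step-count estimate for the first SFT stage.
# num_gpus is an assumption; the card does not state the hardware.
import math

total_tokens = 15_300_000   # approx. dataset size from the card
seq_len = 10_756            # sequence_len
micro_batch = 1             # micro_batch_size
grad_accum = 8              # gradient_accumulation_steps
num_gpus = 1                # assumption, for illustration only
epochs = 2                  # num_epochs

# With sample_packing, sequences are filled close to seq_len.
packed_seqs = math.ceil(total_tokens / seq_len)          # ~1423 sequences
effective_batch = micro_batch * grad_accum * num_gpus    # 8 sequences/step
steps_per_epoch = math.ceil(packed_seqs / effective_batch)
total_steps = steps_per_epoch * epochs

print(packed_seqs, steps_per_epoch, total_steps)  # 1423 178 356
```

With more GPUs the effective batch scales up and the step count drops proportionally; warmup_ratio: 0.1 then corresponds to roughly the first tenth of those steps.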

Second stage (short-story SFT, starting from the merged first-stage model):
base_model: ApocalypseParty/GLM-Air-v4-SFT-1-merged
eot_tokens:
- "<|user|>"
- "<|endoftext|>"
chat_template_jinja: ./glm_air.jinja
plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
load_in_8bit: false
load_in_4bit: true
quantize_moe_experts: true # important
datasets:
  - path: ./data/dataset_writing.jsonl
    type: chat_template
dataset_prepared_path: last_run_prepared
output_dir: ./GLM-Air-v4-SFT-1-writing
wandb_project: GLM-Air-v4-SFT
wandb_name: GLM-Air-v4-SFT-1-writing
adapter: qlora
lora_model_dir:
sequence_len: 4096
sample_packing: true
lora_r: 16
lora_alpha: 32
lora_dropout: 0
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
lora_target_parameters:
- mlp.experts.gate_up_proj
- mlp.experts.down_proj
lora_mlp_kernel: false
lora_qkv_kernel: false
lora_o_kernel: false
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 8
optimizer: adamw_torch_8bit
lr_scheduler: cosine
learning_rate: 9e-6
bf16: auto
tf32: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
warmup_ratio: 0.1
saves_per_epoch: 1
fsdp_config:
  fsdp_version: 2
  offload_params: false
  cpu_ram_efficient_loading: false
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Glm4MoeDecoderLayer
  state_dict_type: FULL_STATE_DICT
  sharding_strategy: FULL_SHARD
  reshard_after_forward: true
  activation_checkpointing: true
# save_first_step: true # uncomment this to validate checkpoint saving works with your config
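Both stages use lr_scheduler: cosine with warmup_ratio: 0.1. A minimal sketch of that schedule, in the style of the usual linear-warmup-plus-cosine-decay formula (the total step count here is illustrative, not taken from the card):

```python
# Sketch of a cosine-with-warmup LR schedule matching
# lr_scheduler: cosine and warmup_ratio: 0.1 above.
import math

def lr_at(step, total_steps, base_lr, warmup_ratio=0.1):
    """LR at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 9e-6   # stage-2 learning_rate
total = 40       # illustrative total optimizer steps

print(lr_at(0, total, base_lr))    # 0.0 at the start of warmup
print(lr_at(4, total, base_lr))    # peak LR right after warmup
print(lr_at(40, total, base_lr))   # ~0 at the end of the cosine decay
```

With only 8 epochs over ~150k tokens the schedule is short, so the 10% warmup amounts to just a handful of steps before the decay begins.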
Model tree for zerofata/GLM-4.5-Iceblink-v3-106B-A12B-GGUF
Base model: zai-org/GLM-4.5-Air