Built with Axolotl

See axolotl config

axolotl version: 0.9.2

base_model: mlabonne/gemma-3-27b-it-abliterated-v2
base_model_config: mlabonne/gemma-3-27b-it-abliterated-v2
model_type: Gemma3ForCausalLM
tokenizer_type: AutoTokenizer

load_in_4bit: true
bnb_4bit_compute_dtype: bfloat16
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true
strict: false

datasets:
#  - path: data/rp_below_50_train_a_1.parquet
#    type: chat_template
#    field_messages: messages
#    split: train
#  - path: data/rp_below_50_train_a_2.parquet
#    type: chat_template
#    field_messages: messages
#    split: train
#  - path: data/rp_below_50_train_a_3.parquet
#    type: chat_template
#    field_messages: messages
#    split: train
#  - path: data/rp_below_50_train_a_4.parquet
#    type: chat_template
#    field_messages: messages
#    split: train
  - path: data/train_chats.parquet
    type: chat_template
    field_messages: messages
    split: train
#  - path: data/bdsm_train.parquet
#    type: chat_template
#    field_messages: messages
#    split: train
val_set_size: 0.0  # or use one of your test datasets with 'validation' split
test_datasets:
  - path: data/val_chats.parquet
    type: chat_template
    field_messages: messages
    split: train
#  - path: data/bdsm_test.parquet
#    type: chat_template
#    field_messages: messages
#    split: train
#  - path: data/rp_below_50_test.parquet
#    type: chat_template
#    field_messages: messages
#    split: train

chat_template: chatml

#val_set_size: 0.05 # This will now be ignored if you have a separate 'validation' split defined above
save_safetensors: true
output_dir: ./outputs/libry_j_ai # <-- OPTIONAL: Consider renaming for clarity with the new model

sequence_len: 4096 # Keep this for now, but be mindful of memory for 27B without Flash Attention

wandb_project: sdg-finetune
wandb_watch: all
wandb_name: Gemma3ForCausalLM

# Batch & Training Config
micro_batch_size: 2
eval_batch_size: 2 # Consistent with micro_batch_size
gradient_accumulation_steps: 4
num_epochs: 2

evals_per_epoch: 5
eval_sample_packing: true # <-- OPTIONAL: Changed back to true for efficiency and consistency
eval_table_size: 0

# Optimizer
optimizer: paged_adamw_32bit # DeepSpeed will manage the optimizer with its settings
adam_beta2: 0.95
adam_epsilon: 0.00001
lr_scheduler: cosine
learning_rate: 1e-4
weight_decay: 0.1
warmup_ratio: 0.03
gradient_clipping: 1.0
group_by_length: true

# Memory & Precision
train_on_inputs: true
train_on_eos: turn
bf16: true
fp16: false
tf32: false
sample_packing: false
pad_to_sequence_len: true # recommended when sample_packing is enabled (disabled above)

roles_to_train: ["assistant", "model", "user", "system"]

gradient_checkpointing: true
save_steps: 500
#gradient_checkpointing_kwargs:
#  use_reentrant: false
#resume_from_checkpoint: ./outputs/libry_run2/checkpoint-379 # <-- CHANGE THIS: Set to 'true' to continue from the last checkpoint
#auto_resume_from_checkpoint: false

logging_steps: 1
flash_attention: false        # DISABLED Flash Attention
flash_attention_v2: false     # DISABLED Flash Attention V2
attn_implementation: eager    # Set attention implementation back to eager

# LoRA Settings
adapter: lora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_modules: ["q_proj", "v_proj", "o_proj"]
lora_bias: none
lora_task_type: CAUSAL_LM

# DeepSpeed Configuration (Optimized)
deepspeed: deepspeed_configs/zero2_cpu_offload.json # Point to your DeepSpeed config file

# Hugging Face Upload (Disabled)
push_to_hub: false

special_tokens:
  eos_token: "<eos>"
  bos_token: "<bos>"
  unk_token: "<unk>"
  pad_token: "<pad>"
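With chat_template: chatml, each conversation in the `messages` field is rendered into ChatML turns before tokenization. A minimal sketch of that format, for illustration only (this is not Axolotl's actual templating code):

```python
# Illustrative ChatML rendering, matching chat_template: chatml above.
# Not Axolotl's implementation -- just the shape of the rendered text.
def render_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts) + "\n"

example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(example))
```

Note that ChatML introduces its own turn delimiters (`<|im_start|>` / `<|im_end|>`); the special_tokens block above only pins the model's eos/bos/unk/pad tokens.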

outputs/libry_j_ai

This model is a fine-tuned version of mlabonne/gemma-3-27b-it-abliterated-v2 on the data/train_chats.parquet dataset. It achieves the following results on the evaluation set:

  • Loss: nan

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: paged_adamw_32bit with betas=(0.9, 0.95) and epsilon=1e-05; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 224
  • num_epochs: 2.0
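The reported numbers above are mutually consistent; a quick sanity check (the total step count is not logged directly, so it is estimated from the results table below and the world size is implied by the reported totals):

```python
# Sanity-check the derived training hyperparameters.
micro_batch_size = 2
gradient_accumulation_steps = 4
world_size = 1  # implied by 8 / (2 * 4), despite the multi-GPU flag

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * world_size
print(total_train_batch_size)  # 8, matching the reported total_train_batch_size

# warmup_ratio: 0.03 over the whole run reproduces the reported 224 warmup
# steps if the run had roughly 7,494 optimizer steps (2 epochs at ~3,747
# steps per epoch, an estimate read off the results table).
total_steps = 7494  # assumption, not logged directly
warmup_steps = int(0.03 * total_steps)
print(warmup_steps)  # 224
```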

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.8256        | 0.0003 | 1    | nan             |
| 1.9027        | 0.2002 | 750  | nan             |
| 2.0386        | 0.4004 | 1500 | nan             |
| 1.5538        | 0.6006 | 2250 | nan             |
| 2.107         | 0.8008 | 3000 | nan             |
| 0.8711        | 1.0008 | 3750 | nan             |
| 0.7524        | 1.2010 | 4500 | nan             |
| 0.8172        | 1.4012 | 5250 | nan             |
| 0.6572        | 1.6014 | 6000 | nan             |
| 0.8024        | 1.8016 | 6750 | nan             |
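Although validation loss is nan at every checkpoint, the epoch/step columns still allow a rough read of the run's scale: step 3750 lands at epoch 1.0008, which pins down the steps per epoch and, with the effective batch size of 8, the approximate number of training examples (an estimate, since steps per epoch is read off the table):

```python
# Back-of-the-envelope dataset-size estimate from the results table.
step, epoch = 3750, 1.0008          # row from the table above
steps_per_epoch = step / epoch      # ~3747 optimizer steps per epoch
total_train_batch_size = 8          # from the hyperparameters section
approx_examples = steps_per_epoch * total_train_batch_size
print(round(steps_per_epoch))   # 3747
print(round(approx_examples))   # 29976
```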

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1