See axolotl config
axolotl version: 0.9.2
```yaml
base_model: mlabonne/gemma-3-27b-it-abliterated-v2
base_model_config: mlabonne/gemma-3-27b-it-abliterated-v2
model_type: Gemma3ForCausalLM
tokenizer_type: AutoTokenizer

load_in_4bit: true
bnb_4bit_compute_dtype: bfloat16
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true
strict: false

datasets:
  # - path: data/rp_below_50_train_a_1.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train
  # - path: data/rp_below_50_train_a_2.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train
  # - path: data/rp_below_50_train_a_3.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train
  # - path: data/rp_below_50_train_a_4.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train
  - path: data/train_chats.parquet
    type: chat_template
    field_messages: messages
    split: train
  # - path: data/bdsm_train.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train

val_set_size: 0.0  # or use one of the test datasets below with a 'validation' split
test_datasets:
  - path: data/val_chats.parquet
    type: chat_template
    field_messages: messages
    split: train
  # - path: data/bdsm_test.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train
  # - path: data/rp_below_50_test.parquet
  #   type: chat_template
  #   field_messages: messages
  #   split: train

chat_template: chatml
# val_set_size: 0.05  # ignored when separate test/validation datasets are defined above
save_safetensors: true
output_dir: ./outputs/libry_j_ai
sequence_len: 4096  # mind the memory cost for a 27B model without Flash Attention

wandb_project: sdg-finetune
wandb_watch: all
wandb_name: Gemma3ForCausalLM

# Batch & training config
micro_batch_size: 2
eval_batch_size: 2  # consistent with micro_batch_size
gradient_accumulation_steps: 4
num_epochs: 2
evals_per_epoch: 5
eval_sample_packing: true
eval_table_size: 0

# Optimizer
optimizer: paged_adamw_32bit  # DeepSpeed manages the optimizer with its own settings
adam_beta2: 0.95
adam_epsilon: 0.00001
lr_scheduler: cosine
learning_rate: 1e-4
weight_decay: 0.1
warmup_ratio: 0.03
gradient_clipping: 1.0
group_by_length: true

# Memory & precision
train_on_inputs: true
train_on_eos: turn
bf16: true
fp16: false
tf32: false
sample_packing: false
pad_to_sequence_len: true  # recommended when sample_packing is enabled
roles_to_train: ["assistant", "model", "user", "system"]
gradient_checkpointing: true
save_steps: 500
# gradient_checkpointing_kwargs:
#   use_reentrant: false
# resume_from_checkpoint: ./outputs/libry_run2/checkpoint-379  # set to a checkpoint path to resume training
# auto_resume_from_checkpoint: false
logging_steps: 1

# Attention (Flash Attention disabled; fall back to eager)
flash_attention: false
flash_attention_v2: false
attn_implementation: eager

# LoRA settings
adapter: lora
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_modules: ["q_proj", "v_proj", "o_proj"]
lora_bias: none
lora_task_type: CAUSAL_LM

# DeepSpeed configuration
deepspeed: deepspeed_configs/zero2_cpu_offload.json

# Hugging Face upload (disabled)
push_to_hub: false

special_tokens:
  eos_token: "<eos>"
  bos_token: "<bos>"
  unk_token: "<unk>"
  pad_token: "<pad>"
```
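With `chat_template: chatml` and `field_messages: messages` set above, each training record is a list of role/content messages that axolotl renders into ChatML text before tokenization. A minimal sketch of that rendering, illustrative only — the example messages are invented, and axolotl applies the real template internally:

```python
# Sketch of ChatML rendering for one record from a `messages`-style dataset
# (as referenced by `field_messages: messages` in the config above).
def render_chatml(messages):
    """Render a list of {role, content} dicts into ChatML-formatted text."""
    parts = [
        f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>"
        for msg in messages
    ]
    return "\n".join(parts) + "\n"

# Hypothetical record, matching the shape expected in the parquet files.
example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
]

print(render_chatml(example))
```

Each turn is wrapped in `<|im_start|>{role}` / `<|im_end|>` markers; with `roles_to_train` covering all four roles and `train_on_inputs: true`, loss is computed on every turn, not just the assistant's.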
# outputs/libry_j_ai
This model is a fine-tuned version of mlabonne/gemma-3-27b-it-abliterated-v2 on the data/train_chats.parquet dataset. It achieves the following results on the evaluation set:
- Loss: nan (the validation loss was NaN at every evaluation; see the training results table below)
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: paged_adamw_32bit with betas=(0.9, 0.95), epsilon=1e-05, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 224
- num_epochs: 2.0
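The reported total_train_batch_size of 8 follows directly from the values above; a quick sketch, assuming a single-GPU world size (not stated in the card, but the only value consistent with the reported total):

```python
# Effective train batch size = micro batch x gradient accumulation x world size.
micro_batch_size = 2
gradient_accumulation_steps = 4
world_size = 1  # assumption: not stated in the card; 2 * 4 * 1 matches the reported 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * world_size
print(total_train_batch_size)  # 8
```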
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.8256 | 0.0003 | 1 | nan |
| 1.9027 | 0.2002 | 750 | nan |
| 2.0386 | 0.4004 | 1500 | nan |
| 1.5538 | 0.6006 | 2250 | nan |
| 2.107 | 0.8008 | 3000 | nan |
| 0.8711 | 1.0008 | 3750 | nan |
| 0.7524 | 1.2010 | 4500 | nan |
| 0.8172 | 1.4012 | 5250 | nan |
| 0.6572 | 1.6014 | 6000 | nan |
| 0.8024 | 1.8016 | 6750 | nan |
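The step and epoch columns above can be cross-checked with simple arithmetic; a rough sketch, estimating steps per epoch from the step-750 row (illustrative only — the exact dataset size is not stated in the card):

```python
# Estimate optimizer steps per epoch from the table row (step 750 at epoch 0.2002),
# then the rough number of training examples given the effective batch size of 8.
step, epoch_fraction = 750, 0.2002
steps_per_epoch = round(step / epoch_fraction)  # roughly 3746
approx_train_examples = steps_per_epoch * 8     # effective batch size from the hyperparameters
print(steps_per_epoch, approx_train_examples)
```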
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1