---
base_model: google/gemma-4-e2b-it
library_name: transformers
pipeline_tag: image-text-to-text
---

# gemma4-e2b-webvid4K_FT

Full fine-tune of `google/gemma-4-e2b-it` on AI-generated video data derived from WebVid.

## Training

- Dataset: `bear7011/gemma-4-e4b-webvid-4K`
- Samples: 3,941 video instruction examples
- Method: full fine-tuning, no LoRA
- Precision: bfloat16
- GPUs: 4
- DeepSpeed: ZeRO-3 with CPU optimizer and parameter offload
- Epochs: 1
- Global steps: 124
- Per-device batch size: 1
- Gradient accumulation steps: 8
- Optimizer: AdamW
- Learning rate: 5e-6
- Projector learning rate: 5e-6
- Image encoder learning rate: 0.0
- Weight decay: 0.01
- Warmup ratio: 0.03
- LR scheduler: cosine
- Gradient checkpointing: enabled
- Max sequence length: 2304
- Final training loss: 1.9510

Checkpoints and training logs are not included in this repository.
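
The step count in the Training list can be cross-checked from the batch settings. The sketch below is plain arithmetic on the listed values. It assumes the trailing partial batch still counts as an optimizer step and that the warmup length is the warmup ratio times the total steps, rounded up; neither detail is stated in this card.

```python
import math

# Values copied from the Training list.
num_samples = 3941
num_gpus = 4
per_device_batch_size = 1
grad_accum_steps = 8
warmup_ratio = 0.03

# Sequences consumed per optimizer update across all GPUs.
effective_batch_size = num_gpus * per_device_batch_size * grad_accum_steps

# One epoch. Assumption: the final partial batch still triggers an update.
global_steps = math.ceil(num_samples / effective_batch_size)

# Assumption: warmup steps are ratio * total steps, rounded up.
warmup_steps = math.ceil(warmup_ratio * global_steps)

print(effective_batch_size, global_steps, warmup_steps)  # prints: 32 124 4
```

Under these assumptions the result agrees with the 124 global steps reported above.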
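
The DeepSpeed configuration used for this run is not included in the repository. As a rough illustration of what "ZeRO-3 with CPU optimizer and parameter offload" in bfloat16 typically looks like, here is a hypothetical config built as a Python dict. The key names follow the DeepSpeed config schema; the `pin_memory` settings are assumptions, not values taken from the actual run.

```python
import json

# Hypothetical reconstruction; the real config file is not published here.
ds_config = {
    "bf16": {"enabled": True},                     # Precision: bfloat16
    "zero_optimization": {
        "stage": 3,                                # ZeRO-3: shard params, grads, optimizer state
        "offload_optimizer": {
            "device": "cpu",                       # CPU optimizer offload
            "pin_memory": True,                    # assumption
        },
        "offload_param": {
            "device": "cpu",                       # CPU parameter offload
            "pin_memory": True,                    # assumption
        },
    },
    "train_micro_batch_size_per_gpu": 1,           # Per-device batch size
    "gradient_accumulation_steps": 8,              # Gradient accumulation steps
}

# Serialize to JSON, the format DeepSpeed reads from a config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

Offloading both optimizer state and parameters to CPU trades throughput for GPU memory, which is a common choice when fully fine-tuning without LoRA on a small number of GPUs.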
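
For a sense of how the learning rate evolves under a cosine scheduler with warmup, the following is a minimal sketch of the usual shape: linear warmup to the peak rate, then a half-cosine decay to zero. It is not the training code. The 4 warmup steps are an assumption derived from the 0.03 warmup ratio and 124 global steps, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 5e-6      # Learning rate from the Training list
TOTAL_STEPS = 124   # Global steps from the Training list
WARMUP_STEPS = 4    # assumption: ceil(0.03 * 124)

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak rate down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 2, 4, 64, 124):
    print(step, lr_at(step))
```

Because the image encoder learning rate is 0.0, the same schedule applied to that parameter group leaves the encoder weights unchanged throughout training.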