Gemma-3-270m-IT-DPO (Weight Delta Analysis)

This model is a fine-tuned version of Gemma-3-270m, trained with Direct Preference Optimization (DPO). Beyond standard alignment, this repository analyzes which weights DPO changed by generating binary masks from the weight deltas.

🛠 Technical Features

1. Weight Delta Masking

The repository includes tools to build binary masks from weight delta logs to identify the most significant parameter changes during DPO.
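
As a rough illustration of the magnitude method, the sketch below marks the entries with the largest absolute change between base and tuned weights. The function name `magnitude_mask`, the `keep_fraction` parameter, and the use of plain Python lists instead of tensors are assumptions made for this example; the repository's own scripts may select parameters differently.

```python
def magnitude_mask(delta, keep_fraction=0.1):
    """Return a 0/1 mask marking the entries of `delta` with the largest |value|.

    `keep_fraction` is a hypothetical knob: the share of entries to keep.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not delta:
        return []
    # Keep at least one entry so the mask is never empty.
    k = max(1, round(keep_fraction * len(delta)))
    # Rank indices by absolute change, largest first.
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(delta))]


# Toy weights (made up for illustration, not taken from the model).
base = [0.10, -0.20, 0.30, 0.40, -0.50]
tuned = [0.10, -0.90, 0.31, 0.40, 0.20]
delta = [t - b for t, b in zip(tuned, base)]

mask = magnitude_mask(delta, keep_fraction=0.4)
print(mask)
```

Here the two entries that moved by roughly 0.7 are selected, while the unchanged and barely changed entries are masked out.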

Methods Supported: Magnitude, Momentum, and Fisher.
Comparison Logic: A Jaccard/IoU (Intersection over Union) score compares each generated mask against the default magnitude mask.
- Score 0: The masks share no selected parameters.
- Score 1: The masks are identical.
Storage: All generated masks are saved as .pt files in the /masks directory.
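
The Jaccard/IoU score described above is the number of positions selected by both masks divided by the number selected by either. A minimal sketch follows, assuming masks are flat 0/1 sequences; the function name `jaccard` and the toy masks are illustrative and are not the repository's implementation.

```python
def jaccard(mask_a, mask_b):
    """Jaccard/IoU similarity between two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Convention assumed here: two all-zero masks count as identical.
    return 1.0 if union == 0 else intersection / union


# Toy masks (made up for illustration).
magnitude = [1, 1, 0, 0, 1, 0]
fisher = [1, 0, 0, 1, 1, 0]

score = jaccard(magnitude, fisher)
print(score)  # 2 shared positions out of 4 selected overall -> 0.5
```

The empty-union case needs an explicit choice, since the ratio is otherwise undefined; returning 1.0 is one common convention.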

2. Optimized Training Kernels

For efficiency on high-end GPUs, the training environment uses:

BSR-AdamW Kernel: A specialized Triton-based optimizer kernel for DPO.
Hardware Compatibility: Verified for NVIDIA H100/H200 GPUs.
Triton Validation: Environment readiness can be tested with a short run (50-100 steps), typically taking only a few minutes on H-series hardware.

📂 Repository Structure

/final_checkpoint: The weights of the DPO-tuned model.
/masks: Contains .pt mask files generated using the methods mentioned above.

🚀 Reproduction & Debugging

If you are running the mask generation or training scripts:

Jaccard Flag: If the Jaccard/IoU comparison fails during execution, temporarily disable the --debug/jaccard flag. A fix is scheduled for the upcoming weekend.
Environment Check: Ensure triton is properly installed to handle the BSR-AdamW kernel.
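
One lightweight way to confirm that triton is installed before launching a run is sketched below. The helper name `module_available` is hypothetical. The check only confirms that the package can be found; it does not verify that the BSR-AdamW kernel compiles or runs on the GPU, which the short validation run described earlier would cover.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


triton_ready = module_available("triton")
if triton_ready:
    print("triton found")
else:
    print("triton not found; install it before using the Triton-based optimizer kernel")
```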

📝 Usage Note

This model is part of an ongoing research project into how DPO shifts model weights. Results from the Jaccard similarity analysis can be used to interpret which parameters are most "critical" for preference alignment.

Model tree for YiyingXie/gemma-3-270m-it-dpo

Base model: google/gemma-3-270m (this model is a fine-tune of it)