Gemma-3-270m-IT-DPO (Weight Delta Analysis)
This model is a fine-tuned version of Gemma-3-270m using Direct Preference Optimization (DPO). Beyond standard alignment, this repository explores structural weight analysis through mask generation from weight deltas.
Technical Features
1. Weight Delta Masking
The repository includes tools to build binary masks from weight delta logs to identify the most significant parameter changes during DPO.
- Methods Supported: Magnitude, Momentum, and Fisher.
- Comparison Logic: Includes a Jaccard/IoU (Intersection over Union) method to compare generated masks against the default magnitude mask.
  - Score 0: No similarity.
  - Score 1: Perfect similarity.
- Storage: All generated masks are saved as `.pt` files in the `/masks` directory.
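The magnitude masking and Jaccard/IoU comparison described above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper names `magnitude_mask` and `jaccard`, the `keep_fraction` parameter, and the top-k selection rule are all assumptions, and the Momentum and Fisher methods are not shown.

```python
import torch


def magnitude_mask(delta: torch.Tensor, keep_fraction: float = 0.1) -> torch.Tensor:
    """Hypothetical helper: mark the largest-|delta| entries with True.

    Assumes the mask keeps a fixed fraction of entries; the repository's
    actual selection rule is not documented here.
    """
    k = max(1, int(keep_fraction * delta.numel()))
    flat = delta.abs().flatten()
    top_idx = torch.topk(flat, k).indices
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask[top_idx] = True
    return mask.view_as(delta)


def jaccard(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Intersection over Union of two binary masks (0 = disjoint, 1 = identical)."""
    a, b = mask_a.bool(), mask_b.bool()
    intersection = (a & b).sum().item()
    union = (a | b).sum().item()
    # Two empty masks are treated as identical by convention.
    return 1.0 if union == 0 else intersection / union


# Toy weights standing in for one parameter tensor before and after DPO.
base = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
tuned = torch.tensor([[0.5, 1.0], [2.0, 1.0]])
delta = tuned - base

mask = magnitude_mask(delta, keep_fraction=0.5)
other = torch.tensor([[True, True], [False, False]])

print(jaccard(mask, mask))   # identical masks
print(jaccard(mask, other))  # partially overlapping masks
```

A score near 1 would suggest that an alternative method selects almost the same parameters as the magnitude baseline, while a score near 0 would suggest it highlights a different set.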
2. Optimized Training Kernels
For efficient training on high-end GPUs, the training environment uses:
- BSR-AdamW Kernel: A specialized Triton-based optimizer kernel for DPO.
- Hardware Compatibility: Verified for NVIDIA H100/H200 GPUs.
- Triton Validation: Environment readiness can be tested with a short run (50-100 steps), typically taking only a few minutes on H-series hardware.
Repository Structure
- `/final_checkpoint`: The weights of the DPO-tuned model.
- `/masks`: Contains `.pt` mask files generated using the methods mentioned above.
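As a rough sketch of how the `.pt` mask files could be read back for inspection, the snippet below loads every `.pt` file in a directory. The file name `magnitude.pt`, the helper `load_masks`, and the layout (a dict mapping parameter names to boolean tensors) are assumptions for illustration; the actual contents of the mask files may differ.

```python
import tempfile
from pathlib import Path

import torch


def load_masks(mask_dir) -> dict:
    """Hypothetical helper: load every .pt file in mask_dir, keyed by file stem."""
    return {
        path.stem: torch.load(path, weights_only=True)
        for path in sorted(Path(mask_dir).glob("*.pt"))
    }


# Round trip in a temporary directory standing in for the masks directory.
with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: parameter name -> boolean mask tensor.
    example = {"layer0.weight": torch.tensor([True, False, True])}
    torch.save(example, Path(tmp) / "magnitude.pt")
    loaded = load_masks(tmp)

for method, masks in loaded.items():
    for name, mask in masks.items():
        print(method, name, int(mask.sum()), "of", mask.numel(), "entries selected")
```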
Reproduction & Debugging
If you are running the mask generation or training scripts:
- Jaccard Flag: If the Jaccard/IoU comparison breaks during execution, it is recommended to disable the `--debug/jaccard` flag temporarily. A fix is scheduled for the upcoming weekend.
- Environment Check: Ensure `triton` is properly installed to handle the BSR-AdamW kernel.
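A quick way to perform the environment check is to confirm that the required packages are importable before launching a run. The sketch below uses only the standard library; the helper names and the list of required packages are assumptions. It confirms that a package can be found, not that the BSR-AdamW kernel compiles or that a suitable GPU is present.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


def check_environment(required=("torch", "triton")) -> dict:
    """Hypothetical helper: map each required package name to its availability."""
    return {name: is_installed(name) for name in required}


for name, available in check_environment().items():
    status = "found" if available else "MISSING"
    print(f"{name}: {status}")
```

If `triton` is reported as missing, installing it before the short validation run avoids a failure at kernel launch.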
Usage Note
This model is part of an ongoing research project into how DPO shifts model weights. Results from the Jaccard similarity analysis can be used to interpret which parameters are most "critical" for preference alignment.
Model tree for YiyingXie/gemma-3-270m-it-dpo
Base model
google/gemma-3-270m