kshitijthakkar
modeling: enable HF gradient_checkpointing (declare the attribute on DeepseekV4Model and DeepseekV4ForCausalLM; wrap layer iteration in self._gradient_checkpointing_func when enabled and training)
fc8b993 verified
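The gating pattern the commit message describes can be sketched as below. This assumes PyTorch and `torch.utils.checkpoint.checkpoint`. `ToyLayer`, `ToyModel`, and `enable_gradient_checkpointing` are hypothetical stand-ins, not the DeepseekV4 code. In transformers, `PreTrainedModel.gradient_checkpointing_enable()` is what normally sets the `gradient_checkpointing` flag and installs `_gradient_checkpointing_func`.

```python
import functools

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class ToyLayer(nn.Module):
    """Hypothetical stand-in for a decoder layer."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, hidden_states):
        return hidden_states + torch.tanh(self.proj(hidden_states))


class ToyModel(nn.Module):
    """Hypothetical model showing the enabled-and-training gate."""

    def __init__(self, dim=8, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(ToyLayer(dim) for _ in range(num_layers))
        # Declared attribute the forward pass checks; off by default.
        self.gradient_checkpointing = False
        self._gradient_checkpointing_func = None

    def enable_gradient_checkpointing(self):
        # Hypothetical helper; in transformers this role is played by
        # PreTrainedModel.gradient_checkpointing_enable().
        self.gradient_checkpointing = True
        self._gradient_checkpointing_func = functools.partial(
            checkpoint, use_reentrant=False
        )

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Activations inside the layer are not stored; they are
                # recomputed during the backward pass.
                hidden_states = self._gradient_checkpointing_func(
                    layer.__call__, hidden_states
                )
            else:
                hidden_states = layer(hidden_states)
        return hidden_states
```

The checkpointed path trades extra compute for lower activation memory, so its outputs and gradients should match the plain path. In eval mode the gate is skipped even when checkpointing is enabled.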