Commit History

modeling: enable HF gradient_checkpointing — declare attribute on DeepseekV4Model + DeepseekV4ForCausalLM and wrap layer iteration in self._gradient_checkpointing_func when enabled+training
fc8b993
verified

kshitijthakkar commited on

Attach tokenizer + chat template + code package + model card
03bb098
verified

kshitijthakkar commited on