Hugging Face-compatible adaptation of the Hiera image encoder from SAM 2.1, ready to use as a visual encoder in multimodal language models, with support for the Hugging Face Trainer and DeepSpeed