Training setup for GLM-4.7-355B LoRA β€” framework, GPU config, and MoE gotchas?

by akprocks - opened

Hi

I came across your GLM-4.7-Architect-355B-A32B-LoRA model on HuggingFace β€” great work, and really useful to see that LoRA on the full 355B is achievable!

I'm building a specific set of LoRAs on GLM-4.7, and your model is the only public example I've found of someone actually training a LoRA adapter on it. Before I start, I wanted to ask a few quick questions, if you don't mind:

  1. How many GPUs and what GPU type did you train on?
  2. Roughly how long did training take?
  3. Did you hit any architecture-specific issues with the MoE layers or the MLA attention β€” anything that needed a workaround?
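For context on question 3: what I'm really asking is where the low-rank update landed in your setup. A minimal NumPy sketch of the LoRA computation I have in mind (illustrative shapes and names only — nothing here is taken from the GLM-4.7 repo or your training config):

```python
# Minimal LoRA sketch, just to frame question 3.
# Shapes, rank, and alpha are illustrative, not GLM-4.7-specific.
import numpy as np

d_in, d_out, r, alpha = 64, 64, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable, zero-initialized

x = rng.standard_normal(d_in)

# LoRA forward pass: frozen base path plus scaled low-rank path.
y = W @ x + (alpha / r) * (B @ (A @ x))

# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(y, W @ x)
```

In an MoE block there are many candidate `W` matrices (shared attention projections vs. per-expert FFN weights), so I'd love to hear which modules you targeted and whether any of them caused trouble.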

Any detail you're able to share would be hugely helpful.

Thanks in advance!
