Training setup for GLM-4.7-355B LoRA: framework, GPU config, and MoE gotchas?
by akprocks - opened
Hi
I came across your GLM-4.7-Architect-355B-A32B-LoRA model on HuggingFace. Great work, and it's really useful to see that LoRA on the full 355B is achievable!
I'm building a specific set of LoRAs on GLM-4.7, and your model is the only public evidence I've found of someone actually training a LoRA adapter on it. Before I start, I'd like to ask a few quick questions, if you don't mind:
- How many GPUs and what GPU type did you train on?
- Roughly how long did training take?
- Did you hit any architecture-specific issues with the MoE layers or the MLA attention, i.e. anything that needed a workaround? (I've sketched the kind of adapter config I have in mind below.)
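
For context, here is roughly the adapter config I'm planning: a minimal sketch using the Hugging Face PEFT API, where the attention projection names in `target_modules` are my own guesses based on typical MLA naming (not verified against the actual GLM-4.7 module tree), and the MoE expert/router layers are deliberately left out.

```python
# Minimal sketch of my planned LoRA config (Hugging Face PEFT).
# NOTE: the target module names below are assumptions based on typical MLA
# naming, not verified against the real GLM-4.7 checkpoint.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Target only the attention projections; MoE experts and the router are
    # intentionally excluded until I understand the gotchas there.
    target_modules=["q_proj", "kv_a_proj_with_mqa", "kv_b_proj", "o_proj"],
)
```

Does that roughly match what you did, or did you also adapt any of the expert layers?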
Any detail you're able to share would be hugely helpful.
Thanks in advance!