Training setup for GLM-4.7-355B LoRA β€” framework, GPU config, and MoE gotchas?

by akprocks - opened

Hi

I came across your GLM-4.7-Architect-355B-A32B-LoRA model on HuggingFace β€” great work, and really useful to see that LoRA on the full 355B is achievable!

I'm building a specific set of LoRAs on GLM-4.7, and your model is the only public example I've found of someone actually training a LoRA adapter on it. Before I start, I wanted to ask a few quick questions, if you don't mind:

  1. How many GPUs and what GPU type did you train on?
  2. Roughly how long did training take?
  3. Did you hit any architecture-specific issues with the MoE layers or the MLA attention β€” anything that needed a workaround?
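For context on question 3: what I'm really asking is where the low-rank update landed in your setup. A minimal NumPy sketch of the LoRA computation I have in mind (illustrative shapes and names only — nothing here is taken from the GLM-4.7 repo or your training config):

```python
# Minimal LoRA sketch, just to frame question 3.
# Shapes, rank, and alpha are illustrative, not GLM-4.7-specific.
import numpy as np

d_in, d_out, r, alpha = 64, 64, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable, zero-initialized

x = rng.standard_normal(d_in)

# LoRA forward pass: frozen base path plus scaled low-rank path.
y = W @ x + (alpha / r) * (B @ (A @ x))

# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(y, W @ x)
```

In an MoE block there are many candidate `W` matrices (shared attention projections vs. per-expert FFN weights), so I'd love to hear which modules you targeted and whether any of them caused trouble.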

Any detail you're able to share would be hugely helpful.

Thanks in advance!
