How much RAM and VRAM did this take?
What was your RAM and VRAM use when you made this? I was thinking about experimenting with running heretic on MiniMax.
8x B200! Massive
But you just need 800 GB or more; it was a 48-hour process with 25+ patches.
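For anyone wondering where a figure like 800 GB comes from, here is a rough back-of-envelope sketch. The parameter count and byte widths are assumptions for illustration, not measured numbers from this run:

```python
# Rough sketch: weights-only memory estimate for loading a large MoE model.
# The parameter count below is an assumption, not a measurement from this run.
def estimate_weights_gb(num_params_billion: float, bytes_per_param: float) -> float:
    """Return approximate GB needed just to hold the weights."""
    return num_params_billion * 1e9 * bytes_per_param / 1e9

# Example: a ~230B-parameter model upcast to BF16 (2 bytes/param) needs roughly
# 460 GB for weights alone, before KV cache, activations, and framework overhead,
# which is how a total budget can climb toward 800 GB or more.
print(estimate_weights_gb(230, 2.0))  # ~460.0
```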
Wow, thank you for your contribution; that was probably very expensive to do.
Thank you. It was, because of all the time spent troubleshooting the environment and the complexity of the unique post-training techniques they used to lay down the alignment the way they did. I'll share a paper if someone wants to repeat it. It's draft quality and I didn't include the last day of effort, but you will get enough to advance to the ablation runs. I'll share it in the org... I will compress back down to FP8, maybe this week if there is time.
That sounds great! I wish there were an efficient way to fine-tune it with LoRA, though. I think MiniMax would be a great model to fine-tune if it weren't so incredibly resource-intensive.
They trained in FP8, so upscaling via one of the three approaches I found would make it possible. Only one works, which is the one I selected, but it required three patches for Transformers 5.5. After that, and as described above, you could do it, but yes, it's massively slow with 48,234 total weight tensors and 256 experts per layer. By the end I could do it on 8x 6000, which makes it more reasonable from a cost point of view if you are leasing. It's just very, very slow using PCIe only between the cards. It will be interesting to see what they have cooked up under the hood for 2.7.
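In case someone wants to try the upcast themselves, here is a minimal sketch of converting FP8 safetensors shards to BF16. The directory names are hypothetical, it ignores any quantization scale tensors the checkpoint may carry, and it is not the exact patch set described above:

```python
# Minimal sketch, assuming plain FP8 safetensors shards with no separate
# dequantization scales. Paths are placeholders, not the real repo layout.
from pathlib import Path

import torch
from safetensors.torch import load_file, save_file

src = Path("MiniMax-FP8")    # assumed input directory of FP8 shards
dst = Path("MiniMax-BF16")   # assumed output directory
dst.mkdir(exist_ok=True)

for shard in sorted(src.glob("*.safetensors")):
    tensors = load_file(shard)
    upcast = {}
    for name, t in tensors.items():
        # Upcast FP8 weight tensors; leave other dtypes (norms, scales, etc.) untouched.
        if t.dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
            upcast[name] = t.to(torch.bfloat16)
        else:
            upcast[name] = t
    save_file(upcast, dst / shard.name)
```

This roughly doubles the on-disk and in-memory size of the weights, which is part of why the full upcast-then-fine-tune route is so resource-hungry on a 256-experts-per-layer MoE.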
I've been pretty excited to test 2.7 locally since its benchmarks look pretty promising. If it's good enough, I'd gladly switch from my expensive Claude Max subscription to using it locally, so fingers crossed. It won't be Opus, but if I can get 80% of Opus's intelligence and coding skills I would be more than happy.