Do y'all plan to release the 200B Pro Model too?


That would shake the AI industry to its core.

Training a transformer takes about 6PT FLOPs (P parameters × T tokens). Assuming that rule applies here, and each 16×16-pixel patch = 1 token, a 2,048 × 2,048 image comes out to (2048/16)² = 128² = 16,384 tokens per input image during training. Now assume you need roughly one training example per parameter (a rough rule of thumb, with exceptions like classifiers): 200,000,000,000 examples × 16,384 tokens/example = 3.2768e+15 tokens.
That's 3,276,800,000,000,000 tokens. The compute would be 6 × 200,000,000,000 × 3,276,800,000,000,000:

3,932,160,000,000,000,000,000,000,000 FLOPs...
3,932,160,000,000,000,000,000,000 kFLOPs
3,932,160,000,000,000,000,000 MFLOPs
3,932,160,000,000,000,000 GFLOPs
3,932,160,000,000,000 TFLOPs
3,932,160,000,000 PFLOPs
3,932,160,000 EFLOPs
3,932,160 ZFLOPs...
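
If you want to sanity-check that arithmetic, here's a minimal sketch. The 2048×2048 patching and one-example-per-parameter assumptions are from the estimate above, not anything HiDream has published:

```python
# Back-of-envelope training compute for a 200B pixel-level transformer.
# Assumptions (from the estimate above, not official numbers): 2048x2048
# images, 16x16 patches, ~1 training example per parameter, C = 6*P*T FLOPs.

params = 200e9                              # P: 200B parameters
tokens_per_image = (2048 // 16) ** 2        # 128^2 = 16,384 patch tokens/image
dataset_tokens = params * tokens_per_image  # T = 3.2768e15 tokens

train_flops = 6 * params * dataset_tokens   # C = 6*P*T = 3.93216e27 FLOPs
print(f"tokens/image:   {tokens_per_image:,}")
print(f"dataset tokens: {dataset_tokens:.4e}")
print(f"training FLOPs: {train_flops:.5e}")
```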

Oracle's GPU cluster (65,000 H200s) is only ~260 EFLOPS...
That's ~20,164,923 seconds to train (assuming 75% utilization), about 233 days.
It would cost ~$1.8B to rent at $5/hr/GPU.
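
And the time/cost step, under the same assumptions (260 EFLOPS cluster, 75% utilization, $5/hr/GPU, all taken from the figures above):

```python
# Training time and rental cost on a ~260 EFLOPS cluster of ~65,000 H200s,
# using the utilization and price assumptions from the post above.

train_flops = 3.93216e27            # C from the estimate above
cluster_flops = 260e18              # 260 EFLOPS peak throughput
utilization = 0.75
gpus, usd_per_gpu_hr = 65_000, 5.0

seconds = train_flops / (cluster_flops * utilization)  # ~2.0165e7 s
days = seconds / 86_400                                # ~233 days
cost = gpus * usd_per_gpu_hr * (seconds / 3_600)       # ~$1.8B

print(f"{seconds:,.0f} s = {days:.0f} days, rent = ${cost / 1e9:.2f}B")
```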

And as evidence that it's a transformer, from the model card:

"HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) without external VAEs or disjoint text encoders, which natively encodes raw pixels, text, and task-specific conditions in a single shared token space β€” supporting text-to-image, image editing, and subject-driven personalization at up to 2,048 Γ— 2,048."

Correct me if I'm wrong.

Nice girl math x wall of text, but in the README they clearly mention the 200B model and benchmarks for it.

I skimmed it... I just looked at it very briefly. Also, if they did have a 200B model, why would they open-release it? That's probably to build hype before they release a paid version.

"if they did have a 200B model, why would they open-release it"

because communism
